Introduction to Bayes’ Theorem

General references:

  • Casella, George and Berger, Roger L. Statistical Inference (ISBN 9780534243128)

  • Spanos, A. Probability Theory and Statistical Inference: Empirical Modeling with Observational Data (ISBN 9781107185142)

  • Hobbs, N. Thompson and Hooten, Mevin B. Bayesian Models: A Statistical Primer for Ecologists (ISBN 9780691159287)

  • Hoff, Peter D. A First Course in Bayesian Statistical Methods (ISBN 0387922997)




Bayes’ Rule

Starting from the rules of probability given in the first section, we see that we can go from joint probabilities to conditional probabilities via:

\[P(A,B) = P(A|B) P(B)\]

There is no reason we can’t do the same with the conditioning reversed:

\[P(A,B) = P(B|A) P(A)\]

Setting these two equal and solving for one of the two conditional probabilities, we get:

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

This is known as Bayes’ Rule.
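
To make this concrete, here is a minimal numerical check in Python. The joint distribution over two binary events A and B is made up purely for illustration; the point is only that both factorizations and Bayes’ Rule agree.

```python
# A small, hypothetical joint distribution P(A, B) over binary A and B.
# The four probabilities sum to 1.
joint = {
    (True, True): 0.12,
    (True, False): 0.18,
    (False, True): 0.28,
    (False, False): 0.42,
}

# Marginals: sum the joint over the other variable.
p_a = sum(p for (a, b), p in joint.items() if a)  # P(A)
p_b = sum(p for (a, b), p in joint.items() if b)  # P(B)

# Conditionals straight from the joint: P(A|B) = P(A, B) / P(B).
p_a_given_b = joint[(True, True)] / p_b
p_b_given_a = joint[(True, True)] / p_a

# Bayes' Rule recovers P(A|B) from P(B|A), P(A), and P(B).
bayes = p_b_given_a * p_a / p_b
print(p_a_given_b, bayes)  # the two values match
```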

Note that any combination of discrete and continuous random variables can also be handled. For instance, assuming X is a discrete random variable and Y is a continuous random variable, and being explicit by using probabilities for X and densities for Y, we would write this as:

\[P(X=x|Y=y)=\frac{f_{Y|X=x}(y)P(X=x)}{f_Y(y)}\]
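
As a sketch of this mixed case, suppose (purely for illustration) that X is Bernoulli and that Y given X = x is Normal with a class-specific mean. The prior probabilities, means, and standard deviation below are all assumed values, not anything from the text.

```python
# Mixed discrete/continuous Bayes' Rule: probabilities for X, densities for Y.
from scipy.stats import norm

p_x = {0: 0.7, 1: 0.3}    # assumed prior probabilities P(X = x)
mu = {0: -1.0, 1: 2.0}    # assumed means of Y | X = x
sigma = 1.0               # assumed common standard deviation

def posterior_x_given_y(x, y):
    """P(X = x | Y = y) = f_{Y|X=x}(y) P(X = x) / f_Y(y)."""
    numerator = norm.pdf(y, loc=mu[x], scale=sigma) * p_x[x]
    # f_Y(y) by the law of total probability over the values of X.
    f_y = sum(norm.pdf(y, loc=mu[k], scale=sigma) * p_x[k] for k in p_x)
    return numerator / f_y

print(posterior_x_given_y(1, y=1.5))  # posterior probability that X = 1
```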



Bayes’ rule for updating probabilities

Suppose we are interested in taking a test to determine if we have a disease, COVID-19, for example. As we take the test, we are told that in our area, 5% of the population has the disease.

What is our probability of having the disease? Without other information, we are just a random draw from the population, so 5% is a good guess.

What information do we know? \(P(disease) = 0.05\) and \(P(\overline{disease}) = 0.95\).

Note: \(\overline{disease}\) means the complement, or NOT disease.

We then receive the results: the test comes back positive. Are we in fact infected? This question turns out to be less straightforward. It takes a bit more thought because the test has error that needs to be accounted for. The way to approach this question is through Bayes’ Rule.

What we want is the probability we have the disease given a positive test result:

\[P(disease|+ test)\]

Bayes’ Rule then gives this as:

\[P(disease|+ test) = \frac{P(+ test|disease)P(disease)}{P(+ test)}\]

What are \(P(+ test|disease)\) and \(P(+ test)\)? The first, \(P(+ test|disease)\), is the sensitivity of the test, or true positive rate. For one of the COVID-19 tests, the sensitivity is reported to be 80%. What about \(P(+ test)\)? Using the law of total probability, we can partition it into something more recognizable:

\[P(+test) = P(+test|disease)P(disease) + P(+test|\overline{disease})P(\overline{disease})\]
\[P(disease|+ test) = \frac{P(+ test|disease)P(disease)}{P(+test|disease)P(disease) + P(+test|\overline{disease})P(\overline{disease})}\]

Partitioning like this, we see we need one more item of information: \(P(+test|\overline{disease})\), which is the false positive rate. Tests usually report their specificity, which is the true negative rate (\(P(-test|\overline{disease}) = 0.989\) for this test), rather than the false positive rate, so we need to convert:

\[P(+test|\overline{disease}) = 1 - P(-test|\overline{disease}) = 1 - 0.989 = 0.011\]

We can now answer our question:

\[P(disease|+ test) = \frac{0.8 \ast 0.05}{0.8\ast 0.05 + 0.011 \ast 0.95} = 0.79\]

Not 100%, but still high at 79%.
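
The same calculation can be written as a small Python function so the inputs can be varied; the values passed in below are the prevalence, sensitivity, and specificity quoted above.

```python
def posterior_disease(prevalence, sensitivity, specificity):
    """P(disease | + test) via Bayes' Rule and the law of total probability."""
    false_positive_rate = 1 - specificity           # P(+test | no disease)
    p_pos = (sensitivity * prevalence               # P(+test), marginalizing
             + false_positive_rate * (1 - prevalence))
    return sensitivity * prevalence / p_pos

print(posterior_disease(prevalence=0.05, sensitivity=0.80, specificity=0.989))
# ~0.79, matching the hand calculation above
```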




Breaking down Bayes’ Rule

Let’s go through the parts of Bayes’ Rule. We will go through this more completely in another section.

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

Posterior

The \(P(A|B)\) on the left-hand side is termed the posterior. It is the probability of A given that B is true. As we will see, in Bayesian analysis we are often looking for the posterior to represent the distribution of the parameters given the data.

Likelihood

The term, \(P(B|A)\), is the likelihood. In our disease example above, this was the probability of a positive test, given disease. In future problems, we will be more interested in inference of parameters given data, such that the likelihood will represent the likelihood of observing the data given parameters.

Priors

\(P(A)\) is called the prior. In the above example, this reflected our prior knowledge. Before collecting (new) data, we had some notion of our probability of having the disease. It can represent a belief, and it can be informative or vague.

Above, we used a single probability: the probability of disease, based on the population, was set to 5%. We could instead have used a distribution for the prior; perhaps we had data from surrounding counties and knew the average was 5% with a defined variance. Priors can also be used to induce a known distribution in the posterior.

Marginal

The denominator, \(P(B)\), is the marginal probability of B. It is constant with respect to A, and in many analyses it may be dropped, since the posterior can be renormalized afterward.

Using these terms, Bayes’ Rule can be written:

\[posterior = \frac{likelihood \ast prior}{marginal}\]
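
As a sketch of how the marginal acts only as a normalizing constant, the following grid approximation computes a posterior for a binomial success probability. The data (7 successes in 10 trials) and the flat prior are hypothetical choices for illustration; dividing by the sum of likelihood \(\ast\) prior plays the role of dividing by the marginal.

```python
import numpy as np
from scipy.stats import binom

theta = np.linspace(0.001, 0.999, 999)     # grid of parameter values
prior = np.ones_like(theta)                # flat prior, up to a constant
likelihood = binom.pmf(7, n=10, p=theta)   # likelihood of 7/10 successes

unnormalized = likelihood * prior               # posterior up to the marginal
posterior = unnormalized / unnormalized.sum()   # normalizing = dividing by the marginal

print(theta[np.argmax(posterior)])         # posterior mode, ~0.7
```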



GRADED EVALUATION (15 mins)

  1. A car repair shop receives a car with reports of strange noises coming from the engine. The shop knows 90% of the cars that come in for “noises” have a loose fan belt while the other 10% have a loose muffler. Cars with loose mufflers are commonly (95% of the time) described as rattling; less commonly (8% of the time), fan belt issues also sound like a rattle. The car owner is describing the strange noise as a rattle. What is the probability the car has a loose muffler?

    a. 78%

    b. 57%

    c. 95%

  2. It is estimated that 80% of emails are spam. You have developed a new algorithm to detect spam. Your spam software can detect 99% of spam emails but has a false positive rate of 5%. Your company receives 1000 emails in a day; how many emails will be incorrectly marked as spam?

    a. 10

    b. 20

    c. 5

    d. 200

    e. 50

  3. You have developed a new algorithm for detecting fraud. It has a sensitivity of 90% with a specificity of 95%. Choose the correct statement:

    a. true positive rate = 90%, true negative rate = 5%

    b. true positive rate = 90%, true negative rate = 95%