Bayes' theorem
Bayes's theorem (also known as Bayes's rule) is a result in probability theory that relates the conditional and marginal probability distributions of random variables. In some interpretations of probability, Bayes's theorem tells how to update or revise beliefs a posteriori, in light of new evidence.
The probability of an event A conditional on another event B is generally different from the probability of B conditional on A. However, there is a definite relationship between the two, and Bayes's theorem is the statement of that relationship.
As a formal theorem, Bayes's theorem is valid in all interpretations of probability. However, frequentist and Bayesian interpretations disagree about the kinds of things to which probabilities should be assigned in applications: frequentists assign probabilities to random events according to their frequencies of occurrence or to subsets of populations as proportions of the whole; Bayesians assign probabilities to propositions that are uncertain. A consequence is that Bayesians have more frequent occasion to use Bayes's theorem. The articles on Bayesian probability and frequentist probability discuss these debates at greater length.
- 1 Statement of Bayes's theorem
- 2 Derivation from conditional probabilities
- 3 Alternative forms of Bayes's theorem
- 3.1 Bayes's theorem in terms of odds and likelihood ratio
- 3.2 Bayes' theorem for probability densities
- 3.3 Extensions of Bayes's theorem
- 4 Examples
- 4.1 Example #1: False positives in a medical test
- 4.2 Example #2: Conditional probabilities
- 4.3 Example #3: Bayesian inference
- 5 Historical remarks
- 6 See also
- 7 References
Statement of Bayes's theorem
Bayes's theorem relates the conditional and marginal probabilities of stochastic events A and B:
- [\Pr(A|B) = \frac{\Pr(B|A)\, \Pr(A)}{\Pr(B)} \propto L(A | B)\, \Pr(A)]
where L(A|B) is the likelihood of A given fixed B; here L(A|B) = Pr(B|A).
Each term in Bayes's theorem has a conventional name:
- Pr(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not take into account any information about B.
- Pr(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.
- Pr(B|A) is the conditional probability of B given A.
- Pr(B) is the prior or marginal probability of B, and acts as a normalizing constant.
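With this terminology, the theorem may be paraphrased as
- [ \mbox{posterior} = \frac{\mbox{likelihood} \times \mbox{prior}}{\mbox{normalizing constant}} ]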
In addition, the ratio Pr(B|A)/Pr(B) is sometimes called the standardised likelihood, so the theorem may also be paraphrased as
- [ \mbox{posterior} = \mbox{standardised likelihood} \times \mbox{prior}. ]
Derivation from conditional probabilities
To derive the theorem, we start from the definition of conditional probability. The probability of event A given event B is
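- [\Pr(A|B) = \frac{\Pr(A \cap B)}{\Pr(B)}.]
Likewise, the probability of event B given event A is
- [\Pr(B|A) = \frac{\Pr(A \cap B)}{\Pr(A)}.]
Rearranging and combining these two equations, we find
- [\Pr(A|B)\, \Pr(B) = \Pr(A \cap B) = \Pr(B|A)\, \Pr(A).]
Dividing both sides by Pr(B), provided that it is non-zero, we obtain Bayes's theorem:
- [\Pr(A|B) = \frac{\Pr(B|A)\, \Pr(A)}{\Pr(B)}.]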
Alternative forms of Bayes's theorem
Bayes's theorem is often embellished by noting that
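- [\Pr(B) = \Pr(A\cap B) + \Pr(A^C\cap B) = \Pr(B|A)\, \Pr(A) + \Pr(B|A^C)\, \Pr(A^C),]
so that the theorem can be restated as
- [\Pr(A|B) = \frac{\Pr(B|A)\, \Pr(A)}{\Pr(B|A)\, \Pr(A) + \Pr(B|A^C)\, \Pr(A^C)},]
where A^C is the complement of A. More generally, where {A_i} forms a partition of the event space,
- [\Pr(A_i|B) = \frac{\Pr(B|A_i)\, \Pr(A_i)}{\sum_j \Pr(B|A_j)\, \Pr(A_j)},]
for any A_i in the partition.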
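As a minimal sketch of the partition form, the following Python function normalises the products Pr(B|A_i) Pr(A_i) by their sum. The function name and the three-way example with its numbers are hypothetical and chosen only for illustration.

```python
def posterior_over_partition(priors, likelihoods):
    """Return Pr(A_i | B) for every A_i in a partition.

    priors[i]      = Pr(A_i), assumed to sum to 1
    likelihoods[i] = Pr(B | A_i)
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators Pr(B | A_i) Pr(A_i), one per element of the partition.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Law of total probability: Pr(B) = sum_j Pr(B | A_j) Pr(A_j).
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("Pr(B) is zero, so the posterior is undefined")
    return [j / evidence for j in joint]


# Hypothetical three-way partition with made-up numbers.
example_posterior = posterior_over_partition([0.5, 0.3, 0.2], [0.1, 0.4, 0.7])
```

With two elements, A and its complement, this reduces to the restated form of the theorem given above.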
Bayes's theorem in terms of odds and likelihood ratio
Bayes's theorem can also be written neatly in terms of a likelihood ratio Λ and odds O as
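- [O(A|B)=O(A) \cdot \Lambda (A|B),]
where
- [O(A|B)=\frac{\Pr(A|B)}{\Pr(A^C|B)}]
are the odds of A given B,
- [O(A)=\frac{\Pr(A)}{\Pr(A^C)}]
are the odds of A by itself, and
- [\Lambda (A|B) = \frac{L(A|B)}{L(A^C|B)} = \frac{\Pr(B|A)}{\Pr(B|A^C)}]
is the likelihood ratio.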
See also the law of total probability.
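The odds form can be checked numerically against the probability form. The sketch below is illustrative only: the helper names and the numbers (a prior of 0.2, Pr(B|A) = 0.9, Pr(B|A^C) = 0.3) are assumptions, not part of the article.

```python
def odds(p):
    """Convert a probability p (0 <= p < 1) to odds p / (1 - p)."""
    return p / (1.0 - p)


def probability(o):
    """Convert odds o back to a probability o / (1 + o)."""
    return o / (1.0 + o)


def posterior_odds(prior, p_b_given_a, p_b_given_not_a):
    """Posterior odds O(A|B) = O(A) * likelihood ratio."""
    likelihood_ratio = p_b_given_a / p_b_given_not_a
    return odds(prior) * likelihood_ratio


# Made-up inputs for illustration.
post_odds = posterior_odds(0.2, 0.9, 0.3)
post_prob = probability(post_odds)

# The same posterior computed from the probability form of the theorem.
direct = 0.9 * 0.2 / (0.9 * 0.2 + 0.3 * 0.8)
```

Both routes give a posterior probability of 3/7; the odds form has the convenience that the update is a single multiplication.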
Bayes' theorem for probability densities
There is also a version of Bayes's theorem for continuous distributions. It is somewhat harder to derive, since probability densities, strictly speaking, are not probabilities, so Bayes's theorem has to be established by a limit process; see Papoulis (citation below), Section 7.3 for an elementary derivation. Bayes's theorem for probability densities is formally similar to the theorem for probabilities:
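- [ f(x|y) = \frac{f(x,y)}{f(y)} = \frac{f(y|x)\,f(x)}{f(y)}, ]
and, expanding f(y) by the analogous statement of the law of total probability,
- [ f(x|y) = \frac{f(y|x)\,f(x)}{\int_{-\infty}^{\infty} f(y|x)\,f(x)\,dx}. ]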
Here we have indulged in a conventional abuse of notation, using f for each one of these terms, although each one is really a different function; the functions are distinguished by the names of their arguments.
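One way to see the density version at work is to approximate it on a grid. The sketch below is an illustration under assumed inputs, not a general-purpose routine: it takes a standard normal prior for x, a normal likelihood for y given x with unit variance, and a hypothetical observation y = 2, and replaces the integral in the denominator by a Riemann sum. For this particular choice the exact posterior is known to be normal with mean 1 and variance 1/2, which gives a check on the approximation.

```python
import math


def normal_pdf(z, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(prior, likelihood, xs, dx):
    """Approximate f(x|y) at the grid points xs, which have spacing dx."""
    unnormalised = [likelihood(x) * prior(x) for x in xs]
    # Riemann-sum approximation of the integral of f(y|x) f(x) dx.
    evidence = sum(unnormalised) * dx
    return [u / evidence for u in unnormalised]


dx = 0.01
xs = [-10.0 + i * dx for i in range(2001)]  # grid covering [-10, 10]
y_observed = 2.0  # hypothetical observation

posterior = grid_posterior(
    prior=lambda x: normal_pdf(x, 0.0, 1.0),
    likelihood=lambda x: normal_pdf(y_observed, x, 1.0),
    xs=xs,
    dx=dx,
)

# Posterior mean, again by a Riemann sum; close to the exact value 1.
posterior_mean = sum(x * f for x, f in zip(xs, posterior)) * dx
```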
Extensions of Bayes's theorem
Theorems analogous to Bayes's theorem hold in problems with more than two variables. For example:
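- [ \Pr(A|B,C) = \frac{\Pr(A)\, \Pr(B|A)\, \Pr(C|A,B)}{\Pr(B)\, \Pr(C|B)} ]
This can be derived in several steps from Bayes's theorem and the definition of conditional probability:
- [ \Pr(A|B,C) = \frac{\Pr(A,B,C)}{\Pr(B,C)} = \frac{\Pr(A,B,C)}{\Pr(B)\, \Pr(C|B)} ]
- [ = \frac{\Pr(C|A,B)\, \Pr(A,B)}{\Pr(B)\, \Pr(C|B)} = \frac{\Pr(A)\, \Pr(B|A)\, \Pr(C|A,B)}{\Pr(B)\, \Pr(C|B)}. ]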
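The three-variable identity can be verified numerically on any joint distribution with non-zero probabilities. The sketch below uses a hypothetical joint distribution over three binary variables (weights 1 to 8, normalised) and compares Pr(A|B,C) computed from its definition with the right-hand side Pr(A) Pr(B|A) Pr(C|A,B) / (Pr(B) Pr(C|B)).

```python
from itertools import product

# Hypothetical joint distribution Pr(A=a, B=b, C=c); the weights are made up.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {abc: w / total for abc, w in zip(product([0, 1], repeat=3), weights)}


def pr(**fixed):
    """Marginal probability of the given assignments, e.g. pr(a=1, b=0)."""
    return sum(
        p
        for (a, b, c), p in joint.items()
        if all({"a": a, "b": b, "c": c}[k] == v for k, v in fixed.items())
    )


# Left-hand side: Pr(A|B,C) from the definition of conditional probability.
lhs = pr(a=1, b=1, c=1) / pr(b=1, c=1)

# Right-hand side, with each conditional written as a ratio of marginals.
p_a = pr(a=1)
p_b_given_a = pr(a=1, b=1) / pr(a=1)
p_c_given_ab = pr(a=1, b=1, c=1) / pr(a=1, b=1)
p_b = pr(b=1)
p_c_given_b = pr(b=1, c=1) / pr(b=1)
rhs = p_a * p_b_given_a * p_c_given_ab / (p_b * p_c_given_b)
```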
Examples
Example #1: False positives in a medical test
Suppose that a test for a particular disease has a very high success rate:
- if a tested patient has the disease, the test accurately reports this, a 'positive', 99% of the time (or, with probability 0.99), and
- if a tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of the time (i.e. with probability 0.95).
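Suppose also that only 0.1% of the population has the disease, so that a randomly selected patient has it with probability 0.001. Let D be the event that the patient has the disease, and T be the event that the test returns a positive result. Then, using the second alternative form of Bayes's theorem (above), the probability that a patient who tests positive actually has the disease (a true positive) is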
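- [ P(T) = P(T|D)\,P(D) + P(T|D^C)\,P(D^C) ]
- [ P(D|T) = \frac{P(T|D)\,P(D)}{P(T|D)\,P(D) + P(T|D^C)\,P(D^C)} ]
- [ P(D|T) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} = 11/566 \approx 0.019, ]
and hence the probability that a positive result is a false positive is about 1 − 0.019 = 0.981.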
Despite the apparent high accuracy of the test, the incidence of the disease is so low (one in a thousand) that the vast majority of patients who test positive (about 98 in a hundred) do not have the disease. This is quite common in screening tests, where it is often more important to have a very low false negative rate than a high proportion of true positives among the positive results.
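The calculation in this example can be reproduced in a few lines of Python. This is a sketch only; the function and parameter names (prevalence, sensitivity, specificity) are labels chosen here for Pr(D), Pr(T|D) and Pr(T^C|D^C), and do not appear in the example itself.

```python
def probability_of_disease_given_positive(prevalence, sensitivity, specificity):
    """Return Pr(D|T) from Pr(D), Pr(T|D) and Pr(not T | not D)."""
    # Pr(T | D^C): chance of a positive result for a patient without the disease.
    false_positive_rate = 1.0 - specificity
    # Law of total probability for Pr(T).
    p_positive = sensitivity * prevalence + false_positive_rate * (1.0 - prevalence)
    return sensitivity * prevalence / p_positive


# Figures from the example: 0.1% incidence, 99% and 95% accuracy.
p_disease_given_positive = probability_of_disease_given_positive(0.001, 0.99, 0.95)
p_false_positive = 1.0 - p_disease_given_positive
```

Rerunning the function with a larger incidence, say 0.1 instead of 0.001, shows how strongly the result depends on the prior.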
Example #2: Conditional probabilities
Suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?
Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes's theorem. First, we can clarify the situation by rephrasing the question as "what is the probability that Fred picked bowl #1, given that he has a plain cookie?" Thus, to relate to our previous explanation, the event A is that Fred picked bowl #1, and the event B is that Fred picked a plain cookie. To compute Pr(A|B), we first need to know:
- Pr(A), or the probability that Fred picked bowl #1 regardless of any other information. Since Fred is treating both bowls equally, it is 0.5.
- Pr(B), or the probability of getting a plain cookie regardless of any information on the bowls. It is computed by the law of total probability: for each bowl, multiply the probability of getting a plain cookie from that bowl by the probability of selecting that bowl, then add the results. We know from the problem statement that the probability of getting a plain cookie from bowl #1 is 0.75, and the probability of getting one from bowl #2 is 0.5, and since Fred is treating both bowls equally the probability of selecting either one of them is 0.5. Thus, the probability of getting a plain cookie overall is 0.75×0.5 + 0.5×0.5 = 0.625.
- Pr(B|A), or the probability of getting a plain cookie given that Fred has selected bowl #1. From the problem statement, we know this is 0.75, since 30 out of 40 cookies in bowl #1 are plain.
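Given all this information, we can compute the probability that Fred selected bowl #1, given that he got a plain cookie:
- [\Pr(A|B) = \frac{\Pr(B|A)\, \Pr(A)}{\Pr(B)} = \frac{0.75 \times 0.5}{0.625} = 0.6]
As expected, it is more than a half.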
Tables of occurrences and relative frequencies
It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each of the independent variables. The tables below illustrate the use of this method for the cookies.
Number of cookies in each bowl by type of cookie

| | Bowl #1 | Bowl #2 | Total |
|---|---|---|---|
| Chocolate chip | 10 | 20 | 30 |
| Plain | 30 | 20 | 50 |
| Total | 40 | 40 | 80 |

Relative frequency of cookies in each bowl by type of cookie

| | Bowl #1 | Bowl #2 | Total |
|---|---|---|---|
| Chocolate chip | 0.125 | 0.250 | 0.375 |
| Plain | 0.375 | 0.250 | 0.625 |
| Total | 0.500 | 0.500 | 1.000 |

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, or 80 cookies.
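The table method can be sketched in Python as follows. The dictionary layout and variable names are illustrative assumptions. Note that reading joint probabilities directly off the pooled relative frequencies works here because both bowls hold 40 cookies and are equally likely to be chosen; with unequal bowls the counts would need to be weighted first.

```python
# Counts taken from the problem statement.
counts = {
    "bowl 1": {"chocolate chip": 10, "plain": 30},
    "bowl 2": {"chocolate chip": 20, "plain": 20},
}

# Total number of cookies under consideration.
total = sum(sum(row.values()) for row in counts.values())

# Relative frequency of each (bowl, cookie type) combination.
relative = {
    bowl: {kind: n / total for kind, n in row.items()}
    for bowl, row in counts.items()
}

# Pr(B): marginal relative frequency of plain cookies.
p_plain = sum(row["plain"] for row in relative.values())

# Pr(A|B): share of the plain cookies that come from bowl #1.
p_bowl1_given_plain = relative["bowl 1"]["plain"] / p_plain
```

The result agrees with the value 0.6 obtained from Bayes's theorem.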
Example #3: Bayesian inference
Applications of Bayes's theorem often assume the philosophy underlying Bayesian probability that uncertainty and degrees of belief can be measured as probabilities. One such example follows. For additional worked out examples, including simpler examples, please see the article on the examples of Bayesian inference.
We describe the marginal probability distribution of a variable A as the prior probability distribution or simply the prior. The conditional distribution of A given the "data" B is the posterior probability distribution or just the posterior.
Suppose we wish to know about the proportion r of voters in a large population who will vote "yes" in a referendum. Let n be the number of voters in a random sample (chosen with replacement, so that we have statistical independence) and let m be the number of voters in that random sample who will vote "yes". Suppose that we observe n = 10 voters and m = 7 say they will vote yes. From Bayes's theorem we can calculate the probability distribution function for r using
- [ f(r | n=10, m=7) = \frac{\Pr( m=7 | r, n=10)\, f(r)}{\int_0^1 \Pr( m=7 | r, n=10)\, f(r)\, dr}. ]
The prior probability density function f(r) summarizes what we know about the distribution of r in the absence of any observation. We provisionally assume in this case that the prior distribution of r is uniform over the interval [0, 1]. That is, f(r) = 1. If some additional background information is found, we should modify the prior accordingly. Before any observations are made, however, all values of r are taken to be equally likely.
Under the assumption of random sampling, choosing voters is just like choosing balls from an urn. The likelihood function L(r) = Pr(m = 7 | r, n = 10) for such a problem is just the probability of 7 successes in 10 trials for a binomial distribution:
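- [ \Pr( m=7 | r, n=10) = {10 \choose 7} \, r^7 \, (1-r)^3. ]
The normalizing factor is obtained by integrating over all possible values of r:
- [ \int_0^1 \Pr( m=7|r, n=10) \, f(r) \, dr = \int_0^1 {10 \choose 7} \, r^7 \, (1-r)^3 \, 1 \, dr = {10 \choose 7} \, \frac{1}{1320}. ]
The posterior distribution for r is then
- [ f(r | n=10, m=7) = \frac{{10 \choose 7} \, r^7 \, (1-r)^3 \, 1}{{10 \choose 7} \, \frac{1}{1320}} = 1320 \, r^7 \, (1-r)^3 ]
for r between 0 and 1, inclusive.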
One may be interested in the probability that more than half the voters will vote "yes". The prior probability that more than half the voters will vote "yes" is 1/2, by the symmetry of the uniform distribution. In comparison, the posterior probability that more than half the voters will vote "yes", i.e., the conditional probability given the outcome of the opinion poll – that seven of the 10 voters questioned will vote "yes" – is
- [1320\int_{1/2}^1 r^7(1-r)^3\,dr \approx 0.887.]
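Because the integrand r^7 (1 − r)^3 is a polynomial, both the normalizing integral and the posterior probability can be computed exactly. The following sketch does this with rational arithmetic; the function name is hypothetical, and the binomial coefficient is omitted because it cancels between numerator and denominator.

```python
from fractions import Fraction
from math import comb


def integral_of_kernel(lower, upper, m=7, n=10):
    """Exact integral of r^m (1 - r)^(n - m) over [lower, upper]."""
    k = n - m

    def antiderivative(r):
        # Expand (1 - r)^k by the binomial theorem and integrate term by term.
        return sum(
            Fraction(comb(k, j) * (-1) ** j, m + j + 1) * r ** (m + j + 1)
            for j in range(k + 1)
        )

    return antiderivative(Fraction(upper)) - antiderivative(Fraction(lower))


# Integral over [0, 1]; its reciprocal is the constant in the posterior density.
normalising_integral = integral_of_kernel(0, 1)
posterior_constant = 1 / normalising_integral

# Posterior probability that more than half the voters will vote "yes".
p_majority_yes = integral_of_kernel(Fraction(1, 2), 1) / normalising_integral
```

The exact posterior probability is 227/256, which rounds to the 0.887 quoted above.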
Historical remarks
Bayes's theorem is named after the Reverend Thomas Bayes (1702–1761), who studied how to compute a distribution for the parameter of a binomial distribution (to use modern terminology). His friend, Richard Price, edited and presented the work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances. Pierre-Simon Laplace replicated and extended these results in an essay of 1774, apparently unaware of Bayes' work.
One of Bayes's results (Proposition 5) gives a simple description of conditional probability, and shows that it can be expressed independently of the order in which things occur:
- If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has also happened, the probability I am right [i.e., the conditional probability of the first event being true given that the second has also happened] is P/b.
Bayes's main result (Proposition 9 in the essay) is the following: assuming a uniform distribution for the prior distribution of the binomial parameter p, the probability that p is between two values a and b is
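- [\frac{\int_a^b {n+m \choose m} p^m (1-p)^n\,dp}{\int_0^1 {n+m \choose m} p^m (1-p)^n\,dp}]
where m is the number of observed successes and n the number of observed failures.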
What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter p. So, one can compute probability for an experimental outcome, but also for the parameter which governs it, and the same algebra is used to make inferences of either kind.
Bayes states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball.
See also
- Bayesian inference
- Monty Hall problem
- Occam's razor
- Prosecutor's fallacy
- Raven paradox
- Revising opinions in statistics
- Empirical Bayes method
- Bayesian spam filtering
References
Versions of the essay
- Thomas Bayes (1763), "An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S.", Philosophical Transactions, Giving Some Account of the Present Undertakings, Studies and Labours of the Ingenious in Many Considerable Parts of the World 53:370–418.
- Thomas Bayes (1763/1958) "Studies in the History of Probability and Statistics: IX. Thomas Bayes's Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45:296–315. (Bayes's essay in modernized notation)
- Thomas Bayes. "An essay towards solving a Problem in the Doctrine of Chances". (Bayes's essay in the original notation)
Commentaries
- G. A. Barnard (1958) "Studies in the History of Probability and Statistics: IX. Thomas Bayes's Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45:293–295. (biographical remarks)
- Daniel Covarrubias. "An Essay Towards Solving a Problem in the Doctrine of Chances". (an outline and exposition of Bayes's essay)
- Stephen M. Stigler (1982). "Thomas Bayes's Bayesian Inference," Journal of the Royal Statistical Society, Series A, 145:250–258. (Stigler argues for a revised interpretation of the essay; recommended)
- Isaac Todhunter (1865). A History of the Mathematical Theory of Probability from the time of Pascal to that of Laplace, Macmillan. Reprinted 1949, 1956 by Chelsea and 2001 by Thoemmes.
Additional material
- Pierre-Simon Laplace (1774). "Mémoire sur la Probabilité des Causes par les Événements", Savants Étranges 6:621–656; also Œuvres 8:27–65.
- Pierre-Simon Laplace (1774/1986). "Memoir on the Probability of the Causes of Events", Statistical Science 1(3):364–378.
- Stephen M. Stigler (1986). "Laplace's 1774 memoir on inverse probability", Statistical Science 1(3):359–378.
- Stephen M. Stigler (1983). "Who Discovered Bayes's Theorem?" The American Statistician 37(4):290–296.
- Jeff Miller et al. Earliest Known Uses of Some of the Words of Mathematics (B). (very informative; recommended)
- Athanasios Papoulis (1984). Probability, Random Variables, and Stochastic Processes, second edition. New York: McGraw-Hill.
- James Joyce (2003). "Bayes's Theorem", Stanford Encyclopedia of Philosophy. (a comprehensive introduction to Bayes's theorem)
- David J.C. MacKay. Information Theory, Inference, and Learning Algorithms, on-line textbook. (an up-to-date overview of the use of Bayes's theorem in information theory and machine learning)
- "Bayes' Theorem" at MathWorld.
- This article incorporates material from PlanetMath, which is licensed under the GFDL.
From Wikipedia, the Free Encyclopedia. All text is available under the terms of the GNU Free Documentation License.
