Bayesian probability
Bayesian theory is based on the tenet that the concept of probability can be defined as the degree to which a person believes a proposition. Bayesian theory thus provides one interpretation of probability, called Bayesian probability. See probability interpretation and the philosophy of mathematics.
Bayesian theory also holds that Bayes' theorem can be used as a rule to infer or update the degree of belief in light of new information. See Bayesian inference.
History of Bayesian probability
Bayesian theory and Bayesian probability are named after Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. The term Bayesian, however, came into use only around 1950, and it is not clear that Bayes would have endorsed the very broad interpretation of probability that is associated with his name. Laplace proved a more general version of Bayes' theorem and used it to solve problems in celestial mechanics, medical statistics and, by some accounts, even jurisprudence. Laplace, however, did not consider this general theorem to be important for probability theory. He instead adhered to the classical interpretation of probability.
Frank P. Ramsey in The Foundations of Mathematics (1931) first proposed using subjective belief as a way of interpreting probability. Ramsey saw this interpretation as a complement to the frequency interpretation of probability, which was more established and accepted at the time. The statistician Bruno de Finetti in 1937 adopted Ramsey's view as an alternative to the frequency interpretation of probability. L. J. Savage expanded the idea in The Foundations of Statistics (1954).
Formal attempts have been made to define and apply the intuitive notion of a "degree of belief". The most common application is based on betting: a degree of belief is reflected in the odds and stakes that the subject is willing to bet on the proposition at hand.
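The betting reading of a degree of belief can be illustrated with a minimal sketch. It assumes that a subject treats a degree of belief as the fair price of a ticket paying one unit if the proposition is true, and is willing to buy or sell at that price; the function names are illustrative and do not come from any particular source.

```python
from fractions import Fraction


def belief_from_odds(odds_against: Fraction) -> Fraction:
    """Degree of belief implied by regarding odds of `odds_against` to 1
    against a proposition as fair."""
    return Fraction(1) / (Fraction(1) + odds_against)


def guaranteed_loss(belief_a: Fraction, belief_not_a: Fraction) -> Fraction:
    """Sure loss of a subject who prices a 1-unit ticket on A at `belief_a`
    and a 1-unit ticket on not-A at `belief_not_a`.

    Exactly one of the two tickets pays out. If the prices sum to more than 1,
    an opponent sells both tickets to the subject; if they sum to less than 1,
    the opponent buys both. Either way the subject loses |sum - 1|.
    """
    return abs(belief_a + belief_not_a - 1)


# Accepting 3-to-1 odds against a proposition as fair implies a belief of 1/4.
implied = belief_from_odds(Fraction(3))

# Beliefs of 3/5 in both A and not-A are incoherent: a loss of 1/5 is certain.
incoherent_loss = guaranteed_loss(Fraction(3, 5), Fraction(3, 5))

# Beliefs that sum to 1 leave no such opening.
coherent_loss = guaranteed_loss(Fraction(1, 4), Fraction(3, 4))
```

The second function is a toy version of the Dutch book argument: degrees of belief that violate the rules of probability expose the subject to a set of bets that loses whatever happens.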
When beliefs have degrees, theorems of probability calculus measure the rationality of beliefs in the same way that the theorems of first-order logic measure the rationality of beliefs. Many regard degrees of belief as extensions of classical truth values (true and false).
The Bayesian approach has been explored by Harold Jeffreys, Richard T. Cox, Edwin Jaynes and I. J. Good. Other well-known proponents of Bayesian probability have included John Maynard Keynes and B.O. Koopman.
Varieties of Bayesian probability
The terms subjective probability, personal probability, epistemic probability and logical probability describe some of the schools of thought which are customarily called "Bayesian". These overlap but there are differences of emphasis. Some of the people mentioned here would not call themselves Bayesians.
Bayesian probability is supposed to measure the degree of belief an individual has in an uncertain proposition, and is in that respect subjective. Some people who call themselves Bayesians do not accept this subjectivity. The chief exponents of this objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. Perhaps the main objectivist Bayesian now living is James Berger of Duke University. Jose Bernardo and others accept some degree of subjectivity but believe a need exists for "reference priors" in many practical situations.
Advocates of logical (or objective epistemic) probability, such as Harold Jeffreys, Rudolf Carnap, Richard Threlkeld Cox and Edwin Jaynes, hope to codify techniques whereby any two persons having the same information relevant to the truth of an uncertain proposition would calculate the same probability. Such probabilities are not relative to the person but to the epistemic situation, and thus lie somewhere between subjective and objective. However, the methods proposed are controversial. Critics challenge the claim that there are grounds for preferring one degree of belief over another in the absence of information about the facts to which those beliefs refer. Another problem is that the techniques developed so far are inadequate for dealing with realistic cases.
Bayesian probability and frequency probability
Bayesian probability contrasts with frequency probability, in which probability is derived from observed frequencies in defined distributions or proportions in populations.
Differences in the two interpretations imply different methods in statistics. For example, Laplace estimated the mass of Saturn using Bayesian methods. However, probability theory using frequency probability cannot be applied to this problem since the mass of Saturn has a determinate, but unknown value. Its value cannot be represented as a random value from a distribution or population.
Similarly, when comparing two hypotheses using the same information, frequency probability theory would state the rejection or non-rejection of the original hypothesis with a particular degree of confidence, while Bayesian methods would state that one hypothesis was more probable than another or that the expected loss associated with one hypothesis was less than the expected loss of another.
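The Bayesian side of this comparison can be sketched numerically. The example below is an assumption made for illustration, not taken from the sources above: two hypotheses about a coin (probability of heads 0.5 or 0.8), equal prior probabilities, and 8 heads observed in 10 tosses. It reports how much more probable one hypothesis is than the other rather than a rejection decision.

```python
from math import comb


def binomial_likelihood(theta: float, heads: int, tosses: int) -> float:
    """P(heads out of tosses | probability of heads = theta)."""
    return comb(tosses, heads) * theta**heads * (1 - theta) ** (tosses - heads)


def posterior_probabilities(priors, likelihoods):
    """Bayes' theorem over a finite set of mutually exclusive hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


heads, tosses = 8, 10  # illustrative data
likelihoods = [
    binomial_likelihood(0.5, heads, tosses),  # hypothesis 1: fair coin
    binomial_likelihood(0.8, heads, tosses),  # hypothesis 2: biased coin
]

# Bayes factor: how strongly the data favour hypothesis 2 over hypothesis 1.
bayes_factor = likelihoods[1] / likelihoods[0]

# With equal priors the posterior odds equal the Bayes factor.
posterior = posterior_probabilities([0.5, 0.5], likelihoods)
```

Under these assumptions the Bayes factor is about 6.9, so the biased-coin hypothesis ends up with a posterior probability of roughly 0.87.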
The theory of statistics and probability using frequency probability was developed by R.A. Fisher, Egon Pearson and Jerzy Neyman during the first half of the 20th century. A. N. Kolmogorov also used frequency probability to lay the mathematical foundation of probability in measure theory via the Lebesgue integral in Foundations of the Theory of Probability (1933).
Savage, Koopman, Abraham Wald and others have developed Bayesian probability since 1950.
Applications of Bayesian probability
Since the 1950s, Bayesian theory and Bayesian probability have been widely applied through Cox's theorem, Jaynes' principle of maximum entropy and the Dutch book argument. In many applications, Bayesian methods are more general and appear to give better results than frequency probability. Bayes factors have also been applied with Occam's Razor. See Bayesian inference and Bayes' theorem for mathematical applications.
Some regard Bayesian inference as an application of the scientific method because updating probabilities through Bayesian inference requires one to start with initial beliefs about different hypotheses, to collect new information (for example, by conducting an experiment), and then to adjust the original beliefs in the light of the new information. Adjusting original beliefs could mean (coming closer to) accepting or rejecting the original hypotheses.
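The cycle of initial belief, new information and adjusted belief can be sketched as repeated application of Bayes' theorem. The numbers are assumptions chosen for illustration: a hypothesis with prior probability 1/100, and an experiment that reports "positive" with probability 95/100 if the hypothesis is true and 5/100 if it is false, repeated twice with results assumed independent given the hypothesis.

```python
from fractions import Fraction


def update(prior: dict, likelihood: dict) -> dict:
    """One step of Bayesian updating.

    prior:      hypothesis -> degree of belief before the observation
    likelihood: hypothesis -> P(observation | hypothesis)
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(observation)
    return {h: value / evidence for h, value in unnormalised.items()}


# Initial beliefs (illustrative values).
belief = {"true": Fraction(1, 100), "false": Fraction(99, 100)}

# Probability of a positive result under each hypothesis (illustrative values).
positive = {"true": Fraction(95, 100), "false": Fraction(5, 100)}

# Each posterior becomes the prior for the next experiment.
history = []
for _ in range(2):
    belief = update(belief, positive)
    history.append(belief["true"])
```

With these values the belief in the hypothesis rises from 1/100 to 19/118 after one positive result and to 361/460 after two, moving closer to acceptance without reaching certainty.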
Bayesian techniques have recently been applied to filter spam e-mail. A Bayesian spam filter uses a reference set of e-mails to define what is initially believed to be spam. After the reference set has been defined, the filter uses the characteristics found in it to classify new messages as either spam or legitimate e-mail. New e-mail messages act as new information: if the user identifies mistakes in the classification of spam and legitimate e-mail, the corrections update the original reference set in the hope that future classifications will be more accurate. See Bayesian inference and Bayesian filtering.
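One common way to realise such a filter is a naive Bayes classifier, sketched below. This is an illustrative assumption rather than a description of any particular product: the class name is hypothetical, messages are split on whitespace, words are treated as independent given the class, and add-one smoothing keeps unseen words from producing zero probabilities.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical minimal filter; labels are 'spam' and 'ham' (legitimate)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add a labelled message to the reference set."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text: str) -> float:
        """Posterior probability that `text` is spam, via Bayes' theorem."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior probability of the class.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Smoothed P(word | class); the extra 1 covers unseen words.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            log_scores[label] = score
        # Normalise in a numerically stable way.
        top = max(log_scores.values())
        spam = math.exp(log_scores["spam"] - top)
        ham = math.exp(log_scores["ham"] - top)
        return spam / (spam + ham)


# Reference set (invented messages).
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now", "spam")
spam_filter.train("cheap money offer", "spam")
spam_filter.train("meeting agenda tomorrow", "ham")
spam_filter.train("lunch tomorrow", "ham")

p_spam_like = spam_filter.spam_probability("win cheap money")
p_ham_like = spam_filter.spam_probability("agenda for lunch tomorrow")

# A user correction is new information that updates the reference set.
before = spam_filter.spam_probability("offer tomorrow")
spam_filter.train("offer tomorrow", "ham")
after = spam_filter.spam_probability("offer tomorrow")
```

After the correction, the same message receives a lower spam probability, which is the updating step described above.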
Probabilities of probabilities
One criticism levelled at the Bayesian probability interpretation has been that a single probability assignment cannot convey how well grounded the belief is, i.e., how much evidence one has. Consider the following situations:
- 1. You have a box with white and black balls, but no knowledge as to the quantities
  Letting [\theta = p] represent the statement that the probability of the next ball being black is [p], a Bayesian might assign a uniform Beta prior distribution:
  [\forall \theta \in [0,1]]
  [P(\theta) = \Beta(\alpha_B=1,\alpha_W=1) = \frac{\Gamma(\alpha_B+\alpha_W)}{\Gamma(\alpha_B)\Gamma(\alpha_W)}\theta^{\alpha_B-1}(1-\theta)^{\alpha_W-1} = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\theta^0(1-\theta)^0=1]
  Assuming that the ball drawing is modelled as a binomial sampling distribution, the posterior distribution, [P(\theta|m,n)], after drawing m additional black balls and n white balls is still a Beta distribution, with parameters [\alpha_B=1+m], [\alpha_W=1+n]. An intuitive interpretation of the parameters of a Beta distribution is that of imagined counts for the two events. For more information, see Beta distribution.
- 2. You have a box from which you have drawn N balls, half black and the rest white
  Letting [\theta = p] represent the statement that the probability of the next ball being black is [p], a Bayesian might assign a Beta prior distribution, [\Beta(N/2+1,N/2+1)], which is the uniform prior of situation 1 updated with the N draws. The mean of this distribution is [\operatorname{E}(\theta)=\frac{N/2+1}{N+2}=\frac{1}{2}], precisely Laplace's rule of succession. The maximum a posteriori (MAP) estimate of [\theta], the mode of the distribution, is likewise [\frac{1}{2}] for [N>0].
- 3. You have a box and you know that there are the same number of white and black balls
  In this case a Bayesian would define the prior probability [P\left(\theta\right)=\delta\left(\theta - \frac{1}{2}\right)].
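The three box situations above can be compared numerically. The sketch below assumes the Beta parameterisation by counts used above and takes N = 10 for the second situation; the function names are illustrative. In every case the probability that the next ball is black is 1/2, but the spread of the distribution over [\theta] differs, and it is this spread that carries the information about how much evidence supports the belief.

```python
from fractions import Fraction


def beta_update(alpha_black: int, alpha_white: int, black: int, white: int):
    """Posterior Beta parameters after observing `black` and `white` draws."""
    return alpha_black + black, alpha_white + white


def beta_mean(alpha_black: int, alpha_white: int) -> Fraction:
    """Mean of Beta(alpha_black, alpha_white): probability the next ball is black."""
    return Fraction(alpha_black, alpha_black + alpha_white)


def beta_variance(alpha_black: int, alpha_white: int) -> Fraction:
    """Variance of Beta(alpha_black, alpha_white): how uncertain theta still is."""
    total = alpha_black + alpha_white
    return Fraction(alpha_black * alpha_white, total**2 * (total + 1))


# Situation 1: no knowledge, uniform prior Beta(1, 1).
no_knowledge = (1, 1)

# Situation 2: N = 10 draws, half black and half white, starting from Beta(1, 1).
after_draws = beta_update(1, 1, 5, 5)

summary = {
    "no knowledge": (beta_mean(*no_knowledge), beta_variance(*no_knowledge)),
    "after 10 draws": (beta_mean(*after_draws), beta_variance(*after_draws)),
    # Situation 3: a point mass at 1/2 is not a Beta distribution; its mean is
    # 1/2 and its variance is 0 by definition.
    "known composition": (Fraction(1, 2), Fraction(0)),
}
```

The means are all 1/2, while the variances fall from 1/12 to 1/52 to 0 as the evidence grows.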
Because there is no room for metaprobabilities on the frequency interpretation, frequentists have had to find different ways of representing difference of evidential support. Cedric Smith and Arthur Dempster each developed a theory of upper and lower probabilities. Glenn Shafer developed Dempster's theory further, and it is now known as Dempster-Shafer theory.
Controversy
A quite different interpretation of the term probable has been developed by frequentists. In this interpretation, what are probable are not propositions entertained by believers, but events considered as members of collectives to which the tools of statistical analysis can be applied.
The Bayesian interpretation of probability allows probabilities to be assigned to all propositions (or, in some formulations, to the events signified by those propositions) independently of any reference class within which purported facts can be thought to have a relative frequency. Although Bayesian probability is not relative to a reference class, it is relative to the subject: it is not inconsistent for different persons to assign different Bayesian probabilities to the same proposition. For this reason Bayesian probabilities are sometimes called personal probabilities (although there are theories of personal probability which lack some features that have come to be identified with Bayesianism).
Although there is no reason why different interpretations (senses) of a word cannot be used in different contexts, there is a history of antagonism between Bayesians and frequentists, with the latter often rejecting the Bayesian interpretation as ill-grounded. The groups have also disagreed about which of the two senses reflects what is commonly meant by the term 'probable'.
To illustrate, whereas both a frequency probability and a Bayesian probability (of, e.g., 0.5) could be assigned to the proposition that the next tossed coin will land heads, only a Bayesian probability could be assigned to the proposition, entertained by a particular person, that there was life on Mars a billion years ago—because this assertion is made without reference to any population relative to which the relative frequency could be defined.
See also
- Probability interpretations
- Frequency probability
- Uncertainty
- Inference
- Bayesian inference
- Doomsday argument for a controversial use of Bayesian inference
- MaxEnt thermodynamics - Bayesian view of thermodynamics
External links and references
- [On-line textbook: Information Theory, Inference, and Learning Algorithms], by David MacKay, has many chapters on Bayesian methods, including introductory examples; arguments in favour of Bayesian methods (in the style of Edwin Jaynes); state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.
- Jaynes, E.T. (1998) [Probability Theory : The Logic of Science].
- Bretthorst, G. Larry, 1988, [Bayesian Spectrum Analysis and Parameter Estimation] in Lecture Notes in Statistics, 48, Springer-Verlag, New York, New York;
- http://www-groups.dcs.st-andrews.ac.uk/history/Mathematicians/Ramsey.html
- David Howie: Interpreting Probability, Controversies and Developments in the Early Twentieth Century, Cambridge University Press, 2002, ISBN 0521812518
- Colin Howson and Peter Urbach: Scientific Reasoning: The Bayesian Approach, Open Court Publishing, 2nd edition, 1993, ISBN 0812692357, focuses on the philosophical underpinnings of Bayesian and frequentist statistics. Argues for the subjective interpretation of probability.
- Luc Bovens and Stephan Hartmann: Bayesian Epistemology. Oxford: Oxford University Press 2003. Extends the Bayesian program to more complex decision scenarios (e.g. dependent and partially reliable witnesses and measurement instruments) using Bayesian Network models. The book also proves an impossibility theorem for coherence orderings over information sets and offers a measure that induces a partial coherence ordering.
- Jeff Miller ["Earliest Known Uses of Some of the Words of Mathematics (B)"]
- James Franklin [The Science of Conjecture: Evidence and Probability Before Pascal], history from a Bayesian point of view.
- Paul Graham ["Bayesian spam filtering"]
- novomind AG ["Outlook categorizing tool based on Bayesian filtering"]
- Howard Raiffa Decision Analysis: Introductory Lectures on Choices under Uncertainty. McGraw Hill, College Custom Series. (1997) ISBN 007-052579-X
- Devender Sivia, Data Analysis: A Bayesian Tutorial. Oxford: Clarendon Press (1996), pp. 7-8. ISBN 0-19-851889-7
- Henk Tijms: Understanding Probability, Cambridge University Press, 2004
- Is the portrait of Thomas Bayes authentic? [Who Is this gentleman? When and where was he born?] The IMS Bulletin, Vol. 17 (1988), No. 3, pp. 276-278
From Wikipedia, the Free Encyclopedia. All text is available under the terms of the GNU Free Documentation License.
