Talk:Bayesian inference/Archive 2

From Wikipedia, the free encyclopedia

Bayes theorem?

It is misleading to say that Bayesian statistics is based on Bayes' theorem. The crucial point of Bayesian reasoning is that we treat our hypothesis as a random variable and average over all values of H when forming expectations. The relevant rule is the law of total probability, since we are predicting new observations on the basis of old observations:

P(x_new | x_old) = Σ_h P(x_new | h) P(h | x_old),

with summations replaced by integrals for continuous h.

While Bayes rule is often used in Bayesian models, it is not what makes a model Bayesian. Bayesian reasoning (averaging over hypotheses) and Bayes rule simply happen to have been discovered by the same person, Thomas Bayes. What say the editors? Raptur (talk) 23:26, 5 February 2012 (UTC)
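For concreteness, here is a minimal numerical sketch of the distinction being drawn above, using a toy coin-flipping (beta-Bernoulli-style) setup; the numbers and variable names are illustrative only and are not taken from the article or from any cited source:

import numpy as np

hs = np.linspace(0.01, 0.99, 99)              # candidate hypotheses: coin bias h
prior = np.ones_like(hs) / len(hs)            # uniform prior over the hypotheses
data = [1, 1, 0, 1]                           # old observations (1 = heads)

# Bayes' theorem gives the posterior P(h | old observations) ...
likelihood = hs**sum(data) * (1 - hs)**(len(data) - sum(data))
posterior = likelihood * prior
posterior /= posterior.sum()

# ... but the prediction itself comes from the law of total probability,
# averaging P(next = heads | h) over the whole posterior on h:
p_next_heads = np.sum(hs * posterior)

# A point-estimate approach would instead condition on one fixed h (e.g. the MLE):
h_mle = hs[np.argmax(likelihood)]
print(p_next_heads, h_mle)

Both routes use Bayes' theorem to obtain P(h | old observations); what makes the first prediction Bayesian in the sense argued above is the averaging over h rather than committing to a single h.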

I don't see where the article says "is based on"; it is more "uses". What you say above looks reasonable, but can you find a "reliable source" taking the same approach? You also need to consider the article Bayesian probability, which goes into the Bayesian interpretation of probability. (That is, finding the most appropriate place to include your points). Melcombe (talk) 15:15, 6 February 2012 (UTC)
In the "Philosophical Background" section, it says that Bayes Rule is the "essence" of Bayesian inference, which is not the case. Raptur (talk) 22:02, 13 February 2012 (UTC)
I do agree that "the essence of" might be misleading when conceptualising Bayesian inference as modifying a distribution. However, as explained below, I do think it is right to begin with this "building block" before moving onto the bigger picture. I've changed "the essence of" to "the fundamental idea in", catering better for all views? Gnathan87 (talk) 03:17, 24 February 2012 (UTC)
What the Bayesian inference article describes is in fact based on Bayes' theorem, for the most part involving the use of new evidence to update one's likelihood estimate for the truth of a single fixed hypothesis, not involving any form of averaging over hypotheses. Bayesians such as I. J. Good have emphasized this in their published writings. The sum over hypotheses shown in the article is just a normalizing factor and is independent of which particular "M" has been selected. What you describe may be some related topic, but it is not what this article is discussing. — DAGwyn (talk) 22:39, 8 February 2012 (UTC)
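For reference, the single-hypothesis form being described here is ordinary Bayes' theorem with a normalizing sum over the alternative hypotheses (standard notation, not quoted from the article):

P(M \mid E) = \frac{P(E \mid M)\, P(M)}{\sum_{M'} P(E \mid M')\, P(M')}

The denominator is the same whichever particular M is selected, which is the sense in which the sum over hypotheses is "just a normalizing factor".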
No. See equation (3) from this paper on Bayesian Hidden Markov Models, equation (1) from this field review, equation (9) from this paper on a Bayesian explanation of the perceptual magnet effect, or Griffiths and Yuille (2006) (some lecture notes available here). Bayesian statistics is about computing a joint probability over observations and hypotheses. Of course, once you have a full joint, you can easily compute any particular conditional probability, including the one computed by Bayes Rule. Hidden Markov models, to use the example from the first paper I provided, use Bayes rule in computing marginals over the hidden variables. Parameter estimation (notably the Expectation-maximization algorithm) for Hidden Markov Models, however, often takes only a Maximum Likelihood Estimate rather than estimating the full joint over model parameters (and data). Thus, an HMM trained with the expectation-maximization algorithm uses Bayes rule and changes its expectations after seeing evidence ("training data" in the parlance of the paper), but it is not Bayesian because it maintains no uncertainty about model parameters. Raptur (talk) 22:02, 13 February 2012 (UTC)
None of these 3 sources say anything like "Bayesian inference is ....". I don't see them mentioning inference at all. Instead they talk about "Bayesian model averaging" and "Bayesian modeling" ... which you might think amount to the same thing. But do find something that both meets WP:RELIABLESOURCES and is explicitly about "Bayesian inference". On what you said in the first contrib to this section ... this seems to correspond somewhat to predictive inference. Melcombe (talk) 00:46, 14 February 2012 (UTC)
I've been avoiding this source, since I don't have an electronic version of it, but [http://www.amazon.com/Bayesian-Analysis-Chapman-Statistical-Science/dp/158488388X/ref=sr_1_1?ie=UTF8&qid=1329222331&sr=8-1 "Bayesian Data Analysis: Second Edition" by Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin] opens on page 1, under the heading "Part 1: Fundamentals of Bayesian inference", with: "Bayesian inference is the process of fitting a probability model to a set of data and summarizing the result by a probability distribution on the parameters of the model and on unobserved quantities such as predictions for new observations." This is exactly the statement that Bayesian inference is about maintaining probability distributions over both data and models. On page 8, it says: "The primary task of any specific application is to develop the model and perform the necessary computations to summarize p(θ | y) in appropriate ways," where θ comprises our model parameters and y comprises our data. The probability of a model given data is an important part of Bayesian inference (indeed, it is the second term of the marginalization expression), but what distinguishes Bayesian inference from other statistical approaches is maintaining uncertainty about your model.
I should point out that the other papers I provided do take this approach. The Bayesian HMM paper says "In contrast [to a point estimate such as Maximum Likelihood or Maximum a posteriori], the Bayesian approach seeks to identify a distribution over latent variables directly, without ever fixing particular values for the model parameters. The distribution over latent variables given the observed data is obtained by integrating over all possible values of θ:" and presents the object of computation as the posterior over the latent variables with the parameters integrated out, P(z | x) = ∫ P(z | x, θ) P(θ | x) dθ (writing z for the latent variables and x for the observed data). The perceptual magnet effect paper says on page 5 that listeners are trying to compute the posterior on targets, and on page 6 says that this quantity is computed by marginalizing over category membership: P(target | stimulus) = Σ_c P(target | stimulus, c) P(c | stimulus). The lecture notes I referred to give, in section 6 titled "Bayesian Estimation," precisely the derivation I opened with, and explicitly contrast Bayesian inference with point estimates of model parameters. — Preceding unsigned comment added by Raptur (talkcontribs) 12:28, 14 February 2012 (UTC)
The single hypothesis vs. joint distribution question is something that has been disputed for some time (and in fact was the subject of comment from Gelman himself). My personal conclusion is this: Bayesian inference is, in practice, used most often by scientists and engineers, who tend to use it exclusively over distributions. In this context, it makes sense to think of inference as being fundamentally something you do on a joint distribution. However, Bayesian inference in philosophy of science tends to be expressed in terms of single hypotheses.
As Bayesian inference is an important topic in philosophy of science, and Bayes' theorem is in any case the building block of the joint distribution view, I certainly think it is appropriate to begin with an explanation of the single hypothesis view before covering distributions. Also, applications in a courtroom setting, discussed later in the article, have to date used the single hypothesis view (as seems appropriate in that context), so it should at least have been mentioned. A final reason to cover the single hypothesis view is that, at least in my view, it is more accessible and insightful for those new to the topic than charging ahead to joint distributions. Gnathan87 (talk) 21:05, 23 February 2012 (UTC)
This is an article on mathematical statistics, so it seems strange to defer to terminology from the philosophy of science or criminal law. I do think a brief summary or a link to an article discussing the Bayesian view of probability is warranted. Scientists, statisticians, and engineers often perform the kind of inference this article describes, applying Bayes' theorem without averaging over models (as in conditional random fields and MLE Bayes nets), but they do not call it "Bayesian inference."
And again, Bayes' theorem is not the defining building block of the joint distribution view; the rule of total probability is. Bayes' theorem is just the easiest way to compute the second term of the marginalization expression. Finally, I don't think it's a good idea to focus on a topic just because it's easier if it's actually only partially related to the title of the article. This misleads readers (including students from a course for which I am a TA) into thinking "Bayesian Inference" is inference using Bayes' Theorem, when really it is inference that appreciates subjective uncertainty about your model. Raptur (talk) 15:30, 26 February 2012 (UTC)
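To spell out the relationship being asserted here (standard notation, offered as illustration rather than quoted from any source): the marginalization (predictive) expression is

P(\tilde{x} \mid D) = \sum_{h} P(\tilde{x} \mid h)\, P(h \mid D),

an instance of the rule of total probability, and Bayes' theorem is simply the usual way of computing its second factor,

P(h \mid D) = \frac{P(D \mid h)\, P(h)}{\sum_{h'} P(D \mid h')\, P(h')}.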
I know this is an old discussion, but Raptur is entirely correct here, and the Gelman characterization of Bayesian inference is far, far better than the current introduction. In fact, I would go as far as to say that the current intro is simply wrong. Bayes' theorem is often important in Bayesian inference, but the way the first sentence is phrased implies Bayes' theorem is the basis of it, when that simply is not the case. Also, the intro confuses the idea of "Bayesian updating" (without even really defining it, although I can guess what the author means) with Bayesian inference. I have yet to read the whole article, but the intro hardly instills confidence in what comes below... Atshal (talk) 16:49, 14 November 2012 (UTC)
I just want to emphasize Atshal's point that the "Bayesian updating" and rationality stuff is really misleading as currently written. After reading up to that section, one would have the impression that Bayesian inference is the application of Bayes' rule, and that there is something controversial about Bayes' rule. There is nothing whatsoever controversial about Bayes' rule; it is derived directly from the definition of conditional probability using very simple algebra. The controversy over Bayesian inference stems from the treatment of hypotheses as random variables rather than as fixed, and from the necessary use of a prior. Raptur (talk) 15:37, 20 November 2012 (UTC)
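The "very simple algebra" referred to is just the definition of conditional probability applied in both directions (a standard derivation, not specific to this article):

P(H \mid E)\, P(E) = P(H, E) = P(E \mid H)\, P(H) \quad\Rightarrow\quad P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}, \qquad P(E) > 0.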
Ugh, the very first section is "Introduction to Bayes' Rule". Why is it not "Introduction to Bayesian Inference", since this article is about Bayesian inference and not Bayes' Rule? Atshal (talk) 16:52, 14 November 2012 (UTC)

Rationality

All this stuff about Bayesian inference being the model of rationality is ignorant nonsense.

I clarified this more than a year ago, with in-line cites to the highest quality most reliable sources, and the clarification remains in the article, which I suppose was intolerable.

I don't understand how people can write an article that is shown to be patent nonsense by the later discussion in the same article.  Kiefer.Wolfowitz 17:11, 26 February 2012 (UTC)

The article has now been changed, but I would mention that nowhere did it previously state that Bayesian inference was "the" model of rationality. The previous lead read "how the degree of belief in a proposition might change due to evidence" and said that it is "a model of rational reasoning". Later on, we have "The philosophy of Bayesian probability claims". None of these suggest that Bayes' theorem is the sole theory of rationality. It was simply keeping the information relevant to the article, rather than expanding on other possible theories and techniques. Maybe the issue is that this simply was not sufficiently emphasised. Gnathan87 (talk) 19:57, 26 February 2012 (UTC)
Before my edits, the article's lede asserted uniqueness of Bayesian updating as the rational system.  Kiefer.Wolfowitz 21:26, 26 February 2012 (UTC)
What the article said is that the philosophy of Bayesian probability asserts that this method of updating is the rational one. The distinction seems clear to me. I am removing the citation-needed tag; obviously it is rational in the sense that sophisticated reasoning underlies the decision. 73.2.136.228 (talk) 00:19, 2 October 2014 (UTC)

Updating

In the end I can only find arguments for why Bayesian updating is not the only rational rule, but I am still left wondering what Bayesian updating actually is. Could anyone please fill the section with appropriate content? Thanks. Elferdo (talk) 14:00, 14 October 2015 (UTC)

Section "Multiple observations" - conditional independence vs independence

In the current version (17 Jun 2016), the section "Multiple observations" requires "a sequence of independent and identically distributed observations", while the key to combining multiple observations is conditional independence, as confirmed two lines below by the equation P(E | M) = Π_k P(e_k | M). Please notice that independence, P(e_i, e_j) = P(e_i) P(e_j), neither implies nor is implied by conditional independence, P(e_i, e_j | M) = P(e_i | M) P(e_j | M). --155.245.65.27 (talk) 09:54, 17 June 2016 (UTC)
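A small numerical illustration of that point (a hypothetical two-coin mixture, not taken from the article): the flips below are conditionally independent given the model M but are not marginally independent.

# Two flips of a coin whose unknown bias M is either 0.1 or 0.9, each with prior probability 0.5.
p_m = {0.1: 0.5, 0.9: 0.5}

def p_joint(e1, e2):
    # P(e1, e2) = sum_M P(M) * P(e1 | M) * P(e2 | M)   (conditional independence given M)
    return sum(pm * (m if e1 else 1 - m) * (m if e2 else 1 - m) for m, pm in p_m.items())

def p_marginal(e):
    # P(e) = sum_M P(M) * P(e | M)
    return sum(pm * (m if e else 1 - m) for m, pm in p_m.items())

print(p_joint(1, 1), p_marginal(1) * p_marginal(1))   # 0.41 vs 0.25: not marginally independent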

External links modified

Hello fellow Wikipedians,

I have just modified 3 external links on Bayesian inference. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 04:06, 29 October 2016 (UTC)