Talk:Bayes factor

Wiki Education Foundation-supported course assignment
This article is or was the subject of a Wiki Education Foundation-supported course assignment. Further details are available on the course page. Student editor(s): Llee17.

Above undated message substituted from Template:Dashboard.wikiedu.org assignment by PrimeBOT (talk) 15:28, 16 January 2022 (UTC)

Bayesian model comparison should not redirect here
Bayesian model comparison should not redirect here. There are many ways to do Bayesian model comparison other than Bayes factors; in fact, Bayes factors are probably one of the least popular ways to do model comparison among Bayesian statisticians. See Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC as an example. Closed Limelike Curves (talk) 20:38, 10 December 2021 (UTC)

Bayes factors are less and less Bayesian every day
The claim ... use of Bayes factors is a Bayesian alternative to classical hypothesis testing is not only tired, it is misleading. Most Bayesian inference today calculates an estimate of the entire posterior distribution from an estimate or a decision space, the latter being weighted by a Loss function. For example, a test for the equality of two means calculates the Bayesian posterior of their difference, with the contributions weighted by the respective likelihoods and priors, and if the q-ROPE (or "q-Region Of Practical Equivalence") includes zero, they are considered to be the same. See Bayesian estimation supersedes the t-test as an example.

I don't mean to suggest that Bayes factors ought to be excluded, simply that they are not the only way Bayesian inference is done. This is particularly so, for Wikipedia, since Bayesian model selection now redirects here.

It's another page, but the Bayesian inference article needs updating, too.

bayesianlogic.1@gmail.com 21:43, 2 September 2019 (UTC)

Extension to multiple models
What is the proper extension to multiple (more than 2) models?--128.113.89.90 (talk) 15:23, 12 December 2008 (UTC)

I think section 6 of this paper covers the topic: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.143.835&rep=rep1&type=pdf 75.210.159.44 (talk) 18:46, 14 October 2010 (UTC)

Prior distributions
How do you achieve this result (solving the integral)? $$\int_{q=0}^1{200 \choose 115}q^{115}(1-q)^{85}dq = {1 \over 201}$$

Any guidance would be greatly appreciated OrangeDog 20:20, 1 September 2006 (UTC)


 * Try integration by parts 85 times and note that
 * $$\int_a^b x^{n}(1-x)^{m}\,dx = \left[ {\frac{1}{n+1}} x^{n+1}(1-x)^{m}\right]_{a}^{b} + \int_a^b {\frac{m}{n+1}} x^{n+1}(1-x)^{m-1}\,dx$$
 * --Henrygb 21:19, 1 September 2006 (UTC)


 * Sorry, I'm just not seeing it. Can you just give me the general result, as I really don't have time to integrate by parts 85 times: I'm on pencil and paper here. OrangeDog 21:57, 1 September 2006 (UTC)


 * Try doing it once, put in the numbers (look at those integration limits), and then see if you can see how things are going to turn out... Jheald 20:54, 2 September 2006 (UTC)

Ah, trapezium rules are a wonderful thing. As is the real world when you only need 8 sig. fig. OrangeDog 16:29, 3 September 2006 (UTC)


 * You don't need any approximation here. See what $$\left[ {\frac{1}{n+1}} x^{n+1}(1-x)^{m}\right]_{a}^{b}$$ is when a=0 and b=1. A pencil can work out that $$\int_0^1 x^{n}(1-x)^{m}\,dx = \int_0^1 {\frac{x^{n+m}}{n+m \choose n}}\,dx = {\frac{1}{(n+m+1){n+m \choose n}}} $$ --Henrygb 10:08, 4 September 2006 (UTC)


 * Yeah, but I had lots of similar problems (with various limits) and was looking for an exact general solution. Anyway, I'm done now OrangeDog 18:14, 5 September 2006 (UTC)

Wow. Working hard in these answers. This is an integral that you memorize in the theory of probability/statistics because it is recurring. That's why people rattle off the solution without too much thought. It is called "the beta function": http://en.wikipedia.org/wiki/Beta_function. It is closely related to the beta density, which is closely related to Bernoulli R.V.s, which are closely related to ... lots. That is why it should be memorized, or at least known to exist. Jeremiahrounds 14:33, 20 June 2007 (UTC)
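For anyone who wants to check the result without 85 rounds of integration by parts, here is a minimal Python sketch using only the standard library. It relies on the factorial form of the beta function for integer arguments, B(a, b) = (a-1)!(b-1)!/(a+b-1)!; the helper name `beta_integer` is made up for this illustration.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integer(a: int, b: int) -> Fraction:
    """Exact beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

n, m = 115, 85

# integral_0^1 q^n (1-q)^m dq = B(n+1, m+1)
integral = beta_integer(n + 1, m + 1)

# Multiply by the binomial coefficient from the example.
marginal = comb(n + m, n) * integral

print(marginal)  # 1/201
```

The same value follows from the closed form given above, 1/((n+m+1) C(n+m, n)), once it is multiplied by C(n+m, n).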

Bayes information criterion
Consider mentioning approximations of integrated likelihoods and Bayes factors, and in particular linking to the article on Bayesian information criterion.

Dfarrar 02:21, 15 March 2007 (UTC)

Correspondence with p values?
If a scientist requests a 95% statistical significance level is there a corresponding value for K? More generally, is it meaningful to compare p-values with K values? Pgr94 (talk) 15:43, 4 February 2008 (UTC)


 * There is a way to do that, but you also need to know the number of degrees of freedom that have gone into computing the aggregate "K value" (discriminating information). It turns out that 2 times the discriminating (Kullback) information is asymptotically distributed as chi-square with that many degrees of freedom.  Hence you can use an inverse chi-square table or algorithm to convert it to a "probability" (that the aggregate evidence would a priori have produced at least that much weight in support of the hypothesis if it were true).  If you search the pre-Web net archives for an "i-hat" archive you may find C code I contributed long ago that computes this, along with Unix manual pages documenting its use. — DAGwyn (talk) 00:45, 16 February 2008 (UTC)

Merger proposal
This article and Bayesian model comparison both cover the same material. Opinions as to merging? -3mta3 (talk) 09:55, 27 March 2009 (UTC)

Odds ratio
It seems to me this article could be made more clear by showing how the Bayes factor appears in the posterior odds ratio for the two models. This may seem obvious to those familiar with Bayesian inference but it may be confusing to those who are not. What I mean is, insert

$$\frac{P(M_1|D)}{P(M_2|D)} = \frac{P(D|M_1)}{P(D|M_2)}\frac{P(M_1)}{P(M_2)}$$

in there somewhere and discuss it. --- Bjfar (talk) 22:15, 18 December 2012 (UTC)

Is the nomenclature in the article actually right? It seems like it is conflating the posterior odds and the Bayes factor. In my understanding, the posterior odds ratio is the Bayes factor times the prior odds, as it is described e.g. here. The article currently calls the posterior odds ratio the Bayes factor. Ast0815 (talk) 14:04, 11 February 2019 (UTC)
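To make the distinction concrete, a small Python sketch of the relation posterior odds = Bayes factor × prior odds. The function names and the numbers (a Bayes factor of 3 and prior odds of 1:2) are hypothetical and chosen only for illustration.

```python
def posterior_odds(bayes_factor: float, prior_odds: float) -> float:
    """Posterior odds P(M1|D)/P(M2|D) from the Bayes factor and the prior odds."""
    return bayes_factor * prior_odds

def odds_to_probability(odds: float) -> float:
    """Convert odds for M1 against M2 into P(M1|D), assuming only two models."""
    return odds / (1.0 + odds)

# Hypothetical inputs: the data favour M1 by a factor of 3,
# but M1 was considered half as plausible as M2 beforehand.
bayes_factor = 3.0
prior_odds = 0.5

odds = posterior_odds(bayes_factor, prior_odds)
probability_m1 = odds_to_probability(odds)
print(odds, probability_m1)
```

The Bayes factor and the posterior odds coincide only when the prior odds equal 1.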

Example
Shouldn't the likelihood in the example section be $$q^{115}(1-q)^{85}$$ instead of $${{200 \choose 115}q^{115}(1-q)^{85}}$$? However, it won't matter for the ratio since the normalizing constant drops out. --Mguggis (talk) 05:49, 29 July 2013 (UTC)


 * @Mguggis, they are using the Binomial distribution which is appropriate. Monsterman222 (talk) 23:35, 1 August 2014 (UTC)


 * @Monsterman222, :@Mguggis references to non-publicly (free) available articles should be avoided — Preceding unsigned comment added by 31.161.255.177 (talk) 12:07, 14 February 2015 (UTC)


 * @Monsterman222, my confusion is about which distribution is of interest. Is it the sum of Bernoullis or the Bernoulli? If it's the sum, then the likelihood is 1 observation from a Binomial(200,q) and the likelihood in the example, $$ Pr(X_{1}+...+X_{200}|q) = {{200 \choose 115}q^{115}(1-q)^{85}}$$, is correct.  However, if the Bernoulli is of interest, then the likelihood is the joint distribution of 200 iid observations from a Bernoulli, which is $$ Pr(X_{1},...,X_{200}|q) = \prod_{i=1}^{200} Pr(X_{i}|q) = q^{115}(1-q)^{85} $$.  I was thinking the second case was the distribution of interest. Mguggis (talk) 22:35, 12 December 2015 (UTC)
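The point that the normalizing constant drops out of the ratio can be checked directly. The sketch below is an illustration, not the article's code: it computes the ratio for the point hypothesis q = 1/2 against a uniform prior on q, once with the binomial coefficient and once without, using exact rational arithmetic. The function names are invented for this example.

```python
from fractions import Fraction
from math import comb, factorial

n, k = 200, 115

def likelihood_point(k: int, n: int, q: Fraction, coeff: int) -> Fraction:
    """coeff * q^k (1-q)^(n-k), the likelihood at a fixed q."""
    return coeff * q**k * (1 - q)**(n - k)

def marginal_uniform(k: int, n: int, coeff: int) -> Fraction:
    """coeff * integral_0^1 q^k (1-q)^(n-k) dq, i.e. a uniform prior on q."""
    return coeff * Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

q0 = Fraction(1, 2)

# Binomial likelihood: coefficient C(200, 115) in numerator and denominator.
K_binomial = likelihood_point(k, n, q0, comb(n, k)) / marginal_uniform(k, n, comb(n, k))

# Joint Bernoulli likelihood: no coefficient at all.
K_bernoulli = likelihood_point(k, n, q0, 1) / marginal_uniform(k, n, 1)

print(K_binomial == K_bernoulli, float(K_binomial))
```

Either choice of likelihood therefore gives the same ratio, roughly 1.197.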

Abuse of Notation
I taught this material in class and I find that students have difficulty with the abuse of notation of $$P(D | M)$$ where M is not a random variable. It is better to define a Bayesian model by the likelihood, prior and parameter region of interest:


 * $$M := \langle P_M(D | \theta),\; P_M(\theta),\; \theta \in \Theta' \rangle$$

where the default $$\Theta'$$ is the entire support of the prior. This implies the Bayes Factor can be defined as:


 * $$ K = \frac{\Pr(D|M_1)}{\Pr(D|M_2)} = \frac{P_{M_1}(D)}{P_{M_2}(D)} = \frac{\int_{\Theta'_1} P_{M_1}(D | \theta_1) P_{M_1}(\theta_1) \,d\theta_1} {\int_{\Theta'_2} P_{M_2}(D | \theta_2) P_{M_2}(\theta_2) \,d\theta_2}. $$

And the first expression, $$\frac{\Pr(D|M_1)}{\Pr(D|M_2)}$$ is admittedly an abuse of notation.

This implies the example becomes:


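 * $$M_1 := \langle {200 \choose 115}\theta^{115}(1-\theta)^{85},\; P(\theta = 0.5) = 1,\; \theta \in \{0.5\} \rangle$$
 * $$M_2 := \langle {200 \choose 115}\theta^{115}(1-\theta)^{85},\; P(\theta) = 1,\; \theta \in [0,1] \rangle$$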

and thereby,


 * $$ K = \frac{{200 \choose 115}(0.5)^{115}(1-0.5)^{85}}{\int_0^1 {{200 \choose 115}\theta^{115}(1-\theta)^{85}} (1) \,d\theta} = \frac{0.005956...}{0.004975...} \approx 1.197 $$

You can also give an example here such as:


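 * $$M_1 := H_0 : \theta \leq 0.5 \equiv \langle {200 \choose 115}\theta^{115}(1-\theta)^{85},\; P(\theta) = 1,\; \theta \in [0, 0.5] \rangle$$
 * $$M_2 := H_a : \theta > 0.5 \equiv \langle {200 \choose 115}\theta^{115}(1-\theta)^{85},\; P(\theta) = 1,\; \theta \in (0.5, 1] \rangle$$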

and thereby,


 * $$ K = \frac{\int_0^{0.5} {{200 \choose 115}\theta^{115}(1-\theta)^{85}} (1) \,d\theta}{\int_{0.5}^1 {{200 \choose 115}\theta^{115}(1-\theta)^{85}} (1) \,d\theta} $$

Thoughts?

M is not a random variable under the interpretation of frequency probabilities. However, it is a random variable under the subjective (or evidential) probability interpretation. Which is the interpretation that Bayesian methods use. Mguggis (talk) 18:08, 7 February 2017 (UTC)
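The one-sided comparison proposed above (θ ≤ 0.5 against θ > 0.5, each with prior density 1 on its region) can be evaluated exactly. The following Python sketch is one way to do it, under the assumption that the likelihood is the binomial one from the article's example; it expands (1-θ)^(n-k) with the binomial theorem and integrates term by term. The function name `integrated_likelihood` is hypothetical. Normalizing the prior within each half would multiply both integrals by 2, so the ratio is unaffected.

```python
from fractions import Fraction
from math import comb

def integrated_likelihood(k: int, n: int, lo: Fraction, hi: Fraction) -> Fraction:
    """Exact integral over [lo, hi] of C(n,k) t^k (1-t)^(n-k) dt (prior density 1)."""
    m = n - k

    def antiderivative(t: Fraction) -> Fraction:
        # (1-t)^m = sum_j C(m,j) (-1)^j t^j, so each term integrates to
        # C(m,j) (-1)^j t^(k+j+1) / (k+j+1).
        return sum(
            Fraction((-1) ** j * comb(m, j), k + j + 1) * t ** (k + j + 1)
            for j in range(m + 1)
        )

    return comb(n, k) * (antiderivative(hi) - antiderivative(lo))

n, k = 200, 115
half = Fraction(1, 2)

numerator = integrated_likelihood(k, n, Fraction(0), half)    # H_0: theta <= 0.5
denominator = integrated_likelihood(k, n, half, Fraction(1))  # H_a: theta > 0.5
K = numerator / denominator

print(float(K))
```

The two integrals sum to 1/201, the full-range value from the article's example, and K comes out well below 1 because 115 successes in 200 trials favour θ > 0.5.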

Checking citations (Article critique)
Should not every unique idea in the article (definitions and examples included) end with a note? Using this article as an example, there seems to be at least one missing, as there is a footnote saying "[citation needed]" after a sentence in the middle of Definition section of the page. Additionally, while all of the links are in working order, there seems to be a lack of recent source material for this page (only about a third are published after 2000). (I hope to be able to confirm the validity of the sentence in question requiring a citation and perhaps incorporate a recent reference in a future edit of this article.) Llee17 (talk) 03:49, 27 September 2016 (UTC)

regardless of whether these models are correct
In the intro, it is stated that "The aim of the Bayes factor is to quantify the support for a model over another, regardless of whether these models are correct." I think the second part of this sentence is contentious, or at least it should be better explained what is meant by that. Consistency proofs of the BF require the true model to be in the set, which is why many introductory texts drop a note about the BF requiring the true model to be in the set for consistency. It's not clear to me what "support" would even mean for two wrong models, technically (how would I prove the method performs correctly) and epistemologically. In the case of wrong models, AIC has a more straightforward definition via KL divergence. — Preceding unsigned comment added by 132.199.61.137 (talk) 17:17, 14 November 2017 (UTC)

Fixed an error or typo in the Definition section
Hello,

I was reviewing this page, and I noticed what I thought was an error. In the old revision (revision id 880926160), the Definition section included the following text: The posterior probability $$\Pr(M|D)$$ of a model M given data D is given by Bayes' theorem:


 * $$\Pr(M|D) = \frac{\Pr(D|M)\Pr(M)}{\Pr(D)}.$$

The key data-dependent term $$\Pr(D|M)$$ is the likelihood of the model M in view of the data D, and represents the probability that some data are produced under the assumption of the model M; evaluating it correctly is the key to Bayesian model comparison.

However, $$\Pr(D|M)$$ is not "the likelihood of the model M in view of the data D"; it is instead the likelihood of the Data D in view of the Model M, as is described in the later part of that sentence. The term which is "the likelihood of the model M in view of the data D" is $$\Pr(M|D)$$. Thus, I edited out the erroneous first part of this sentence. (See the diff here.)

If I am interpreting the page or the probability notation incorrectly, someone can revert this change.

Thebrainkid (talk) 06:21, 18 February 2019 (UTC)

Hypothesis testing, Bayesian approach, and frequentist statistics.
The following paragraph from the 'Interpretation section' has been tagged. "The use of Bayes factors or classical hypothesis testing takes place in the context of inference rather than decision-making under uncertainty. That is, we merely wish to find out which hypothesis is true, rather than actually making a decision on the basis of this information. Frequentist statistics draws a strong distinction between these two because classical hypothesis tests are not coherent in the Bayesian sense. Bayesian procedures, including Bayes factors, are coherent, so there is no need to draw such a distinction. Inference is then simply regarded as a special case of decision-making under uncertainty in which the resulting action is to report a value. For decision-making, Bayesian statisticians might use a Bayes factor combined with a prior distribution and a loss function associated with making the wrong choice. In an inference context the loss function would take the form of a scoring rule. Use of a logarithmic score function for example, leads to the expected utility taking the form of the Kullback–Leibler divergence."

It would need clarification and sourcing. Manudouz (talk) 16:14, 8 May 2019 (UTC)


 * This seems to me to be a highly inappropriate way to write an article. We shouldn't just add arbitrary passages with no sourcing whatsoever and tag them as needing "clarification and sourcing." Information should only be included in the first place if there is evidence that the information is correct. That is not the case here. In fact, there is information here that is manifestly false--even nonsensical. For example, consider the first sentence: "The use of Bayes factors or classical hypothesis testing takes place in the context of inference rather than decision-making under uncertainty." The claim that classical hypothesis testing cannot be used in decision making is absurd because the classical Neyman-Pearson framework uses an alpha level as the decision boundary (e.g., see the Wiki article on "statistical hypothesis testing"). Moreover, the claim that Bayes factors are not used in decision making is contradicted just a few sentences later in the paragraph: "For decision-making, Bayesian statisticians might use a Bayes factor combined with a prior distribution and a loss function associated with making the wrong choice." I suggest either (a) removing the paragraph entirely or (b) heavily revising it and providing reputable sources for any claims that are made. 23.242.198.189 (talk) 20:24, 8 May 2019 (UTC)


 * Update: I've deleted the offending passage. 164.67.15.110 (talk) 19:34, 15 May 2019 (UTC)


 * It would help readers if you might go ahead with your (b) suggestion, and point to sources related to the claim(s). Wikipedians could then adjust the paragraph accordingly rather than removing it completely. Manudouz (talk) 07:16, 23 May 2019 (UTC)

Option (b) is only possible if the information in the paragraph is correct and merely needs to be sourced. In this case, the information appears to be incorrect. In fact, as pointed out above, the material in the paragraph ISN'T EVEN INTERNALLY CONSISTENT. With regard to incorrect information, any objection to "removing it completely" is absurd--shouldn't incorrect statements be removed? Moreover, if a particular statement is in dispute, then the burden of providing sources should be on those who advocate INCLUDING the disputed statement. Otherwise, editors could add any nonsense they wished and demand that it be allowed to stand until proven wrong. I could add a claim that Thomas Bayes' favorite ice cream flavor was strawberry, and insist that it be included until someone hunts down a source explicitly saying otherwise. Simply put, we have NO evidence that the information in the paragraph is correct, so there is NO legitimate basis for reinserting it again and again after it has been appropriately removed.23.242.198.189 (talk) 04:40, 27 May 2019 (UTC)

From Wikipedia policy (https://en.wikipedia.org/wiki/Wikipedia:Verifiability): "The burden to demonstrate verifiability lies with the editor who adds or restores material, and it is satisfied by providing an inline citation to a reliable source that directly supports the contribution...Any material lacking a reliable source directly supporting it may be removed and should not be restored without an inline citation to a reliable source." (emphasis in original) The policy does also state that "In some cases, editors may object if you remove material without giving them time to provide references." However, in this case, weeks have gone by without any references being supplied, and moreover, the information appears to be not only unsourced but also incorrect and self-contradictory. 2602:306:CDB9:CF00:4D43:1F0:2270:821 (talk) 21:20, 2 June 2019 (UTC)

For the reasons outlined above, I intend to remove the contested material. To recap, the material (a) is unsourced, (b) appears to be wrong, (c) is not even internally consistent, and (d) has been allowed to stand for ample time (a month) without any editors coming forward to defend or provide even a single source for the material in the paragraph. Wikipedia policy therefore DEMANDS that the material be removed and expressly PROHIBITS the material's reinsertion without sourced justification (see https://en.wikipedia.org/wiki/Wikipedia:Verifiability). 23.242.198.189 (talk) 00:40, 8 June 2019 (UTC)


 * For what it's worth, here's a recent pedagogical article that you might like to consider, presenting how classical hypothesis tests can lead to results that Bayesians would regard as not coherent, in contrast to a Bayesian analysis: Jheald (talk) 01:00, 8 June 2019 (UTC)

The paper Jheald cited states that "The same conflicting conclusions result from the calculation of Bayes factors." In fact, the article gives examples of how both p-values and Bayes factors have the capacity to produce incoherent results in multiple-testing situations: "From these examples, one concludes that both UMP tests and GLR tests with the same level of significance can yield incoherent results. The same happens when performing tests based either on Bayes factors or on p-values." Thus, the cited paper provides further evidence that the contested paragraph is incorrect in claiming that frequentist analyses are incoherent whereas Bayes factors are coherent. 23.242.198.189 (talk) 19:21, 8 June 2019 (UTC)


 * Considering Bayes factors alone (ie without a consistent prior) can be incoherent. However, the article concludes by showing "how a Bayesian decision-theoretical framework" (ie using "a Bayes factor combined with a prior distribution and a loss function" in the language of the paragraph above) "can be used to build coherent tests".


 * This would be useful content to present and explain in more detail in the article. Jheald (talk) 14:40, 9 June 2019 (UTC)

Perhaps a section on how Bayes factors are used in decision-making is warranted. But the disputed paragraph does not serve that function. Indeed, it appears we are in agreement that the content in the paragraph is incorrect: "classical hypothesis tests are not coherent in the Bayesian sense. Bayesian procedures, including Bayes factors, are coherent." 23.242.198.189 (talk) 21:10, 9 June 2019 (UTC)


 * Happy to see that changed to "classical hypothesis tests are not coherent in the Bayesian sense. Bayesian procedures, including Bayes factors if combined with appropriate priors, are coherent." Jheald (talk) 19:11, 10 June 2019 (UTC)

That's still not correct. The article Jheald cited does note that some classical procedures can produce "incoherent" results in certain multiple-testing situations. However, it also notes that there are some classical procedures that do not produce incoherent results. Thus, this isn't a classical-vs.-Bayesian issue. Note also that coherence in this sense is strictly a multiple-testing issue; no test procedure is "incoherent" in itself. 23.242.198.189 (talk) 00:49, 13 June 2019 (UTC)

Proof that example is wrong.
Look at this integral:

 * $$ \int_{0}^1 {{200 \choose 115}q^{115}(1-q)^{85}}dq.$$

Let us set x=115, y =85


 * $$ \int_{0}^1 {{x+y \choose x}q^{x}(1-q)^{y}}dq = {{x+y \choose x} \int_{0}^1 q^{x}(1-q)^{y}}dq$$


 * $$ {x+y \choose x} \int_{0}^1 q^{x}(1-q)^{y}dq={x+y \choose x} B(x+1,y+1)=\frac{ {x+y \choose x} }{ {x+y+2 \choose x+1} } \frac{x+y+2}{(x+1)(y+1)}$$
 * $$ \frac{ {x+y \choose x} }{ {x+y+2 \choose x+1} } \frac{x+y+2}{(x+1)(y+1)}= \frac{ {x+y \choose x} }{ {x+y \choose x}(x+y+1) } =\frac{1}{x+y+1} $$

Setting x=115, y=85 we obtain $$ \frac{1}{201}$$ as the example says. But it does not depend on x alone, only on x+y. So even if x is 200 and y is 0 this is exactly the same number, so this test tests nothing useful. Krystianzaw (talk) 23:52, 1 January 2022 (UTC)


 * Your calculations and those in the article show that if the prior for q in M2 is uniform on [0,1] then all the possible outcomes from 0 through to x+y would be equally likely to be observed (you describe this as "does not depend on x alone").
 * So it is reasonable to compare this likelihood with M2 to the likelihood of the actual observation in M1, and it avoids the dubious practice in a classical likelihood-ratio test with a composite alternative hypothesis of largely ignoring the breadth of the alternative hypothesis and instead using a particular alternative chosen after the data have been observed. 10:14, 14 April 2022 (UTC)
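A short Python sketch of the point made in this reply, assuming the article's setup (M1 fixes q = 1/2, M2 puts a uniform prior on q, n = 200 trials). The function names are invented for illustration. It confirms that the M2 marginal likelihood is 1/201 for every possible count x, while the ratio against M1 still changes with x.

```python
from fractions import Fraction
from math import comb, factorial

n = 200

def marginal_m2(x: int, n: int) -> Fraction:
    """C(n,x) * integral_0^1 q^x (1-q)^(n-x) dq under a uniform prior on q."""
    return comb(n, x) * Fraction(factorial(x) * factorial(n - x), factorial(n + 1))

def likelihood_m1(x: int, n: int) -> Fraction:
    """Binomial probability of x successes in n trials when q = 1/2."""
    return Fraction(comb(n, x), 2 ** n)

def bayes_factor(x: int, n: int) -> Fraction:
    """Ratio of the M1 likelihood to the M2 marginal likelihood."""
    return likelihood_m1(x, n) / marginal_m2(x, n)

# Under M2 every outcome from 0 to n is equally likely a priori.
assert all(marginal_m2(x, n) == Fraction(1, n + 1) for x in range(n + 1))

# The ratio nevertheless depends on x through the M1 numerator.
for x in (100, 115, 130, 200):
    print(x, float(bayes_factor(x, n)))
```

So the constant denominator does not make the comparison uninformative: the ratio is above 1 for counts near 100 and collapses toward 0 as x approaches 200.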