Talk:Belief propagation

The use of factor graphs in the description of the algorithm seems weaker than the approach taken by Yedidia et al in. At least the one taken here is less clear.
 * In fact, factor graphs are more general than pairwise Markov random fields, the model used by Yedidia in the technical report to which you make reference. For instance, see a more recent technical report by Yedidia et al. in .  It's also not hard to see how belief propagation in the pairwise MRF setting can be converted to BP in the factor graph setting. Ikcotyck 01:50, 26 July 2005 (UTC)

What is this article by Lauritzen and Spiegelhalter in 1986? I only know of one in 1988 called ;Local Computation with Probabilities in Graphical Structures and Their Applications to Expert Systems"
 * Hmm.. That's the article I know as well, I can't remember why I put 1986. Ikcotyck

There is a reference to the "diameter of the tree" in the general graphs section which doesn't help since it doesn't mention trees. maddanio

It is mentioned that Belief Propagation and Sum-Product are equivalent. This is wrong! What is true is that BP can be formulated as a special case of SP. Please consult Bishop in Pattern Recognition Machine learning, page 403. Gugux (talk) 00:08, 12 March 2008 (UTC)

We have discussed this already a bit below.

Computer vision category
I removed this article from the computer vision category. BP is probably not unrelated to CV but


 * 1) It is not a methodology developed within CV or specific to CV
 * 2) There is no material in this article which relates it to CV.

--KYN 22:15, 27 July 2007 (UTC)


 * That's true. Perhaps I can work in some computer vision-related material... Iknowyourider (t c) 23:26, 27 July 2007 (UTC)

"Belief"?
Could someone explain what the word "belief" refers to in this context? Gwideman (talk) 11:14, 12 April 2009 (UTC)
 * I assume it refers to the subjectivist school of probability of using probability theory to describe a state of knowledge (the "degree of belief"). —3mta3 (talk) 10:37, 2 May 2009 (UTC)

That is correct. However, the article states that belief propagation is inherently Bayesian, which is wildly untrue. The probabilistic interpretation is irrelevant as BP is a consequence of the model itself, which is a mathematical property, not a philosophical one. —Preceding unsigned comment added by 128.42.152.184 (talk) 23:39, 19 March 2010 (UTC)

You can get interpretation of "beliefs" by looking at trees where belief propagation is exact. In that case, the message m(x)_{j->i} is the probability of observing x at node i in a tree rooted at edge i,j. In other words, each message aka "belief" represents distribution of states in the current node according to all the nodes "below it". In "loopy" belief propagation, the tree "below" current node is infinite, however this quantity can still be well-defined, in which case belief propagation converges. Yaroslav Bulatov (talk) 18:41, 12 November 2010 (UTC)

special case?
The article currently states that BP is a special case of the sum-product algorithm (and refers to an article which makes the same assertion). What other cases are there? Is the max-product algorithm for finding maximum a posteriori estimates also considered a type of BP? —3mta3 (talk) 10:42, 2 May 2009 (UTC)
 * A previous commenter referred to Bishop, p 403. Luckily this chapter is available on line. He also asserts that it is a special case (but again, without saying why). —3mta3 (talk) 10:49, 2 May 2009 (UTC)

It really depends on what people call 'Belief Propagation'. imho, the most satisfying definition is the one defining BP as the general principle of applying a 'dynamic programming' principle on a tree, obtain a recursive system of equations involving messages on the edges of a graph, and then apply these equations to any graph. That way, min-sum, max-product, sum-product are all seen as coming from the same principle (they only differ on whether we are computing the marginals/partition function of a problem or the mode/optimal assignment, and whether we are working in the log domain or not). It also removes the arbitrary distinction between pairwise model and factor graph model. It's also the definition adopted by the Jordan-Wainwright monograph, or the Mezard-Montanari book. I'll remove the reference to 'special case', but we can discuss it further if there is a big disagreement on the matter...

Matrix inversion, solving systems of linear equations, Fourier transform over finite groups and survey propagation can be implemented using the same algorithm as BP by a proper substitution of marginalization/product operators and a proper choice of graphical model. Search for "Generalized Distributive Law", Wainwright's article on Survey Propagation, and "Gaussian BP for linear equation solving" for details of these examples. Yaroslav Bulatov (talk) 18:48, 12 November 2010 (UTC)

Fuzzy logic
Is there any connection at all with fuzzy logic? just-emery (talk) 18:16, 3 June 2009 (UTC)


 * I don't think so. Belief propagation works within a Bayesian probabilistic framework, which ultimately satisfies the Kolmogorov axioms for probability theory. Fuzzy logic is, well I'm not sure that anyone quite knows what fuzzy logic is, but its proponents maintain that it isn't probability theory. —3mta3 (talk) 19:18, 3 June 2009 (UTC)


 * Not all of them. from the article Fuzzy logic:
 * "On the other hand, Bart Kosko argues that probability is a subtheory of fuzzy logic, as probability only handles one kind of uncertainty. He also claims to have proven a derivation of Bayes' theorem from the concept of fuzzy subsethood"
 * Bayesian probability as a form of fuzzy logic is what I had in mind but I see that this is disputed. I'm going to try to add a link to Belief propagation (instead of Fuzzy logic) to the article Soft-in soft-out decoder. just-emery (talk) 20:20, 3 June 2009 (UTC)


 * I guess I should have said "but its proponents maintain that it isn't JUST probability theory". —3mta3 (talk) 20:34, 3 June 2009 (UTC)

Messages and confusion
What actually is a message here? It is stated that it is a "real-valued function". So, does belief propagation correspond with propagation of "functions" over the edges of a factor graph? It is stated that there are two "types" of messages. Or they actually distinguishable? They don't seem to have different arguments, both take $$x_v$$ as argument. Then, can a message by "1", shall this be understood as the identify function? When the neighbourhood of $$N(v)\setminus\{u\}$$ is empty, then $$\mu_{v \to u}(x_v)$$ is set to the uniform distribution. That one almost makes sense. Maybe it is meant that it is set to the probability mass function, respectively the probability density function of the discrete or continuous uniform distribution, see for example the Probability mass function. A set is not denoted by X=(Xv) and the exclusion of one element of the set which has the latex \setminus is not used in: $$p_{X_i}(x_i) = \sum_{\mathbf{x}': x'_i=x_i} p(\mathbf{x}').$$.

If this entity $$\mathbf{x}'$$ is meant as a vector that enters this mass function then this is the notation of deleting one element from a vector?

From Koller and Friedman's book it is obvious that sum-product message passing does not only operate on a factor graph: https://www.coursera.org/learn/probabilistic-graphical-models

I really hope someone who cares a bit about math is taking a second look at this page. It is really hard to understand now what it all means. Even simple things like the notation of the joint probability mass function:

$$p(\mathbf{x})$$ with $$\mathbf{x}=[x_1,\ldots,x_k]^T$$ is used as a shorthand for $$p(X_1=x_1,\ldots,X_k=x_k)$$.

Andy (talk) 21:56, 18 February 2013 (UTC)
 * I quite agree all of this is not very clear. I made a small edit to at least make it clear that a message from u to v is a real-valued function whose domain is the set of possible values for the random variable associated with v. But more should be done. Tokidokix (talk) 09:15, 8 August 2013 (UTC)


 * The version of the marginal probability function from http://www.statlect.com/glossary/marginal_probability_mass_function.htm seems clearer:
 * $$p_{\mathbf{X}_i}(x) = \sum_{(x_1,\ldots,x_K) \in R_{\mathbf{X}}:x_i=x} p_{\mathbf{X}}(x_1,\ldots,x_K)$$
 * 70.166.18.67 (talk) 23:12, 10 April 2015 (UTC)
 * "Messages" are used in the context of (loopy) Belief Propagation instead of "probabilities" because they behave in many ways like probabilities but aren't strictly probabilities as the (loopy) graphs have dependencies which make the factorized form of updates used in Belief Propagation only an approximation.
 * To put it another way, Belief Propagation is exact on trees, and the "messages" in this case conform directly to "probabilities". When moving away from trees into general graphs, graphs with loops in them, we can't use "probabilities" any more because the probability function is potentially very complex and not easily written down in the sum product form used in Belief Propagation, so the term "messages" is used. The hope is that the graph is locally independent enough ("locally tree like") so that the now "messages" correspond closely to "probabilities", but they're not strictly probabilities because the (potentially very complicated) dependence of different states of the x's hasn't been taken into account.
 * In the formulations I've seen, "messages" and "belief" are forcibly normalized after every update iteration, in order to make the messages conform more closely to probabilities (presumably?). Abetusk (talk) 00:08, 19 March 2023 (UTC)

Max-product Belief Propagation
It seems strange to me that Belief Propagation would be used to only refer to the Sum-Product Belief Propagation. The Max-Product Belief Propagation has been quite studied too (and a google scholar search shows that the term "Max-Product Belief Propagation" is actually commonly used by researchers in the Graphical Model field). Furthermore it seems to me Belief Propagation is supposed to work for any semiring. If this is not controversial, I would like to refactor a bit the article to emphasize that Belief Propagation can be used with different semirings, and that the Sum-Product and Max-Product semirings are the most commonly used. This would also make the section about Viterbi more relevant, and would hopefully make everything easier to understand by showing the parallelism with the classic Hidden Markov Model algorithms (Sum-Product BP essentially reduces to the Forward-Backward algorithm in a HMM, and Max-Product BP reduces to the Viterbi algorithm). — Preceding unsigned comment added by Tokidokix (talk • contribs) 07:54, 8 August 2013 (UTC)

Write "1" instead of the "Uniform distribution"
I think that it's more correct — Preceding unsigned comment added by 2001:BF8:200:FF70:B89D:1E76:B49A:A3B2 (talk) 15:41, 26 November 2013 (UTC)

Free Energy Section
This part isn't clear at all. I'll try to write something more precise based on the work of Yedidia et al. I wonder what these sentence means: "It can then be shown that the points of convergence of the sum-product algorithm represent the points where the free energy in such a system is minimized. Similarly, it can be shown that a fixed point of the iterative belief propagation algorithm in graphs with cycles is a stationary point of a free energy approximation"

While the second one is true, the first one seems false. Except in particular cases the entropy term is usually approximated.Victolunik (talk) 22:10, 14 August 2014 (UTC)


 * The statement about "free energy" being the fixed point of Belief Propagation should be the "Bethe free energy" approximation, which *is* minimized by the message passing algorithm of belief propagation. "Generalized Belief Propagation" by Yedidia, Freeman and Weiss as well as "Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms", by the same set of authors, go into detail about the equivalence.
 * Briefly, setting up Lagrange multipliers with the constraints of the system yields the equivalence. I've found the "Constructing Free Energy ..." paper to be the more readable one.
 * Some other, potentially obvious, statements are that the Bethe free energy is exact for trees but not necessarily so in general. In general, for "loopy" graphs, the fixed points of the Bethe free energy could be local minima, so a fixed point doesn't always correspond to a global minima or solution. (One?) of the basic assumption for using Belief Propagation on a problem (successfully) is that the system is "locally tree like", meaning the "loopy" correlations don't whack out the minima of the Bethe free energy so that they correspond to "real" answers.
 * The Lagrange multipliers calculation effectively shows that gradient descent of the Bethe free energy *is* the message passing algorithm of Belief Propagation. To be a little more verbose, and as some intuition, the Bethe free energy, up to some extraneous terms, is essentially the Kullback Leibler divergence of the current guess of the probability distribution (the current set of "beliefs" or "mu"s) versus the underlying "system dynamics" (the adjacency function, ("f_a" in the article)). Abetusk (talk) 23:58, 18 March 2023 (UTC)

Gallager, 1963!
Why Judea Pearl in 1982? This was proposed by Gallager, 1963! See "Bounds on the Performance of Belief Propagation Decoding" by David Burshtein and Gadi Miller, http://www.josephboutros.org/ldpc_vs_turbo/ldpc_Burshtein_Miller_IT02.pdf. I understand, Pearl proposed the name "Belief_propagation" only ≈≈≈≈ — Preceding unsigned comment added by 188.254.126.223 (talk) 10:34, 6 April 2018 (UTC)


 * Agreed!! Gallager's algorithm also was in the general context of loopy graphs 2A01:CB00:35:C000:6D81:7F0C:15E0:A595 (talk) 21:22, 11 July 2023 (UTC)