Talk:Bayesian network

history
Wasn't there some big history to Bayesian networks? They were one of the first statistical processing methods invented? No!

The article as it stands (2003/12/26) limits the definition unnecessarily. I'm going to edit the article to address these points: (1) a node can represent any kind of variable, not just discrete random variables; variables need not be discrete, and they need not be random. (2) the arcs don't represent correlation; correlation in probability theory has a certain well-defined meaning which is not applicable here. What arcs do represent is conditional dependence. (3) "Conditional probability table" assumes that the variables involved are discrete; the article needs to allow for continuous variables. (4) The list of applications can be expanded.

I've addressed (or tried to) items (1) through (3) above. Wile E. Heresiarch 07:41, 27 Dec 2003 (UTC)

An example would be very helpful in this article. Banno 01:05, Jul 7, 2004 (UTC)

learning
It might be interesting to add some comments about learning BNs.

--Response: I did this implicitly by saying that distributions could be parameterized and by discussing the use of EM (expectation-maximization) to estimate these parameters from data. Also, there is already a paragraph on learning the structure of BNs. I think this covers the fundamental "learning" areas, though the section on parameters and EM could be expanded a bit.

Dynamic Bayesian network
I was disappointed to see the DBN link redirect back to BN. Someone needs to write a section discussing the specialization to DBNs, the special cases of hidden Markov models, Kalman filters and switching state-space models, and their applications in tracking and segmentation problems.

Way, way, wayyyyy too technical
No doubt it's a wonderfully accurate and concise explanation of exactly what a Bayesian network is, but it's useless to those of us who aren't striving for a PhD in math. At the very least it needs an introduction stating, in plain English, what such a network is. I can't be the one to write that introduction, because after reading the article I have less of a clue about what they are than before I read it. The level of detail in this article is great for a stand-alone webpage or reference, but it's not an encyclopedia entry. Frustrating.
 * I added three small, hopefully more human-readable sentences to the introduction... --UmassThrower 04:59, 23 March 2007 (UTC)

Does anyone ever read this page who /isn't/ striving for a PhD in math...?
 * It might, at the least, be helpful to add some links/references to what the various pieces of mathematical notation mean/refer-to. I have a CompSci/Math degree, but I'm still scratching my head here trying to figure out what the formulae actually mean. -- 91.123.228.33 (talk) 12:17, 21 May 2012 (UTC)
 * I think an overview of the practical applications in the lead paragraph would help out a lot. Currently the lead section gives very little context on the subject (i.e. what are Bayesian networks and why are they important?) WP:LEAD and Make_technical_articles_accessible have some pointers on this. It might also be interesting to mention the origin in the lead section. Bayes' theorem is given only passing mention in the article currently; maybe more info on this would be useful. Aside from the lead section, linking jargon words throughout the article could help readers in understanding the material.--Eloil 19:58, 17 April 2007 (UTC)
 * Completely agree! This level of technicality is enjoyed ONLY by experts, not by the general public, who are none the wiser! —Preceding unsigned comment added by 202.53.236.171 (talk) 04:56, 25 April 2008 (UTC)
 * Also agree, it should be a little more friendly to others —Preceding unsigned comment added by Dsundquist (talk • contribs) 18:16, 20 December 2008 (UTC)

Graphical?
Seems like a graphical tool should use some graphics... It would be a lot easier to understand with one of the example networks from the reference material. For example, Murphy's Fig. 1.

Bayesian aspect
Currently the article says
 * [Variables] are not restricted to representing random variables; which forms the "Bayesian" aspect of a Bayesian network.

I don't think that this is the reason the network is "Bayesian". Surely non-Bayesian models can also have non-random variables (random variables whose probability distribution is concentrated on one value)? I would say that the name comes from the fact that Bayes' theorem is used to update the node probabilities after observing some values. AnAj 20:22, 24 March 2007 (UTC)

I'd actually like to hear what the general opinion is on this. Often, BNs are used to calculate probabilities of what a frequentist would call hypotheses. This is a no-go in the frequentist world, and the hallmark of the Bayesian way of looking at things, hence leading to the name 'Bayesian network'. Does this make sense? If not, why not? Tomixdf 21:23, 24 March 2007 (UTC)

Here's a quote from the Wikipedia entry on Bayesian probability that might clarify things:

The difference between Bayesian and Frequentist interpretations of probability has important consequences in statistical practice. For example, when comparing two hypotheses using the same data, the theory of hypothesis tests, which is based on the frequency interpretation of probability, allows the rejection or non-rejection of one model/hypothesis (the 'null' hypothesis) based on the probability of mistakenly inferring that the data support the other model/hypothesis more. The probability of making such a mistake, called a Type I error, requires the consideration of hypothetical data sets derived from the same data source that are more extreme than the data actually observed. This approach allows the inference that 'either the two hypotheses are different or the observed data are a misleading set'. In contrast, Bayesian methods condition on the data actually observed, and are therefore able to assign posterior probabilities to any number of hypotheses directly. The requirement to assign probabilities to the parameters of models representing each hypothesis is the cost of this more direct approach.

Tomixdf 21:29, 24 March 2007 (UTC)


 * Bayes' rule is a way to calculate relationships between variables' values in a joint probability distribution (JPD). Think of a JPD as a histogram over all possible combinations of values; for n binary variables there are 2^n of them, as many as the subsets in a power set! In other words, the JPD requires exponentially large amounts of memory. But Bayes' rule lets you compute whichever parts of the JPD you need to get the probability of your target hypothesis, aka the posterior probability.


 * Trouble is, you often don't have data to get directly from evidence to target hypothesis. So you need to go through a (possibly) large network of variables and values to get there. This means you are necessarily APPROXIMATING the values that would be in a full JPD. So if you construct the network carefully and have good data, your approximation may be quite good. It also may be quite bad.


 * I suspect a tutorial description somewhat like the above (maybe even simpler) needs to go in this article.
 * --ClickStudent (talk) 21:12, 25 May 2008 (UTC)
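ClickStudent's point about memory can be made concrete by counting free parameters. The sketch below assumes every variable is binary and uses a hypothetical 20-node chain: a full joint table needs 2^n - 1 numbers, while a Bayesian network needs one number per parent configuration of each node. The network reproduces the joint exactly when the independencies encoded by the graph hold in the underlying distribution, and only approximates it otherwise.

```python
def full_joint_params(n):
    """Free parameters of an unrestricted joint table over n binary variables."""
    return 2 ** n - 1


def bn_params(parents):
    """Free parameters of a Bayesian network over binary variables.

    `parents` maps each node to the list of its parents. Each node needs one
    number, P(node = True | parent configuration), per parent configuration.
    """
    return sum(2 ** len(ps) for ps in parents.values())


# Hypothetical chain X0 -> X1 -> ... -> X19 of 20 binary variables.
n = 20
chain = {f"X{i}": ([f"X{i - 1}"] if i > 0 else []) for i in range(n)}

print(full_joint_params(n))  # 1048575
print(bn_params(chain))      # 39  (1 for X0, plus 2 for each of the 19 others)
```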

reference?
Do we have a reference for this? It directly contradicts Pearl. "A Bayesian network need not to represent causal relationships. However, if knowledge about causality is available it is natural to use it for selecting the parent variables when building the network, thus resulting in causal Bayesian networks." MisterSheik 21:18, 27 March 2007 (UTC)


 * In addition to Pearl's causal model Bayesian networks can be interpreted as (non-causal) models of probabilistic relationships. See Heckerman: A Bayesian approach to learning causal networks for a discussion of the two interpretations. AnAj 17:47, 29 March 2007 (UTC)

more incorrect information
"Because the joint distribution can be decomposed into product of local distributions in any order, there is no unique Bayesian network for a given distribution."

I simplified this (check page history for original version) but it's not true either. That's not the reason that there's no unique Bayesian network, because you cannot, for example, reverse a causal link just by calculating the conditional probability backwards. I think the person writing this has gotten this idea from Bishop, but he doesn't explicitly say that, and anyway it's not true (due to re-decomposition of products of local distributions-- it is true that you can add in more nodes and reorganize the network.) MisterSheik 21:39, 27 March 2007 (UTC)


 * Suppose we have a joint probability distribution p(x,y). It can be written either as p(x,y) = p(x|y)p(y) or p(x,y) = p(y|x)p(x). The first corresponds to a Bayesian network, where y is the parent of x, and the second corresponds to a network, where x is the parent of y. AnAj 17:53, 29 March 2007 (UTC)
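AnAj's example can be checked numerically. This sketch starts from an arbitrary, hypothetical joint table over two binary variables, builds the local tables for both networks (y -> x and x -> y), and confirms that each product reproduces p(x, y) even though the local tables themselves differ.

```python
from itertools import product

# Hypothetical joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
vals = (0, 1)

px = {x: sum(joint[(x, y)] for y in vals) for x in vals}
py = {y: sum(joint[(x, y)] for x in vals) for y in vals}

# Local tables for the two networks.
p_x_given_y = {(x, y): joint[(x, y)] / py[y] for x, y in product(vals, vals)}  # y -> x
p_y_given_x = {(y, x): joint[(x, y)] / px[x] for x, y in product(vals, vals)}  # x -> y


def matches(tol=1e-12):
    """True if both products reproduce p(x, y) for every assignment."""
    return all(
        abs(p_x_given_y[(x, y)] * py[y] - joint[(x, y)]) < tol
        and abs(p_y_given_x[(y, x)] * px[x] - joint[(x, y)]) < tol
        for x, y in product(vals, vals)
    )


print(matches())  # True
print(p_x_given_y[(1, 1)], p_y_given_x[(1, 1)])  # different local tables, same joint
```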


 * I did some research, and I realize that there are formulations of Bayesian networks that aren't causal, but Pearl invented these networks, and he defines them as being causal. In that case, the directed arrows are not just conditional probabilities, but rather direct causal links, which cannot usually be reversed. Pearl devotes a whole chapter to determining the directions of the links in Probabilistic Reasoning in Intelligent Systems.


 * In other words, the directed arrows do not just store conditional probability information, but primarily they are used to reason about independence.


 * My suggestion is that we have a different page for the other directed graphical models that you are describing. MisterSheik 22:17, 29 March 2007 (UTC)


 * I think both viewpoints can be discussed in the same article. See my recent edits. AnAj 17:11, 1 April 2007 (UTC)

looming edit war?
I can see from the recent history that an edit war may be looming. In defence of my recent changes to the definition, I want to reaffirm on the talk page that links don't represent statistical independences, but direct causal relationships. Two nodes can be dependent without being connected (for example, if they are results of the same cause.) These nodes become independent once that causal variable is known. Similarly, they are made dependent by an effect that becomes known (or if its effect is known, and so on). This is what should be explained under d-separation, which really shouldn't have its own article, but should be explained here.
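The common-cause and common-effect behaviour described above can be checked by brute-force enumeration. This is a minimal sketch with made-up CPT numbers and binary variables only: A <- C -> B is the common-cause case and A -> E <- B the common-effect case. Independence is tested directly from the product of local tables, P(x, y | z) = P(x | z) P(y | z), with a small floating-point tolerance; the helper names are illustrative.

```python
from itertools import product


def bern(p, v):
    """P(V = v) for a binary variable with P(V = 1) = p."""
    return p if v else 1 - p


def joint_table(factor, names):
    """Evaluate a product of local factors on every binary assignment."""
    table = {vals: factor(dict(zip(names, vals)))
             for vals in product((0, 1), repeat=len(names))}
    return table, names


def prob(table, names, **fixed):
    """Total probability of the assignments that agree with `fixed`."""
    idx = {n: i for i, n in enumerate(names)}
    return sum(p for vals, p in table.items()
               if all(vals[idx[k]] == v for k, v in fixed.items()))


def independent(table, names, x, y, given=None, tol=1e-9):
    """Check P(x, y | z) == P(x | z) P(y | z) for all values (z is optional)."""
    for z in ([None] if given is None else [0, 1]):
        cond = {} if z is None else {given: z}
        pz = prob(table, names, **cond)
        for vx, vy in product((0, 1), repeat=2):
            pxy = prob(table, names, **{x: vx, y: vy}, **cond) / pz
            px = prob(table, names, **{x: vx}, **cond) / pz
            py = prob(table, names, **{y: vy}, **cond) / pz
            if abs(pxy - px * py) > tol:
                return False
    return True


def fork(a):  # A <- C -> B (common cause), made-up numbers
    return (bern(0.5, a["C"]) * bern(0.9 if a["C"] else 0.1, a["A"])
            * bern(0.8 if a["C"] else 0.2, a["B"]))


def collider(a):  # A -> E <- B (common effect), made-up numbers
    return (bern(0.5, a["A"]) * bern(0.5, a["B"])
            * bern(0.95 if (a["A"] or a["B"]) else 0.05, a["E"]))


fork_t = joint_table(fork, ["A", "B", "C"])
col_t = joint_table(collider, ["A", "B", "E"])
print(independent(*fork_t, "A", "B"))             # False: common cause
print(independent(*fork_t, "A", "B", given="C"))  # True: cause known
print(independent(*col_t, "A", "B"))              # True
print(independent(*col_t, "A", "B", given="E"))   # False: explaining away
```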

d-separation is the whole point of Bayesian networks--not efficient computational properties. (Perhaps the computations are easier for students to grasp and so presented first by instructors?)

Tractability is also not an important point because the conditional probability of a variable on its parents could be intractable.

MisterSheik 21:54, 27 March 2007 (UTC)

OK, the causal section has been round the Oxford Pattern Analysis and Machine Learning group and we think it's OK -- though it will always be vulnerable to weasel words from opponents in the eternal Machine Learning vs AI vs Stats debates (see that pile of doggy doo over there, that's AI that is, that's AI's mum...) -- Charles Fox 2007-Apr-10

N or n in "Definition" section?

 * In the Definition section, are "N" and "n" used interchangeably?--Skoch3 16:51, 29 March 2007 (UTC)


 * Yes. It's fixed now. AnAj 17:58, 29 March 2007 (UTC)

Good start AnAj
I think the article is getting better, but I still think we need to cut it right in two. The definition makes it sound like the causal thing is some funky requirement that you can drop. The reality is that the causal version is completely different: the operations that you can perform on it work differently. For example, checking to see if two nodes are independent: in Pearl's "version", d-separation (a path, with specific requirements--this is the magic of causal relationships) is the procedure that determines independence, but without the notion of causality, you're back to m-separation (any path). The belief updating (what's been labeled inference) is done with likelihoods and priors in Pearl's version; I'm not sure how it's done otherwise. Also, the notion of rewiring the network changes--in fact, the whole meaning of network has changed. The introduction should really reflect the reality that there are in fact two separate concepts: Pearl's Bayesian networks and the other "directed graphical models" are as different as Markov networks and Bayesian networks. I'm okay with including them in the same article since that reflects the reality of how people use the term, but I think they should be presented in a more side-by-side fashion. MisterSheik 21:11, 1 April 2007 (UTC)

Terrible
This new version (with two definitions of a BN) is clearly incorrect. I have Causality by Pearl lying in front of me: Pearl clearly distinguishes between Bayesian Networks and Causal Bayesian Networks. Also, in the current version d-separation only seems to apply to Causal BNs, which is clearly wrong. In short, I think we should revert to one of the earlier versions. Tomixdf 06:15, 17 April 2007 (UTC)

I have Probabilistic Reasoning in Intelligent Systems lying in front of me, which does not distinguish between two kinds of Bayesian network, but rather defines Bayesian networks to be causal. If you define Bayesian networks to use the joint probability distribution, then d-separation doesn't apply, and belief updating is different.

For example, given a network A->B<-C, let B be known. Then, are A and C independent? In a causal network, no. But, if we just have non-causal conditional probabilities, then A and C are independent.

MisterSheik 14:36, 17 April 2007 (UTC)

"Probabilistic Reasoning in Intelligent Systems" is 20 years old - I haven't read it, but if what you say is true, the book is clearly outdated. "Causality" (also by Pearl) is from 2000, and does make the distinction between causal and non-causal BNs, right from the beginning. The current description is totally misleading, to say the least. Tomixdf 09:09, 18 April 2007 (UTC)

It is 20 years old. Still, d-separation doesn't apply to a Bayesian network that's defined using just a joint probability distribution as in the second case. Perhaps, Pearl's non-causal Bayesian network is yet a third case? I'm going to find Causality at the library and get back to you some time next week. You're welcome to copy Pearl's definition of causal and non-causal Bayesian networks into the talk page though. MisterSheik 11:14, 18 April 2007 (UTC)

No, you are quite wrong. d-separation most certainly is relevant for ordinary BNs. d-separation is basically used to determine the (conditional) independencies associated with a DAG, and for determining if two DAGs are observationally equivalent. In "Causality", d-sep is introduced for general BNs on page 16. Causal BNs only appear on page 21. In conclusion, I strongly feel we need to revert to one of the older versions of the page. We can then add a section on causal BNs, or even a separate page. Tomixdf 16:30, 18 April 2007 (UTC)

It's quite possible I'm wrong.

I said that d-separation doesn't apply to a Bayesian network that's defined using just a joint probability distribution. If you are going to take that joint probability distribution and arbitrarily assign directions to the arrows, which represent causes or information flow (I noticed that Pearl uses this terminology in some recent papers, so I bet it's the same kind of thing in Causality), then you can apply the concept of d-separation. That is, you need the joint pdf and the concept of a direction of causality or information flow.

I'm going to get Causality tomorrow or on Friday, but it will take me at least a few days to read it. Still, the old presentation before the current version wasn't much better because it made it seem as if you could take any joint pdf, create a Bayesian network, and then apply d-separation, expecting meaningful results. But, if the direction of the links is arbitrarily chosen, how can you expect that?

Instead of reverting, what we should do is take, for example, "Causality" and import its presentation structure and key points. Pearl has thought all those things through for us. The old presentation was much worse because it obscured, or maybe completely missed, causal reasoning.

MisterSheik 19:56, 18 April 2007 (UTC)

What do you mean with "a bayesian network that's defined using just a joint probability distribution"? If you want to express a probability distribution as a BN, you need to come up with a DAG that is compatible (or more precisely: Markov relative) to that probability distribution. In order to judge if two DAGs are compatible with the same set of probability distributions (observational equivalence), d-separation plays a central role. One would not just assign random directions to the edges!!

You are going to read Pearl's "Causality" in a few days? I have deep, deep respect for you :-D Tomixdf 20:27, 18 April 2007 (UTC)

Ha ha :) I was exaggerating about a few days :)  But, I'll give it a shot. I have a lot of time and enthusiasm: I managed to read that 20-year-old book in about 2 weeks (except for the final few chapters that weren't interesting to me.)  I'm hoping that there will be a lot of redundancy between the two books.

Anyway, it's obvious that I have a lot to catch up on, but are we agreed that we should make the page look more like this book? Please take a look at the old page. It says: "Because the joint distribution can be decomposed into product of local distributions in any order, there is no unique Bayesian network for a given distribution." This makes it sound as if you can "just assign random directions to the edges", no? Also, in my book the belief in a node is not just the product of the priors coming from the parents, unless those parents are independent, right? MisterSheik 21:16, 18 April 2007 (UTC)

Consider A->B->C and A<-B<-C. These BNs are observationally equivalent (compatible with the same set of probability distributions). But A->B<-C is NOT observationally equivalent to the latter BNs, because it has a different set of v-structures (converging arrows whose tails are not connected). That there is no unique BN for a given prob. dist., does not mean that "anything goes".
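Tomixdf's criterion (same skeleton and same v-structures, which is the Verma-Pearl characterization of observational equivalence) can be written as a short check. This sketch represents each DAG as a list of (parent, child) pairs; the function names are illustrative, not from any library.

```python
def skeleton(edges):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    vs = set()
    for c, ps in parents.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in skel:
                    vs.add((a, c, b))
    return vs


def markov_equivalent(e1, e2):
    """Observationally equivalent iff same skeleton and same v-structures."""
    return skeleton(e1) == skeleton(e2) and v_structures(e1) == v_structures(e2)


chain_fwd = [("A", "B"), ("B", "C")]  # A -> B -> C
chain_bwd = [("B", "A"), ("C", "B")]  # A <- B <- C
fork = [("B", "A"), ("B", "C")]       # A <- B -> C
collider = [("A", "B"), ("C", "B")]   # A -> B <- C
print(markov_equivalent(chain_fwd, chain_bwd))  # True
print(markov_equivalent(chain_fwd, fork))       # True
print(markov_equivalent(chain_fwd, collider))   # False
```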

Chapter one from "Causality" has a lot of the stuff we need to put on the page, yes. Tomixdf 06:38, 19 April 2007 (UTC)

I've reverted to the last reasonable page I could find - the previous page was clearly beyond repair, and it was high time that the erroneous info was removed, I think. I still think we should add some info from Pearl (2000) - chapter 1. Tomixdf 20:54, 22 April 2007 (UTC)

To do
Things that should/could be added: Tomixdf 22:09, 22 April 2007 (UTC)
 * Observational equivalence, d-sep and v-structure, plus a simple example of what it all means
 * Some more information on Causal BNs, or even a separate page
 * Inference using Gibbs sampling
 * Link with factor graphs
 * Variational methods, free energy interpretation

There was a separate page for d-separation which now redirects to the main Bayesian network article. Why was it removed? There's scarcely any mention of d-separation in the main article. A separate article with a visual description (like on Kevin Murphy's page) would be great. Would it be possible to resurrect the deleted page and work from there?

Silya 00:07, 9 August 2007 (UTC)
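As a starting point for a d-separation description, here is a minimal sketch of one standard way to test it. Instead of walking paths with the blocking rules, it uses the equivalent moral-graph criterion: restrict to the ancestors of X, Y and Z, connect parents that share a child, drop edge directions, delete Z, and check whether X can still reach Y. Edges are (parent, child) pairs; the function names and example graphs are illustrative.

```python
from collections import deque


def ancestors(edges, nodes):
    """`nodes` together with all of their ancestors in the DAG."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(edges, x, y, z):
    """True if x and y are d-separated by the set z (x, y not in z)."""
    keep = ancestors(edges, {x, y} | set(z))
    sub = [(a, b) for a, b in edges if a in keep and b in keep]
    # Moralize: drop directions and "marry" parents that share a child.
    adj = {n: set() for n in keep}
    parents = {}
    for a, b in sub:
        adj[a].add(b)
        adj[b].add(a)
        parents.setdefault(b, set()).add(a)
    for ps in parents.values():
        for p in ps:
            adj[p] |= ps - {p}
    # Delete the conditioning set, then search for an undirected x-y path.
    seen, queue = {x}, deque([x])
    while queue:
        n = queue.popleft()
        if n == y:
            return False
        for m in adj[n] - seen - set(z):
            seen.add(m)
            queue.append(m)
    return True


chain = [("A", "B"), ("B", "C")]     # A -> B -> C
collider = [("A", "B"), ("C", "B")]  # A -> B <- C
print(d_separated(chain, "A", "C", []))        # False
print(d_separated(chain, "A", "C", ["B"]))     # True
print(d_separated(collider, "A", "C", []))     # True
print(d_separated(collider, "A", "C", ["B"]))  # False
```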

Relationship to automata
On 13:01, 17 May 2007 Tomixdf (Talk | contribs) (16,635 bytes) (Undid revision 131546816 by Linas (talk) In a BN, edges are NOT state transitions. This section is thus confusing.)

performed a revert of the following:


 * Related concepts
 * A related concept is that of the topological finite state machine, of which the quantum finite state machine is a special case. Here, nodes represent internal states of the machine, while arrows are interpreted as functions from one topological space to another. The resemblance is closest when the topological spaces are probability spaces or measure spaces, in which case the input string to the automaton corresponds to a specific path through a Bayesian network.

This paragraph never said that the edges of a BN are state transitions. I am not sure why you found this confusing; perhaps some additional word-smithing is in order? (This was meant to be a quick-n-dirty remark, not an exposition). linas 01:08, 18 May 2007 (UTC)


 * In the topological finite state machine you describe, the nodes seem to be states and the edges denote state changes. This would be similar to an HMM state diagram, but not to a BN diagram, in which the nodes are variables and the edges dependencies. I suspect you are interpreting the BN diagram as an HMM state diagram, and as a result this section would be better suited for the HMM page. Tomixdf 07:56, 18 May 2007 (UTC)


 * By "HMM" I assume you mean hidden Markov model. The article already asserts broad equivalence to Markov models and to finite subshifts; the section I wanted to add was not titled "equivalent concepts", it was titled "related concepts". Point taken, the whole suite of articles in this area is in poor condition. Perhaps I'll spend some time here. linas 14:50, 18 May 2007 (UTC)


 * The core observation is that one wants to integrate over all possible state transitions, or integrate over the cylinder sets of the system. What you get after the integration is a Bayesian network; the various conditional probabilities are the measures of the cylinder sets. If the measures are translation invariant, you get things like the Viterbi algorithm (an example of a dynamical Bayesian network); but they need not be translation invariant. linas 15:00, 18 May 2007 (UTC)

Again it sounds like the similarity is limited to HMMs (the Viterbi algorithm is an HMM specific algorithm, you talk about state transitions). Sure enough, all HMMs are BNs, but if there is no similarity with BNs _in general_, I still think the section will be more at home in the HMM article. Tomixdf 18:33, 20 May 2007 (UTC)

Dependencies vs Independencies
That Bayesian networks model dependencies is a common misconception, but it is not true. To see that it isn't, read Pearl's Probabilistic Reasoning in Intelligent Systems, specifically the section on I-maps, D-maps, and P-maps. The issue is that graphs cannot encode BOTH dependence AND independence, so you have to choose which is more important. People have chosen independence as more important (not surprising, since it allows you to get an efficient representation of a distribution by factorization). To summarize: the absence of an edge between X and Y in a Bayesian network IMPLIES there is a set of variables Z which renders X and Y independent. The presence of an edge MAY mean there is a dependence or it may not; it does not IMPLY dependence. Dependence is IMPLIED only when the so-called 'faithfulness' or 'stability' assumptions hold (see Pearl's Causality book, chapter 2, for details).

Silya 17:54, 14 August 2007 (UTC)
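A small illustration of the point that an edge does not imply dependence: in this hypothetical network A -> B, both rows of the CPT for B are identical, so the joint distribution factorizes as P(A)P(B) even though the edge is present. A distribution like this is the kind of case the faithfulness assumption rules out. The numbers are made up for illustration.

```python
from itertools import product

p_a = {0: 0.7, 1: 0.3}
# Hypothetical CPT for the edge A -> B whose two rows happen to be identical.
p_b_given_a = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.6, 1: 0.4}}

# Joint from the network's factorization P(A) P(B | A).
joint = {(a, b): p_a[a] * p_b_given_a[a][b] for a, b in product((0, 1), repeat=2)}
p_b = {b: joint[(0, b)] + joint[(1, b)] for b in (0, 1)}

# A and B are independent despite the edge between them.
independent = all(abs(joint[(a, b)] - p_a[a] * p_b[b]) < 1e-12
                  for a, b in product((0, 1), repeat=2))
print(independent)  # True
```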


 * This is very interesting but shouldn't some clarification in this be present in the article itself?


 * The meaning presented is counter-intuitive, if logically consistent. i.e. it is the sort of thing that a mathematician might understand but is baffling to the general reader that Wikipedia aims at.


 * I would suggest either explaining the use of this counter-intuitive approach in the article or coming up with terminology which is both intuitive and satisfying to a mathematician. For example, can we say that the edges represent potential dependencies?


 * Yaris678 10:41 6 June 2008 (UTC)


 * Just seen the new definition. Much better!  Yaris678 (talk) 19:49, 6 January 2009 (UTC)

Definitions and concepts: the equation
I did not understand the equation: Does P(X_i | parents(X_i)) mean the probability of X_i in case that all of its (direct) parents are true? So the graph is not a Bayesian Network if A has the parent B and X has the parent Y and, e.g., A, B, X, Y are all very unlikely but P(A|B) and P(X|Y) are big?

-- T —Preceding unsigned comment added by 193.229.70.253 (talk) 11:37, 6 November 2007 (UTC)


 * No. If the vector X is a random variable, and the vector x is a particular value for that random variable, then the equation is saying:


 * $$\forall x:\quad P(X_1=x_1, \ldots, X_n=x_n) = \prod_{i=1}^{n} P\bigl(X_i=x_i \mid X_j=x_j \text{ for each } X_j \text{ that is a parent of } X_i\bigr)$$


 * So for Boolean variables (which you seem to be assuming), the probability function P(X_i | parents(X_i)) would be represented by a table of 2^n entries, if X_i has n parents. It would represent the probability of X_i being true in each case, for all of the 2^n possible combinations of its parents being true or false.  I think the original equation was more elegant than what I just wrote here.  But if you think what I just wrote is easier to understand, feel free to copy it back into the main article.  —Preceding unsigned comment added by Imagecreator (talk • contribs) 00:03, 17 December 2007 (UTC)


 * Elegance is not a criterion in Wikipedia. Clarity is. I'm pasting this into the article. CharlesGillingham (talk) 18:14, 15 September 2009 (UTC)
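The product formula can be evaluated directly from CPTs. This sketch assumes binary variables and a hypothetical network A -> C <- B; each CPT is stored as a dict from a tuple of parent values to P(node = 1 | parents), which gives the 2^k rows per node with k parents mentioned above. As a sanity check, the products over all assignments sum to 1.

```python
from itertools import product

# Hypothetical network A -> C <- B with made-up CPT entries.
parents = {"A": (), "B": (), "C": ("A", "B")}
cpt = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.7, (1, 1): 0.9},
}


def joint_prob(x):
    """P(X_1 = x_1, ..., X_n = x_n) as the product of P(X_i = x_i | parents)."""
    p = 1.0
    for node, ps in parents.items():
        p_true = cpt[node][tuple(x[q] for q in ps)]  # look up the parents' row
        p *= p_true if x[node] == 1 else 1.0 - p_true
    return p


names = list(parents)
total = sum(joint_prob(dict(zip(names, v))) for v in product((0, 1), repeat=len(names)))
print(round(total, 10))                                 # 1.0
print(round(joint_prob({"A": 1, "B": 0, "C": 1}), 6))   # 0.084 = 0.3 * 0.4 * 0.7
```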

Validation methods?
I'm looking for ways to validate Bayesian networks but can't find anything here. Shouldn't there be a section about it? —Preceding unsigned comment added by 137.120.76.211 (talk) 10:55, 21 January 2008 (UTC)

Moved Bullet List from Intro to History Section
I think the points in the Intro bullet list are important to the full understanding of Bayes networks. But dealing with the subjectivity of the paradigm shouldn't be up in the intro. The concept is important. I agree with it. I'm writing a thesis about it. But it's confusing to neophytes when placed in the fourth sentence. Also note this article is flagged as too technical. So I put it in History since it mentions history.

IMHO Some of these points should go up in the end of definitions and concepts. Further, it is unfortunate that History is way down at the bottom and very short. But the casual venue of a wiki article suggests intro first, examples second, formal definitions third, elaborations, histories, and controversies (like frequentism/belief debates, Dempster Shafer, limits, boundary conditions, causality vs inference, ...) fourth. Of course, feel free to revert my change. But I still think important elaborations should go among the elaborations. It should probably be first among those! --ClickStudent (talk) 19:32, 25 May 2008 (UTC)

The 'simple Bayesian network' figure for Rain/Sprinkler/Grass
As a not-good mathematician I feel a bit more explanation of how the conditional probabilities in the Rain and Sprinkler tables lead on to the probabilities in the Sprinkler/Rain/Wet Grass table would help.

Badmather (talk) 16:46, 22 July 2008 (UTC)

This drawing should be removed because it is a terrible example - not only is it cyclical, therefore contradicting the definition of a BN as a DAG, it does not correctly show the causal relationships between the nodes (how does the rain cause the sprinkler to turn on?) — Preceding unsigned comment added by 203.44.17.10 (talk) 00:51, 29 November 2017 (UTC)


 * It is NOT cyclical, because it is directed. If you start from a node and follow directed edges, you will never come back to the same node again. The rain causes the sprinkler to be turned on less frequently than the absence of rain does. See the second graph with conditional probability tables (CPTs). The graph structure only shows that there is a causal relationship between the nodes; the CPTs explain that relationship. Hous21 (talk) 13:52, 29 November 2017 (UTC)
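The distinction between a cycle in the underlying undirected graph and a directed cycle can be checked with Kahn's topological-sort algorithm. This sketch assumes the edges in the article's figure are Rain -> Sprinkler, Rain -> GrassWet and Sprinkler -> GrassWet (the node names are illustrative); the three nodes form a triangle, but there is no directed cycle.

```python
from collections import defaultdict, deque


def is_dag(nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be
    removed in an order where each removed node has no remaining incoming edge."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return removed == len(nodes)


nodes = ["Rain", "Sprinkler", "GrassWet"]
edges = [("Rain", "Sprinkler"), ("Rain", "GrassWet"), ("Sprinkler", "GrassWet")]
print(is_dag(nodes, edges))                           # True
print(is_dag(nodes, edges + [("GrassWet", "Rain")]))  # False: this edge closes a cycle
```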

Request for detail on 'A simple Bayesian network' example
As a not-good mathematician I would like to request a bit more detail on how the numbers in the 'Sprinkler/Rain/Grass Wet' truth table are derived from the 'Rain' and 'Sprinkler' truth tables. Badmather (talk) 08:19, 23 July 2008 (UTC)

I am not familiar with Bayesian Networks. As I am trying to learn this, I tried to calculate the example using normal theory of probability. So:

If there is rain, the grass is wet. Or else, if there is no rain and the sprinkler is on, the grass is wet. Otherwise, it's dry.

These two events are disjoint, so their probabilities can be added to find Pr{G=T} = 0.2 + 0.8 * 0.4 = 0.52

The probability, that it rains and the grass is wet is just the contribution of the first event, i.e. Pr{R=T and G=T} = 0.2

So: Pr{R=T | G=T} = 0.2 / 0.52 = 5 / 13 = 38.46 %

Now my question: Why do you get 35.77 %, i.e. a solution different from classical theory of probability? --87.140.100.212 (talk) 10:16, 14 September 2020 (UTC)
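One way to see where 35.77% comes from is to enumerate the joint distribution P(G, S, R) = P(G | S, R) P(S | R) P(R). The CPT values below are assumed from the article's sprinkler figure, which this page does not restate: P(R=T) = 0.2, P(S=T | R=T) = 0.01, P(S=T | R=F) = 0.4, and P(G=T | S, R) = 0.0, 0.8, 0.9, 0.99 for (S, R) = (F, F), (F, T), (T, F), (T, T). Under these numbers the calculation above differs because it treats wet grass as certain whenever it rains or the sprinkler is on, whereas in the figure those cases give 0.8, 0.9 or 0.99. The result comes from ordinary probability theory; no different rules are involved.

```python
from itertools import product

# CPT values assumed from the article's sprinkler figure (not restated on this page).
P_RAIN = 0.2
P_SPRINKLER = {True: 0.01, False: 0.4}             # P(S=T | R)
P_WET = {(False, False): 0.0, (False, True): 0.8,  # P(G=T | S, R)
         (True, False): 0.9, (True, True): 0.99}


def joint(s, r, g):
    """P(G=g, S=s, R=r) = P(G | S, R) P(S | R) P(R)."""
    p_r = P_RAIN if r else 1 - P_RAIN
    p_s = P_SPRINKLER[r] if s else 1 - P_SPRINKLER[r]
    p_g = P_WET[(s, r)] if g else 1 - P_WET[(s, r)]
    return p_g * p_s * p_r


# Pr{R=T | G=T} = Pr{R=T, G=T} / Pr{G=T}, summing out the sprinkler.
num = sum(joint(s, True, True) for s in (True, False))
den = sum(joint(s, r, True) for s, r in product((True, False), repeat=2))
print(round(num / den, 4))  # 0.3577
```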

Linkfarm
I removed the following, and left a link to a more complete list.


 * Free and open source software (alphabetical)


 * Ace, the Bayesian network compiler: http://reasoning.cs.ucla.edu/ace (Works with SamIam, creator of BANSY3)
 * AISpace, applet for belief and decision networks with explanation facilities designed for teaching and learning: http://aispace.org/bayes
 * BANJO: Bayesian Network Inference with Java Objects: http://www.cs.duke.edu/~amink/software/banjo
 * BANSY3 - Freeware. From the Non Linear Dynamics Laboratory. Mathematics Department, Science School, UNAM.
 * BN4R: http://bn4r.rubyforge.org/
 * BNT: Kevin Murphy's Bayesian Network Toolbox for MatLab: http://bnt.sourceforge.net/
 * dlib C++ Library: http://dclib.sourceforge.net/
 * GeNIe & SMILE: http://genie.sis.pitt.edu
 * JavaBayes, Bayesian Networks in Java: http://www.pmr.poli.usp.br/ltd/Software/javabayes/
 * OpenBayes: http://www.openbayes.org
 * pebl: Python Environment for Bayesian Learning: http://pebl-project.googlecode.com
 * ProBT-academic a free version for the academic community of the ProBAYES' industrial package
 * RISO: http://sourceforge.net/projects/riso/ (distributed belief networks)
 * SamIam: http://reasoning.cs.ucla.edu/samiam (Works with Ace, above)
 * UnBBayes: BN, ID, Multiply Sectioned Bayesian Network (MSBN) and Multi-Entity Bayesian Networks (MEBN). It also includes various algorithms for Bayesian Learning. From the Group of Artificial Intelligence at University of Brasília (UnB), Brazil.


 * Commercial software (alphabetical)


 * AgenaRisk Bayesian network tool: http://www.agenarisk.com
 * BayesBuilder: http://www.snn.ru.nl/nijmegen/index.php3?page=31
 * Bayesia: http://www.bayesia.com
 * Bayesian network application library: http://www.norsys.com/netlibrary/index.htm
 * BNet: http://www.cra.com/bnet
 * Causeway: http://www.inet.saic.com/
 * Dezide: http://www.dezide.com
 * dVelox: http://aparasw.com/dVelox
 * Hugin: http://www.hugin.com
 * MSBNx: a component-centric toolkit for modeling and inference with Bayesian Network (from Microsoft Research): http://research.microsoft.com/adapt/MSBNx/
 * Netica: http://www.norsys.com
 * Promedas (Bayesian medical decision support): http://www.promedas.nl
 * ProBayes: http://www.probayes.com
 * Quiddity: Esp. for large models.

--Adoniscik(t, c) 04:15, 11 September 2008 (UTC)

Random variables
I replaced this:
 * Nodes can represent any kind of variable, be it a measured parameter, a latent variable or a hypothesis. They are not restricted to representing random variables, which represents another "Bayesian" aspect of a Bayesian network.

with this:
 * Nodes represent random variables, but in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters or hypotheses.

The nodes must be random variables in the sense that they must have a probability distribution. They are Bayesian in the sense that latent variables, parameters etc. are random variables. -3mta3 (talk) 18:06, 27 April 2009 (UTC)

Cleanup
I think a cleanup needs to be performed of the references and software. We don't need to refer to every paper or book written on the topic. Likewise, a previous commenter had actually removed the list of software, but these have since been added back in.

Any nominations of what should be kept/removed? Should we make a List of Bayesian network software? —3mta3 (talk) 10:29, 2 May 2009 (UTC)


 * Actually, I would like to see a Comparison of Bayesian network software, but I wouldn't know where to start. Maghnus (talk) 04:53, 21 July 2010 (UTC)

Layman's terms required
In accordance with WP:JARGON: each of the definitions (factorization definition, local Markov property, etc.) requires a second paragraph that explains the meaning of the symbols used, in a way that will make sense to (at least) computer science majors with a poor grasp of probability notation but a strong desire for precision. This article is linked to from the introductory article on artificial intelligence and should make sense to a student taking a course in AI. Ideally, it should make sense to an intelligent and patient journalist researching modern AI who has no background in mathematical notation at all. As it is, it only makes sense to someone who is likely to know this definition already. CharlesGillingham (talk) 17:59, 15 September 2009 (UTC)


 * Yes, definitions are lacking. Guessing at what "For all the following, let G = (V,E) be a directed acyclic graph (or DAG)," means, I would have said "For all the following, let G be a directed acyclic graph (or DAG), V a vertex (node)??? and E an edge linking two vertices. Then G = (V,E) ...", but really this should be in set notation, if in fact that is what the sentence means.  —Preceding unsigned comment added by Thomas Kist (talk • contribs) 20:59, 30 September 2009 (UTC)


 * Agree. But it is not just the jargon; it is the way maths articles are written on Wikipedia generally: very formal and abstract, with little material to build intuitions. I made other notes below. Tuntable (talk) 04:31, 27 December 2012 (UTC)
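For reference, the usual reading of G = (V,E) is that V is the set of vertices (nodes) and E is the set of directed edges, each an ordered pair (u, v) meaning u -> v; "acyclic" means no directed path returns to its starting node. A minimal Python sketch, assuming the three sprinkler-example node names and a hypothetical helper `is_acyclic` (Kahn's algorithm):

```python
# G = (V, E): V is a set of vertices (nodes), E a set of ordered pairs (u, v),
# each meaning a directed edge u -> v. Node names assumed from the sprinkler example.
V = {"Rain", "Sprinkler", "GrassWet"}
E = {("Rain", "Sprinkler"), ("Rain", "GrassWet"), ("Sprinkler", "GrassWet")}


def is_acyclic(vertices, edges):
    """Return True if (vertices, edges) has no directed cycle (Kahn's algorithm)."""
    indegree = {v: 0 for v in vertices}
    for _, v in edges:
        indegree[v] += 1
    # Repeatedly remove nodes with no incoming edges; a cycle blocks this.
    frontier = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while frontier:
        u = frontier.pop()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    frontier.append(b)
    return removed == len(vertices)


print(is_acyclic(V, E))  # True
```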

Factorization Definition? -No, not a good one!
A good definition should say directly and clearly "what it is". I found that the factorization definition is very indirect and cumbersome. This definition suggests that a user has to first do a bunch of calculations to judge whether a graph represents a Bayesian Network. —Preceding unsigned comment added by 67.171.96.2 (talk) 11:02, 4 April 2010 (UTC)


 * A mathematical definition is a list of calculations that precisely determine the category of a mathematical object. I think you are looking for a description, not a definition. That is covered in the article's second paragraph.


 * Personally, I like the factorization definition, because it is saying that "a Bayesian network is a structure that allows one to calculate unknown probabilities efficiently from conditional probabilities." Although a definition is only required to describe a set of mathematical objects accurately, this definition also explains why Bayesian nets are useful. CharlesGillingham (talk) 21:34, 17 April 2010 (UTC)
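A minimal Python sketch of the factorization definition, P(x_1, ..., x_n) = ∏_i P(x_i | parents(x_i)): every entry of the joint distribution is computed from the local conditional probability tables alone, and the entries sum to 1. It uses the three-node sprinkler structure; the CPT numbers and the names `parents`, `cpt` and `joint` are assumptions for illustration, not taken from the article.

```python
from itertools import product

# Parents of each node (the DAG) and illustrative CPTs; the numbers are assumed.
parents = {"Rain": (), "Sprinkler": ("Rain",), "GrassWet": ("Sprinkler", "Rain")}
# cpt[node][parent_values] = P(node = True | parents = parent_values)
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(False,): 0.4, (True,): 0.01},
    "GrassWet": {(False, False): 0.0, (False, True): 0.8,
                 (True, False): 0.9, (True, True): 0.99},
}


def joint(assignment):
    """P(assignment) as the product of P(node | parents(node)) over all nodes."""
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


nodes = list(parents)
total = sum(joint(dict(zip(nodes, vals)))
            for vals in product([False, True], repeat=len(nodes)))
print(round(total, 10))  # 1.0
```

The design point is that the graph only tells you which conditional probabilities to multiply; no separate joint table is stored.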

applications - should add list of notable projects using bn
Notable Projects incorporating Bayesian networks:
 * Medical Diagnosis
 * CPSP- liver and bile disease
 * Pathfinder- Heckerman- lymph-node diseases
 * qmr-dt
 * munin — Preceding unsigned comment added by Osnetwork (talk • contribs) 10:18, 24 March 2012 (UTC)

What is meant by "As expected, the likelihood of rain is unaffected by the action (Grass wet)"? Do you mean that if we see the world wet, we won't know whether it has rained at all? — Preceding unsigned comment added by 202.113.225.184 (talk) 02:27, 16 June 2012 (UTC)
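The sentence is about an intervention, not an observation. A minimal sketch, assuming binary variables and illustrative CPT numbers for the sprinkler network (`p_rain_given_wet_observed` and `p_rain_given_do_wet` are hypothetical names): observing wet grass raises the probability of rain above its prior by Bayes' rule, whereas making the grass wet by action, do(G = T), drops the factor P(G | S, R) from the factorization and leaves the prior on rain untouched.

```python
from itertools import product

# Illustrative sprinkler-network CPTs (numbers assumed for this sketch).
P_RAIN = 0.2                                          # P(R = T)
P_SPRINKLER = {False: 0.4, True: 0.01}                # P(S = T | R)
P_WET = {(False, False): 0.0, (False, True): 0.8,     # P(G = T | S, R), keys (S, R)
         (True, False): 0.9, (True, True): 0.99}


def bern(p, value):
    """Probability of a binary value given P(True) = p."""
    return p if value else 1.0 - p


def p_rain_given_wet_observed():
    """P(R=T | G=T): condition on seeing wet grass (Bayes' rule by enumeration)."""
    num = den = 0.0
    for r, s in product([False, True], repeat=2):
        w = bern(P_RAIN, r) * bern(P_SPRINKLER[r], s) * P_WET[(s, r)]
        den += w
        if r:
            num += w
    return num / den


def p_rain_given_do_wet():
    """P(R=T | do(G=T)): the truncated factorization omits P(G | S, R),
    so summing out S leaves only the prior on Rain."""
    return sum(bern(P_RAIN, True) * bern(P_SPRINKLER[True], s) for s in (False, True))


print(round(p_rain_given_wet_observed(), 4))  # 0.3577
print(round(p_rain_given_do_wet(), 4))        # 0.2
```

So seeing wet grass is evidence for rain, while wetting the grass yourself is not.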

circumstances for conditional independence
This seems wrong: Edges represent conditional dependencies; nodes which are not connected represent variables which are conditionally independent of each other. The rules of d-separation and independence are more complicated than that. ★NealMcB★ (talk) 05:37, 14 November 2012 (UTC)

In addition, there is still a slight problem with this sentence in the introduction: edges do not necessarily denote conditional dependence, or any form of dependence. For example, as pointed out by David Barker, it is possible to specify a distribution over two variables such that a->b, but p(b|a)=p(b). — Preceding unsigned comment added by Dopeytaylor (talk • contribs) 00:29, 22 July 2013 (UTC)
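A minimal sketch of that point: the hypothetical network a -> b below has an edge, but the CPT for b has identical rows, so p(b|a) = p(b) and the joint factorizes as P(a)P(b). The numbers and function names are assumptions for illustration.

```python
# Hypothetical two-node network a -> b with binary variables (numbers assumed).
P_A = 0.3                                  # P(a = True)
P_B_GIVEN_A = {True: 0.6, False: 0.6}      # P(b = True | a): both rows identical


def p_joint(a, b):
    """P(a, b) via the factorization P(a) * P(b | a)."""
    pa = P_A if a else 1.0 - P_A
    pb = P_B_GIVEN_A[a] if b else 1.0 - P_B_GIVEN_A[a]
    return pa * pb


def marginal_a(a):
    return p_joint(a, True) + p_joint(a, False)


def marginal_b(b):
    return p_joint(True, b) + p_joint(False, b)


# Independence check: P(a, b) == P(a) * P(b) for every combination of values.
independent = all(
    abs(p_joint(a, b) - marginal_a(a) * marginal_b(b)) < 1e-12
    for a in (True, False)
    for b in (True, False)
)
p_b = marginal_b(True)
print(independent)  # True
```

An edge therefore only permits a dependence; whether one exists depends on the numbers in the CPT.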

Unintelligible to non-experts
Like most maths articles on Wikipedia, this one is unintelligible to someone that does not already know the material.

Consider an intelligent programmer with a basic knowledge of maths who wants to get an idea of what all the talk about Bayesian networks is about, and whether they should read more. Well, they won't be any the wiser from this article. I have a PhD in Comp Sci (but limited knowledge of this field) and I can only follow the sprinkler example with effort, so don't tell me that it is intelligible.

The topic is technical, and we certainly do not want journalistic waffle. But an introductory example showing how the joint probabilities for a simple two-node example can be simplified would be a good start. Then maybe something about independent nodes, which are not even mentioned. Etc.

The goal of Wikipedia is not to demonstrate how clever Wikipedians are. Tuntable (talk) 04:27, 27 December 2012 (UTC)

Still the same as of 2019. More high-quality graphs, please. Zezen (talk) 09:27, 16 November 2019 (UTC)

Example formulas
The formulas in the example are way way too long, they even cause considerable horizontal scrolling. We should leave out the complete derivation (which is pointless imho) or find another example / notation. — Preceding unsigned comment added by 143.177.88.196 (talk) 18:49, 22 February 2014 (UTC)

Hierarchical Bayesian Model
'Hierarchical Bayesian Model' leads here. While it's true that an HBM is a special case of a Bayesian network, it is sufficiently widely used that it should not be subsumed under the more general heading. Linear regression is a special case of the linear model, which in turn is a special case of the generalised linear model, and we have Wikipedia articles for all three. I think we need a separate article for the Hierarchical Bayesian Model. Blaise (talk) 08:05, 21 March 2014 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 5 external links on Bayesian network. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20110723084249/http://www-fis.iarc.fr/~martyn/software/jags/ to http://www-fis.iarc.fr/~martyn/software/jags/
 * Added archive https://web.archive.org/web/20070927153751/https://www.dcs.qmul.ac.uk/~norman/papers/Combining%20evidence%20in%20risk%20analysis%20using%20BNs.pdf to https://www.dcs.qmul.ac.uk/~norman/papers/Combining%20evidence%20in%20risk%20analysis%20using%20BNs.pdf
 * Added archive https://web.archive.org/web/20060719171558/http://research.microsoft.com:80/research/pubs/view.aspx?msr_tr_id=MSR-TR-95-06 to http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-95-06
 * Added archive https://web.archive.org/web/20060719171558/http://research.microsoft.com:80/research/pubs/view.aspx?msr_tr_id=MSR-TR-95-06 to http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-95-06
 * Added archive https://web.archive.org/web/20090923200511/http://wiki.syncleus.com:80/index.php/DANN:Bayesian_Network to http://wiki.syncleus.com/index.php/DANN:Bayesian_Network

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 04:09, 29 October 2016 (UTC)

Indiscriminate list of "Applications" - removed
By my latest count this pointless listcruft contained 26 (!) examples and mentions with zero relevant context or additional topic-related details. Aside from being pointless and indiscriminate, this list was also misused to spam citations of various journal articles by a relatively new author in this area. Information about noteworthy applications from established experts could be added, but such coverage should contain additional relevant context beyond a trivial list. GermanJoe (talk) 21:07, 6 August 2018 (UTC)

Distinction between "Bayesian network" and "Bayesian neural network"?
The page Bayesian neural network currently redirects here. From what I can tell, the networks discussed here in regards to Bayesian networks have little to do with neural networks. I'm none too familiar with Bayesian neural networks (hence looking it up on Wikipedia), but are Bayesian neural networks a variety of Bayesian networks (in which case it might be worth mentioning them explicitly in this article), or is the similarity in names coincidental (in which case it might be worth un-redirecting the Bayesian neural network article, and/or expanding on that distinction in-text)? -- 22:57, 31 January 2019 (UTC) — Preceding unsigned comment added by 129.59.122.107 (talk)

Bayesian network public interactive model repository
I would like to propose adding a link to an interactive Bayesian network model repository created and maintained as a public service by BayesFusion, LLC. I am a professor at the University of Pittsburgh but also affiliated with BayesFusion, so I would rather not edit the page directly. The repository contains almost 100 models from various sources (acknowledged in case of each model) that can be examined interactively using any modern web browser. Visitors to that page can enter evidence and view results of computation. Here is the repository address:

https://repo.bayesfusion.com/ — Preceding unsigned comment added by Druzdzel (talk • contribs) 09:35, 3 June 2019 (UTC)

I have not seen any comments/objections to this, so I have added an external link to the repository. — Preceding unsigned comment added by Druzdzel (talk • contribs)


 * I object. This seems to basically be a software demo, and Wikipedia isn't for promotion of BayesFusion's products. - 10:51, 17 June 2019 (UTC)


 * I strongly object to this objection. BayesFusion, LLC, is indeed a commercial enterprise. However, the repository has been created as a public service to the Bayesian network community. It has been announced recently as such on the UAI mailing list (the prime platform for communicating news and results in the area of Bayesian networks). All models come from the Bayesian network literature and are acknowledged in the screen comments. Visitors to that page can observe nodes and see results calculated by the Bayesian network models, something that the Wikipedia page does not come close to providing. This is a valuable public service. Please ask around in the community rather than accusing me of promotion. — Preceding unsigned comment added by Druzdzel (talk • contribs) 11:27, 17 June 2019 (UTC)

Local Markov property
Descendant is not defined. Children only, or children + grandchildren + ...? Sigma^2 (talk) 19:38, 12 September 2021 (UTC)
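In the standard usage, the descendants of a node are all nodes reachable from it by a directed path (children, grandchildren, and so on), not just its children; the local Markov property then says each variable is conditionally independent of its non-descendants given its parents. A small sketch computing descendants by depth-first search, assuming a hypothetical child list (`descendants` is an illustrative name):

```python
def descendants(node, children):
    """All nodes reachable from `node` by a directed path (children, grandchildren, ...)."""
    seen, stack = set(), list(children.get(node, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, ()))
    return seen


# Hypothetical DAG: A -> B -> C, plus A -> D.
children = {"A": ["B", "D"], "B": ["C"]}
print(sorted(descendants("A", children)))  # ['B', 'C', 'D']
print(sorted(descendants("B", children)))  # ['C']
```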

Example
The example {Wet Grass implies Sprinkler (non-exclusive) or Rain} is wrong. Wet grass implies a set of possible causes, and even if we broaden "Rain" to mean air-borne condensation followed by precipitation of water, and "Sprinkler" to mean any application of water (wet = surface wetted with water) by artificial means, that still leaves several other possibilities, including direct condensation of water on the grass (dew) and flooding, to name two examples. In some areas, I'd guess, dew is by far the most common (hence intuitive) way grass becomes wet. 98.21.213.85 (talk) 12:30, 20 December 2023 (UTC)