Talk:Dempster–Shafer theory

Formal definition of BBA correct?
In the formal definition, it says that a BBA needs to fulfill $$\sum _{A \subseteq 2^X} m(A) = 1,$$ however, should that not be concerning only the elements, not the subsets of the power set? $$\sum _{A \in 2^X} m(A) = 1,$$
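The distinction matters: the normalization sums over the elements of the power set, i.e. over the subsets of $$X$$. A quick sketch of what that means in practice (the two-element frame and the masses below are my own toy example, not from the article):

```python
from itertools import combinations

def powerset(xs):
    """All subsets of xs, as frozensets -- the elements of 2^X."""
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

# Toy frame of discernment and a basic belief assignment m over 2^X.
X = {"a", "b"}
m = {frozenset(): 0.0,
     frozenset({"a"}): 0.4,
     frozenset({"b"}): 0.1,
     frozenset({"a", "b"}): 0.5}

# The BBA condition sums m(A) over A in 2^X, i.e. over subsets A of X:
total = sum(m[A] for A in powerset(X))
print(total)  # 1.0
```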

More info for Bayesians?
I came here as a Bayesian interested in why anyone would want to use non-Bayesian inference, but am still having a hard time finding this out. Could anyone add to the article a short summary of what the point of this theory really is for practitioners, and in what cases could/should it be used instead of "proper" probabilistic inference? — Preceding unsigned comment added by 146.90.159.163 (talk) 09:25, 18 August 2015 (UTC)

Examples need improving

 * (Heading inserted by yoyo (talk) 05:45, 10 November 2009 (UTC))

Example producing correct results in case of high conflict
I could not reproduce the result of probability (in the strict sense?) 1 for film Y. Since only singletons have masses, K = m_1(X) * m_2(Z) = 0.99^2, and m_{1,2}(Y) simply comes out to 1/(1-K) * m_1(Y) * m_2(Y) = 1/(1-0.99^2) * 0.01^2 = 0.005025126.

Did I seriously misunderstand the definitions and/or the example, or are the explanations incomplete? --Jwollbold (talk) 10:14, 11 September 2014 (UTC)
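For what it's worth, here is a sketch of the computation as I understand Dempster's rule, assuming Zadeh's usual high-conflict masses (X, Y, Z standing for the singleton film hypotheses). The conflict K sums over all pairs of focal sets with empty intersection, not just m_1(X)·m_2(Z), which is where the result of 1 for Y comes from:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: m(A) is proportional to the sum of m1(B)*m2(C) over
    nonempty intersections B & C == A, renormalized by 1 - K, where K is the
    total mass of conflicting (empty-intersection) pairs."""
    combined, K = {}, 0.0
    for (B, wB), (C, wC) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            combined[A] = combined.get(A, 0.0) + wB * wC
        else:
            K += wB * wC
    return {A: w / (1.0 - K) for A, w in combined.items()}, K

# Zadeh-style example: two sources, singleton masses only.
m1 = {frozenset({"X"}): 0.99, frozenset({"Y"}): 0.01}
m2 = {frozenset({"Z"}): 0.99, frozenset({"Y"}): 0.01}

m12, K = dempster_combine(m1, m2)
print(K)    # ~0.9999, not 0.99**2: all three cross terms conflict
print(m12)  # ~{frozenset({'Y'}): 1.0}
```

With these masses K = 0.9801 + 0.0099 + 0.0099 = 0.9999, so the tiny agreement on Y (0.0001) is renormalized to a combined mass of 1, which is the counter-intuitive result the example describes.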

The problem

 * (Heading inserted by yoyo (talk) 05:45, 10 November 2009 (UTC))

Read in the page: "Although these are rather bad examples..." Then these should be replaced by better examples. Mbcudmore (talk) 14:57, 12 November 2008 (UTC)

There are cases where it is better to model on something other than a probability space, but the examples given in the article are not among them. They don't even make sense: if all events are considered disjoint, so that p(red or blue) is meant to be the probability that a sensor gives the result "red or blue", then there is no need for a model of "belief", because one could just define the probabilities of all events to be the corresponding "beliefs" normalized by a constant (to get a "belief" probability space). If, on the other hand, p(red or blue) is the probability that the sensor gives a result "blue" or a result "red", then the example tables conflict with the probability axiom p(A or B) >= max(p(A), p(B)), meaning they can't represent a probability table.

The advantage of this theory is that it allows the calculation of "belief values" of new events from known "belief values" of other events in ways other than the probabilistic one. But if in the examples all events are considered separate (so that they all add up to one), then the only difference from probability theory is that the "beliefs" don't add up to one (which gives the same results multiplied by a constant, so is rather useless).

I wonder whether whoever wrote this article understood probability theory, or even what the difference is between it and Dempster-Shafer theory.

Please!!! someone bring sense into this article!!!
 * OK, I'll try to clarify the discussion with headings, and with a more suitable scenario for some DST examples. yoyo (talk) 06:05, 10 November 2009 (UTC)

A suitable scenario for DST
If Dempster-Shafer theory allows a non-probabilistic calculus of belief values, then clearly the examples given in the article should exhibit a set of belief values that differs in some remarkable way from a set of probability values for the same events.

Suppose, for example, all the evidence implicating suspect A comes from witness X, a citizen of unquestioned veracity, whilst all the evidence implicating suspect B comes from witness Y, who is known to have cheated on her exams in college and to have lied to her parents about her whereabouts on the night of the murder. All else being equal, wouldn't you be more likely to believe witness X than witness Y, and thus convict suspect A rather than suspect B?

Surely creating a detailed numerical example for such a scenario, that illustrates DST, shouldn't be too difficult?

yoyo (talk) 05:45, 10 November 2009 (UTC)

Wrestler / boxer example
Can someone who understands this work through how it applies to the boxer / wrestler example? This is introduced as the type of problem where DST is useful, then never mentioned again. Stainless316 (talk) 11:45, 9 November 2010 (UTC)

Various problems

 * (Heading inserted by yoyo (talk) 06:05, 10 November 2009 (UTC))

None of the examples are handled well by DST, so we should say what belief constraints are
I want to make the case that the concept of "belief constraints" needs to be spelled out, since according to the article, DST is correct in that context and it (I claim) is not correct in most other natural contexts that a reader unfamiliar with DST (such as myself) might try to fit to it.

The case where the belief masses are only nonzero on singletons reduces to probabilities (at any rate, Bel = Pl for all elements in the power set). In this case, Dempster's rule of combination gives probabilities in the sense that the resulting weights are also only nonzero on singletons. And yet, Dempster's rule clearly does not give the right probability distribution on the underlying state given the evidence received. At least not in general. Proper fusion relies crucially on knowing how the pieces of evidence are related to each other, which is missing from the model. Even in the case where the evidence is conditionally independent given the underlying state we are estimating, the fused probability for a given state is missing a factor of the a priori probability of that state.

All of this is to point out that the most obvious way in which you might interpret DST as an extension of Bayesian reasoning is not an extension of Bayesian reasoning.
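The missing prior factor can be made concrete with a sketch (the states, prior, and likelihoods below are arbitrary toy numbers of mine): on singleton masses, Dempster's rule just multiplies the two reported posteriors, which double-counts the prior relative to Bayesian fusion of conditionally independent evidence.

```python
# Frame: three states with a non-uniform prior (toy numbers).
states = ["s1", "s2", "s3"]
prior = {"s1": 0.6, "s2": 0.3, "s3": 0.1}
# Likelihoods of two conditionally independent pieces of evidence:
lik1 = {"s1": 0.2, "s2": 0.5, "s3": 0.3}
lik2 = {"s1": 0.4, "s2": 0.4, "s3": 0.2}

def normalize(d):
    Z = sum(d.values())
    return {k: v / Z for k, v in d.items()}

# Each source reports its own posterior given its own evidence:
post1 = normalize({s: prior[s] * lik1[s] for s in states})
post2 = normalize({s: prior[s] * lik2[s] for s in states})

# Dempster's rule on singleton masses multiplies and renormalizes,
# so the result is proportional to prior^2 * lik1 * lik2:
dempster = normalize({s: post1[s] * post2[s] for s in states})

# Correct Bayesian fusion is proportional to prior * lik1 * lik2:
bayes = normalize({s: prior[s] * lik1[s] * lik2[s] for s in states})

print(dempster)  # carries an extra factor of the prior
print(bayes)
```

Dividing the Dempster result by the prior once (and renormalizing) recovers the Bayesian answer, which is exactly the "missing factor" described above.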

Furthermore, all of the examples that derive weights from a piece of input evidence do it in a way which is fairly inexplicable. For example, the only way I can make heads or tails of the statement Bel(A) < P(A) < Pl(A) is that, in the case where evidence is resolvable as probabilities on states, you should have nonzero weights on just the singletons, and they should correspond to the probability that that is the current state given the evidence. So in the example with doctors, the fact that a doctor thinks there is a .99 probability of disease A and .01 of disease B shouldn't translate to weights of w({A}) = .99, w({B}) = .01, w({C}) = 0, but instead to something more like w({A}) = .985, w({B}) = .0095, w({C}) = .001, since the weights represent the probability of the disease given the evidence that the doctor has made a certain claim about his confidence, and NOT the doctor's confidence itself. This is the source of the so-called "unintuitive" results. There is simply a bad translation from data to weights.

Anyway, the point is, having read this article I am completely unequipped to say how I might apply DST in a real-life scenario. The examples given and the natural interpretation as probabilities don't work. Supposedly belief constraints do, so maybe the examples should involve belief constraints. — Preceding unsigned comment added by ElPoojmar (talk • contribs) 19:02, 12 July 2013 (UTC)
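The contrast described above between the two translations can be sketched numerically. (The hypothetical second doctor favouring C is my own addition, and I have nudged the adjusted weights so they sum to 1; none of these numbers come from the article.)

```python
from itertools import product

def combine(m1, m2):
    # Dempster's rule of combination: conjunctive fusion, conflict renormalized.
    out, conflict = {}, 0.0
    for (B, wB), (C, wC) in product(m1.items(), m2.items()):
        inter = B & C
        if inter:
            out[inter] = out.get(inter, 0.0) + wB * wC
        else:
            conflict += wB * wC
    return {A: w / (1.0 - conflict) for A, w in out.items()}

A, B, C = frozenset({"A"}), frozenset({"B"}), frozenset({"C"})

# Raw translation: each doctor's stated confidence used directly as mass.
doc1_raw = {A: 0.99, B: 0.01}
doc2_raw = {C: 0.99, B: 0.01}

# Adjusted translation: mass = probability of the disease GIVEN the doctor's
# claim, leaving a sliver of mass on the unmentioned disease (illustrative).
doc1_adj = {A: 0.985, B: 0.0095, C: 0.0055}
doc2_adj = {C: 0.985, B: 0.0095, A: 0.0055}

raw_fused = combine(doc1_raw, doc2_raw)
adj_fused = combine(doc1_adj, doc2_adj)
print(raw_fused)  # essentially all mass on {B}: the "unintuitive" result
print(adj_fused)  # mass spread over A, B, C; B stays small
```

Under the raw translation the combination concentrates everything on B, the disease both doctors considered nearly impossible; under the adjusted translation A and C dominate, which is the point being made about the translation from data to weights.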


 * I also got only a vague understanding of the notion "belief constraint", for instance by the sentence in the first paragraph of Dempster's rule of combination: "Use of that rule in other situations than that of combining belief constraints has come under serious criticism, such as in case of fusing separate beliefs estimates from multiple sources that are to be integrated in a cumulative manner, and not as constraints." Could it be explained explicitly? (Currently I am checking whether I can use Dempster-Shafer theory. In that case maybe I could contribute a definition of belief constraints here.) --Jwollbold (talk) 09:54, 11 September 2014 (UTC)

Implementation in a GIS environment

 * (Heading inserted by yoyo (talk) 06:05, 10 November 2009 (UTC))

Does anyone know if this stat has been implemented in an Arc 9.+ GIS environment?
 * I'm not sure I even understand the question! At a guess, it's asking for an implementation of the Dempster-Shafer combination rule as a function ("stat", or statistic[al function]) in a version 9 or later of some GIS software called Arc.  If the original poster can clarify the question, maybe someone can answer it.
 * yoyo (talk) 12:33, 10 November 2009 (UTC)

A possible typo

 * ( Heading inserted by yoyo (talk) 06:05, 10 November 2009 (UTC))

Is this a typo? "This means that it is possible that the cat is alive (should be dead??), up to 0.8, since the remaining probability mass of 0.3 is essentially "indeterminate," meaning that the cat could either be dead or alive."

Using standard terminology

 * (Heading inserted by yoyo (talk) 06:05, 10 November 2009 (UTC))

I think that giving the vocabulary is important:
 * $$X$$ should be called the Frame of reference.
 * $$m$$ should be called the Basic Belief Assignment.
 * If I understand this correctly, the original poster is suggesting that the article should use the term Frame of reference for the mathematical object $$X$$, and the term Basic Belief Assignment for the mathematical object $$m$$. Presumably these terms are in common use in some standard reference?  If so, anyone knowing such a reference could make the appropriate edit for Frame of reference and give the reference.  (The term Basic Belief Assignment is already used and defined in the Formalism section.)  Remember that help with editing is always available - just ask!
 * yoyo (talk) 12:44, 10 November 2009 (UTC)

Connection between belief, plausibility and probability is moot

 * (Heading inserted by yoyo (talk) 06:05, 10 November 2009 (UTC))

The idea that belief and plausibility are respectively lower and upper bounds on some real, unknown probability is not unanimously accepted and should be presented as such.
 * Presumably the original poster meant to write "... and should not be presented as such" (my emphasis)? Are there references for explicit statements of contrary views?
 * yoyo (talk) 12:47, 10 November 2009 (UTC)

Mathematical properties of Dempster's combination operator

 * ( Heading inserted by yoyo (talk) 06:05, 10 November 2009 (UTC))

Dempster's rule is usually denoted by $$\oplus$$. It is worth saying that it is commutative and associative, and admits the vacuous belief assignment as its identity.
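Those three properties can be checked numerically; a sketch (the frame and masses are arbitrary toy values of mine):

```python
from itertools import product

def combine(m1, m2):
    # Dempster's rule (the ⊕ operator): conjunctive combination,
    # with the conflicting mass renormalized away.
    out, conflict = {}, 0.0
    for (B, wB), (C, wC) in product(m1.items(), m2.items()):
        A = B & C
        if A:
            out[A] = out.get(A, 0.0) + wB * wC
        else:
            conflict += wB * wC
    return {A: w / (1.0 - conflict) for A, w in out.items()}

X = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.4}
m2 = {frozenset({"b"}): 0.3, X: 0.7}
m3 = {frozenset({"a", "c"}): 0.5, X: 0.5}
vacuous = {X: 1.0}  # total ignorance: all mass on the whole frame

def close(ma, mb, tol=1e-9):
    keys = set(ma) | set(mb)
    return all(abs(ma.get(k, 0.0) - mb.get(k, 0.0)) < tol for k in keys)

print(close(combine(m1, m2), combine(m2, m1)))        # commutative: True
print(close(combine(combine(m1, m2), m3),
            combine(m1, combine(m2, m3))))            # associative: True
print(close(combine(m1, vacuous), m1))                # vacuous identity: True
```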

Dempster's rule not analogous to Bayes' rule

 * (Heading inserted by yoyo (talk) 05:28, 10 November 2009 (UTC) )

The rule of combination is NOT analogous to Bayes' rule. Halpern (2003) makes this clear by treating combination and updating of Belief functions separately. This page really does need tidying up. I shall have a go later when I get the chance. Incompetnce (talk) 14:46, 28 July 2009 (UTC)
 * The article does not claim they are analogous, but rather that Dempster's rule generalises Bayes' Theorem.  Does your reference invalidate that claim?
 * yoyo (talk) 05:28, 10 November 2009 (UTC)

It generalizes Bayes' Theorem much like adding continuous tracks (as from a military tank) to a commercial jet airliner would generalize a fleet, giving it new abilities that reduce to the standard commercial jet when removed, yet creating an abomination of nature in doing so.

NewGuy: What is your point? Could you relate that to the mathematical problem at hand? Different papers I am reading currently use the formulation that Dempster-Shafer evidential reasoning generalizes Bayes. -

Documentation, etc.

 * (Heading inserted by yoyo (talk) 12:51, 10 November 2009 (UTC))

I am surprised to notice that the criticism of this theory is well documented (even if it all comes from the same author) compared to the theory itself. Another point is the fact that this article is part of the law project... I don't know much about Dempster-Shafer theory, but I don't see in what way it could be related to law matters? Aurelein (talk) 15:01, 6 December 2007 (UTC)

Connection to the law
Aurelein, to answer your second point, reread the opening sentence:
 * "The Dempster–Shafer theory (sometimes abbreviated to DST) is a mathematical theory of evidence."

And evidence is, of course, the subject matter of law; and ways of weighing evidence (or mathematically speaking, weighting evidence) are some of the most essential methods of legal systems. So a mathematical study of how various pieces of evidence can interact systematically is of direct interest to lawyers.

One obvious point of great practical significance that could be clarified by mathematical measurement is the concept of "reasonable doubt". To take an extreme example, if a legal system prescribes mandatory execution for felony murder, you'd want to be as sure as practically possible that a conviction for felony murder was warranted, or run the risk of killing an innocent person. yoyo (talk) 05:21, 10 November 2009 (UTC)

Response re: connection to the law.
I will just point out that a mathematical theory of evidence is not of interest to lawyers, because of the way legal systems work as regards evidence. In essence, law itself is a formal system for evaluating evidence, and only rules within the context of the formal legal system are valid. As evidence of this, courts do not rule on the correctness of anything, just validity and admissibility, and then later legal guilt and innocence. These correspond to factual points, but are different. Davidmanheim (talk) 23:15, 26 June 2014 (UTC)

Notation Issues
The article uses two different symbols for the power set, $$\mathbb P(X) $$ and $$2^X$$. It would probably be best to clean this up and only use one, but I don't know which is preferred. Also the section that declares its very own examples to be bad examples should probably be fixed, but again I'm not sure what would be appropriate. Mickeyg13 (talk) 13:49, 31 March 2009 (UTC)


 * Also mathbb isn't really a common notation for powerset. It's usually typeset in mathcal —Preceding unsigned comment added by 67.169.125.147 (talk) 05:17, 2 June 2009 (UTC)


 * Fixed. All references to powerset now use $$2^X \,$$.  yoyo (talk) 19:19, 4 October 2009 (UTC)

Deducing mass from belief or plausibility
When I read the statement:
 * "It follows from the above that, for finite sets you need know only one of the three (mass, belief, or plausibility) to deduce the other two",

I scratched my head! To make it easier for others to perform this mathematical miracle, I have augmented the text of the "Formalism" section, by including the inverse function for deriving the belief mass m(A) from the belief values bel(B) for all B subsets of A. This also required a reference, which I found conveniently in the first unnumbered item in the "Reference" section; this item now bears reference number 3. yoyo (talk) 18:48, 4 October 2009 (UTC)
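A sketch of that inverse relation, the standard Möbius inversion m(A) = Σ_{B ⊆ A} (-1)^{|A \ B|} Bel(B) (the toy frame and masses are mine):

```python
from itertools import combinations

def subsets(A):
    """All subsets of the frozenset A."""
    items = list(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def belief(m, A):
    # Bel(A) = sum of m(B) over nonempty B that are subsets of A.
    return sum(w for B, w in m.items() if B and B <= A)

def mass_from_belief(bel, A):
    # Möbius inversion: m(A) = sum over B ⊆ A of (-1)^{|A \ B|} * Bel(B).
    return sum((-1) ** len(A - B) * bel(B) for B in subsets(A))

# Toy BBA on the frame {x, y}.
m = {frozenset({"x"}): 0.3, frozenset({"y"}): 0.2, frozenset({"x", "y"}): 0.5}
bel = lambda A: belief(m, A)

for A in subsets(frozenset({"x", "y"})):
    if A:
        print(sorted(A), round(mass_from_belief(bel, A), 10))
# recovers the original masses (up to rounding): 0.3, 0.2, 0.5
```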

Written like a college lecture not an encyclopedia article
This article now has a heading "Why we need Dempster–Shafer theory", which begins, "It seems that the well known probability theory can effectively model uncertainty." This is in the style of a college lecture, not an encyclopedia article. In an article about a theory, headings should be about the theory itself, not why we need it. That we need this theory at all is a point of view. "It seems that" is a weasel expression. If the well known probability theory can effectively model uncertainty, the article should say so without "it seems that." If there is some doubt as to whether the well known probability theory can effectively model uncertainty, the article should say something like "Professor X says the well known probability theory can effectively model uncertainty, but professor Y has published several articles disputing this." "The well known probability theory" is problematic. "Well-known" should be hyphenated. Why not just say "probability theory can effectively model uncertainty"? The entire article should be reviewed with an eye to make it read like an encyclopedia article and not a college lecture. -Anomalocaris (talk) 18:22, 16 November 2010 (UTC)

Neutrality re: DS as a "Generalization of Bayesian Theory"
The tone of the claim that DS is a generalization of the Bayesian theory of subjective probability has originated from Shafer himself, and would appear not to be neutral, insofar as the tone of the statement makes it sound as if DS theory can be used to solve many problems that Bayesian probability is incapable of addressing. This claim would certainly be debated by any Bayesian, who would almost certainly claim that the advocate of DS theory simply does not understand how to set the problem up correctly using a Bayesian interpretation, and that the DS advocate has simply become confused in his understanding of how to define classes, or hypotheses, in the Bayesian framework, how to deal with non-exhaustive and non-exclusive classes/hypotheses under the Bayesian framework, how to interpret posterior distributions properly, and so on. Contrary to popular DS theory advocates, the Bayesian framework naturally deals with conflicting data sources and inputs. -jp — Preceding unsigned comment added by 99.70.212.56 (talk) 06:37, 20 January 2012 (UTC)

confusing intro needs fixing
This entire paragraph needs a rewrite:
 * Dempster–Shafer theory assigns its masses to all of the non-empty subsets of the entities that compose a system.

What is an entity? What is a system? I know what sets and subsets are, so the following paragraphs are pointless. But it's unclear what an 'entity' is.
 * Suppose for example that a system has five members, that is to say five independent states, exactly one of which is actual. If the original set is called $$S$$, so that $$|S| = 5$$, then the set of all subsets — the power set — is called $$2^S$$. Since you can express each possible subset as a binary vector (describing whether any particular member is present or not by writing a “1” or a “0” for that member's slot), it can be seen that there are $$2^5 = 32$$ subsets possible ($$2^{|S|}$$ in general), ranging from the empty subset (0, 0, 0, 0, 0) to the “everything” subset (1, 1, 1, 1, 1).

Cut this section, it's irrelevant. Let's assume the reader knows what a "subset" is. There is no need to badly explain a power set.


 * The empty subset represents a contradiction, which is not true in any state[clarification needed],

Huh? What's a contradiction?
 * and is thus assigned a mass of zero; the remaining masses are normalized[clarification needed] so that their total is 1.

Huh? What does "normalized" mean here?


 * The “everything” subset is often labelled “unknown” as it represents the state where all elements are present, in the sense that you cannot tell which is actual.[clarification needed]

Huh? What's unknown about it?

I think the core issue is that it's diving into math, badly, without explaining why a power set is needed, what is in the set from which the power set was constructed, or what the subsets of the power set represent. Let's get these out of the way first. linas (talk) 14:32, 3 April 2012 (UTC)

belief and plausibility
The very first sentence of this section is unclear, throwing off the reader for the rest:
 * Belief in a hypothesis is constituted by the sum of the masses of all sets enclosed by it (i.e. the sum of the masses of all subsets of the hypothesis)

In what sense is a hypothesis a set? Farther down, "the cat is dead" is used as an example hypothesis. How does one convert "the cat is dead" into a set? Or "enclosed by it"? How does a hypothesis "enclose" a set? What is in the set? What are its members? What significance do those members have with regard to the hypothesis? Is there some easy or natural way to interpret the members of the set? linas (talk) 15:08, 3 April 2012 (UTC)
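As I understand it, a hypothesis is identified with the subset of the frame of discernment in which it holds: "the cat is dead" corresponds to the subset {dead} of the frame {alive, dead}. A sketch with masses in the spirit of the article's cat example (the exact numbers are my assumption):

```python
# Frame of discernment: the hypothesis "the cat is dead" is the subset {"dead"}.
frame = frozenset({"alive", "dead"})

# Masses in the spirit of the article's cat example (assumed numbers):
m = {frozenset({"alive"}): 0.2,
     frozenset({"dead"}): 0.5,
     frame: 0.3}  # "either": indeterminate mass on the whole frame

def bel(H):
    # Belief: total mass of the sets enclosed by H (subsets of H).
    return sum(w for A, w in m.items() if A <= H)

def pl(H):
    # Plausibility: total mass of the sets that intersect H.
    return sum(w for A, w in m.items() if A & H)

dead = frozenset({"dead"})
print(bel(dead), pl(dead))  # Bel ≈ 0.5, Pl ≈ 0.8
```

So "enclosed by the hypothesis" just means "a subset of the hypothesis-set": only mass committed specifically to {dead} supports belief, while the indeterminate mass on the whole frame still counts toward plausibility.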

Unsuitable Cumulative Fusion Situation Comments Removed (Neutrality)
All of the neutral literature I've read on this subject simply states that Dempster-Shafer theory can produce counter-intuitive results in certain circumstances, and that such examples have fueled a debate over whether DS is consistent or not, and fueled research into its foundations and so on. Such examples are then provided. This is how the article used to read. Neutral articles don't take sides with one side of the debate and claim that those who want to interpret these examples as counter-intuitive are "mistaken/wrong" because they are applying DS theory incorrectly. Yet this is what the statements about applying DS theory "to a cumulative fusion situation to which it is not suitable" seem to imply, and they have a biased tone.

Strangely, you can read it as being biased for either side; i.e.:

1. This is because these idiots don't understand DS theory and tried to apply it wrong.

or

2. This is because DS theory is limited in applicability and clearly doesn't apply in these scenarios where probability theory does apply.

In any case, even if it can somehow be interpreted as being neutral, it is poorly worded/written. No one besides an expert, and perhaps not even many experts, has a clue what the definition of "a cumulative fusion situation where DS theory applies" is. Please clearly define this, and then maybe we can interpret the comments as neutral, making sense after all, and add them back with enlightenment rather than having them cause massive confusion. — Preceding unsigned comment added by 149.97.32.36 (talk • contribs)


 * I had reverted your edits, because without edit summaries they were indistinguishable from vandalism. Try again.  Dicklyon (talk) 22:06, 27 November 2012 (UTC)


 * Sorry--will put edit summaries first next time. Thanks for keeping this page vandal-free!  — Preceding unsigned comment added by 149.97.32.36 (talk) 22:15, 27 November 2012 (UTC)