Talk:Conditional expectation

Errors
This article has several significant mathematical errors. In the 'Conditioning as factorization' section, X must take values in a field so that the integral makes sense at all. And X and Y do not need to be both U-valued. In fact, U doesn't have to be a field, and so the current definition is broken. I suggest making X real valued, leaving Y U-valued, and patching things from there. —Preceding unsigned comment added by 140.247.149.155 (talk) 14:19, 14 February 2009 (UTC)


 * I have tried to fix this error. --Bdmy (talk) 16:27, 14 February 2009 (UTC)


 * Looks good, thanks. —Preceding unsigned comment added by 140.247.142.164 (talk) 06:05, 16 February 2009 (UTC)


 * The section 'Conditioning as factorization' is still really problematic. Defining E(X|Y) as a mapping over U is wrong, it is a random variable on the probability space (Omega,F,P). This section should simply mention that there exists a function g over U such that E(X|Y) = g(Y), the latter being the composition of g with Y, thus defined over Omega. Note that this is currently redundant with the previous section. 132.169.4.223 (talk) 14:53, 28 June 2017 (UTC)


 * Yes. See "" below. Boris Tsirelson (talk) 18:54, 28 June 2017 (UTC)

I think, at the end of Section 'Conditioning relative to a subalgebra', $$ L^2_{\operatorname{P}}(X;M) \rightarrow L^2_{\operatorname{P}}(X;N)$$ should be $$ L^2_{\operatorname{P}}(\Omega;M) \rightarrow L^2_{\operatorname{P}}(\Omega;N)$$. Right? 80.98.239.192 (talk) 17:10, 2 November 2013 (UTC)
 * Right! Corrected. Thank you. Boris Tsirelson (talk) 20:56, 2 November 2013 (UTC)
 * Thank you. 80.98.239.192 (talk) 22:47, 2 November 2013 (UTC)

Does anybody agree to put the fact that conditional expectation is the best regression in the mean square error sense? --Memming (talk) 15:51, 20 September 2008 (UTC)
 * This has been done since 2008. AVM2019 (talk) 13:58, 23 May 2022 (UTC)
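 * For reference, a sketch of the usual argument behind this fact, assuming X is square-integrable and h is any measurable function with h(Y) square-integrable (the symbol h is introduced here only for illustration):
 * $$\operatorname{E}\big[(X-h(Y))^2\big] = \operatorname{E}\big[(X-\operatorname{E}(X\mid Y))^2\big] + \operatorname{E}\big[(\operatorname{E}(X\mid Y)-h(Y))^2\big]$$
 * The cross term vanishes because $$X-\operatorname{E}(X\mid Y)$$ is orthogonal to every square-integrable function of Y, so the left-hand side is smallest when $$h(Y)=\operatorname{E}(X\mid Y)$$ almost surely.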

One of the worst
Um, this has to be one of the worst Wikipedia pages I've ever read. It is written like a probability theory lecture or a textbook, not an encyclopedia article. For example, "In order to handle the general case, we need more powerful mathematical machinery." has no place here. I've studied significant amounts of math and statistics, and it's somewhat hard for even me to read, I'd imagine it is completely inaccessible to a layperson. Compare, for example, to the article on conditional probability. And at the same time, it attempts to explain things like discrete random variable right there. A complete (ok, you can keep the first paragraph) rewrite is needed.Zalle 18:41, 26 March 2007 (UTC)


 * I agree; this page is extremely arcane, and could use a major rewrite. Cazort (talk) 17:59, 30 March 2008 (UTC)


 * Definitely it needs improvement, but the early section titled "special cases" should be easy to read for anyone who's had "significant amounts of math and statistics". But the topic of the article necessarily requires that at least some of what is said will not be accessible to the "layperson". Michael Hardy 19:21, 26 March 2007 (UTC)


 * And the first sentence really should be accessible to anyone (except those who don't know what conditional probability distributions are, and it's not appropriate to ask that this be made accessible to those people, since "conditional probability distribution" is quite naturally a prerequisite to this topic), and for some purposes, says most of what need to be said. Michael Hardy 19:24, 26 March 2007 (UTC)


 * Well, looking at your "revert" I think you've failed to comprehend the difference between a mathematics textbook and an encyclopedia. "It can be shown that" is nothing but silly mathematical jargon and does not "signify that there is more to this argument". As this is an encyclopedia, it is by definition true (at least in an ideal situation) that all the claims contained can be "shown" to be true, and it is obvious without stating it that such a proof exists. I'd argue and edit more, but apparently this is a pet project so I don't think it's really worth the bother. Have fun, good luck.88.112.25.211 21:49, 26 March 2007 (UTC)

You seem to misunderstand. Yes, it is obvious that such a proof exists, but it may not be obvious that there is more to the proof than what is given here. Michael Hardy 22:40, 26 March 2007 (UTC)

Reworking suggested
This page needs substantial reworking.
 * Motivation involving finite probability spaces
 * Intuitive generalizations
 * Clear statements of abstract framework, general theorems and the general probability framework
 * Cond Exp as factorization (important for defining sufficient statistics)
 * References.

If nobody objects, I'll do it in the next few hours CSTAR 15:45, 9 May 2004 (UTC)

Go for it. Charles Matthews 15:58, 9 May 2004 (UTC)

I'm still working on it. But I'd like to get some stuff out so I can ponder more on this. If I've screwed things up, please tell me. CSTAR 21:31, 9 May 2004 (UTC)

This article is terrible. It's fairly well written for an article addressed to mathematicians who know a bit of measure theory and have a bit of intuition for probability. Therefore, it's terrible. Obviously the main ideas can be stated simply in a way that can be understood by someone who knows only as much probability as can be understood without knowing even calculus. Well, it's not as much of an Augean stable as some things on Wikipedia, so maybe I'll do something with it at some point. Michael Hardy 01:55, 8 Oct 2004 (UTC)

I'm perfectly willing to believe the article is terrible...but does your argument really establish that it's terrible? Too abstract yes, not enough intuition yes etc etc. Please be more specific about what you think should be done with it, whether the abstract stuff should be removed etc. I'd be somewhat unhappy if conditioning as projection were to be removed, since without this it is hard to talk about martingales etc., but hey I won't lose any sleep over it. But simply concluding in an abrupt non-sequitur that it's terrible isn't very helpful!CSTAR 02:19, 8 Oct 2004 (UTC)

I would not remove the abstract stuff, but I would attempt to make the article comprehensible to everyone who understands the basic definition of conditional probability, not just to mathematicians who know, e.g., the Radon-Nikodym theorem. Even the definition of E(X | Y) for continuous random variables X and Y can be clearly stated in such terms if you don't require examination of the sort of issues addressed only in measure theory. Mathematical rigor is important in its place, but so is communication. I'll return to this when I've got some time. Michael Hardy 20:29, 8 Oct 2004 (UTC)

I have to say - the 'intuitive' explanation always went right by me. Charles Matthews 21:15, 8 Oct 2004 (UTC)

OK, here's something from another Wikipedia article:


 * The conditional expected value E( X | Y ) is a random variable in its own right, whose value depends on the value of Y. Notice that the conditional expected value of X given the event  Y = y is a function of y (this is where adherence to the conventional rigidly case-sensitive notation of probability theory becomes important!).  If we write E( X | Y = y) = g(y) then the random variable E( X | Y ) is just g(Y). Similar comments apply to the conditional variance.

Charles, is that the intuitive explanation that went by you? Michael Hardy 00:01, 9 Oct 2004 (UTC)


 * Is the previous paragraph an example of a clear explanation? Is it too impolitic to say it doesn't seem to me to be hardly an improvement? I'm also curious as to what you would point to as being more of an Augean stable in wikipedia, though I do agree that there are many, many articles which I think fit this bill.CSTAR 00:28, 9 Oct 2004 (UTC)

Guys, I think everyone's ambitions here are compatible, at least. Charles Matthews 08:52, 9 Oct 2004 (UTC)
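
The quoted statement (E( X | Y = y) = g(y), and E( X | Y ) is the random variable g(Y)) can be illustrated on a finite example. The following is only a minimal sketch: the joint probabilities are made-up values, and the names joint, marginal_y and g_table are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
joint = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}


def marginal_y(joint):
    """Return P(Y = y) for every y that occurs with positive probability."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_y


def conditional_mean_table(joint):
    """Return the function y -> E(X | Y = y) as a dict, where P(Y = y) > 0."""
    p_y = marginal_y(joint)
    weighted = {}
    for (x, y), p in joint.items():
        weighted[y] = weighted.get(y, Fraction(0)) + x * p
    return {y: weighted[y] / p_y[y] for y in p_y}


p_y = marginal_y(joint)
g_table = conditional_mean_table(joint)  # this is g, a function of y

# E(X | Y) is the random variable g(Y); averaging it over Y recovers E(X).
e_x = sum(x * p for (x, _), p in joint.items())
e_g_of_y = sum(g_table[y] * p for y, p in p_y.items())

print(g_table, e_x, e_g_of_y)
```

Here g is an ordinary function of the value y, while g(Y) is a random variable on the original probability space; the last two quantities agree, which is the law of total expectation in this finite setting.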

1_{...}
What does the 1 in the E(X 1_{...}) notation stand for?


 * 1_A is the indicator function of A. --CSTAR 17:38, 14 Apr 2005 (UTC)


 * ..., and, in the context of probability theory, 1_A can be defined as a random variable that is equal to 1 if the event A occurs and is equal to 0 if the event A fails to occur. Michael Hardy 18:23, 14 Apr 2005 (UTC)


 * But what does E(X f) mean, f being a function? Does it mean the E of the function composition of X and f?  (http://www.stats.uwo.ca/courses/ss357a/handouts/cond-expec.pdf uses a completely different formula)
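 * A sketch of the usual reading, assuming X is integrable: in $$\operatorname{E}(X\,\mathbf{1}_A)$$ the expression $$X\,\mathbf{1}_A$$ is the pointwise product of the two random variables, not a composition, so that
 * $$\operatorname{E}(X\,\mathbf{1}_A) = \int_\Omega X(\omega)\,\mathbf{1}_A(\omega)\ \operatorname{d}\operatorname{P}(\omega) = \int_A X\ \operatorname{d}\operatorname{P}.$$
 * More generally, for a random variable f, E(X f) would be read the same way, as the expectation of the product $$\omega \mapsto X(\omega)\,f(\omega)$$.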

Split into sections?
Does anybody else feel as if the table of contents is too far down in the article, and some more splitting into sections could be done at the top? I don't know what is a good way of splitting it myself. Oleg Alexandrov 20:07, 14 Apr 2005 (UTC)


 * Yeah I agree; but don't look at me for changes..CSTAR 20:44, 14 Apr 2005 (UTC)

Proofs
Can somebody provide the proofs that conditional expectation is a contraction? Or should a reference for the proofs be provided? I can add a good proof of Jensen's inequality later. —Preceding unsigned comment added by Pondyyuan (talk • contribs) 18:09, 31 December 2006


 * Please describe what statement you want to prove regarding contractions in more detail. Jmath666 18:32, 28 March 2007 (UTC)


 * Is not contraction an obvious application of Jensen's inequality for $$f(x)=|x|^s$$? (Unconditional) Jensen's inequality already has a proof at Jensen's inequality. 22:59, 2 November 2013 (UTC) — Preceding unsigned comment added by 80.98.239.192 (talk)
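 * A sketch of that argument, assuming $$s \ge 1$$, $$\operatorname{E}|X|^s < \infty$$, and the conditional form of Jensen's inequality (the unconditional one is not enough here); $$\mathcal{B}$$ denotes the sub-σ-algebra being conditioned on:
 * $$\big|\operatorname{E}(X\mid\mathcal{B})\big|^s \le \operatorname{E}\big(|X|^s \mid \mathcal{B}\big) \quad \text{almost surely.}$$
 * Taking expectations of both sides and using $$\operatorname{E}\big(\operatorname{E}(|X|^s\mid\mathcal{B})\big) = \operatorname{E}|X|^s$$ gives
 * $$\operatorname{E}\big|\operatorname{E}(X\mid\mathcal{B})\big|^s \le \operatorname{E}|X|^s,$$
 * i.e. conditional expectation does not increase the $$L^s$$ norm.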

Pretty good, proposal to make it even better
This article is pretty good, surely better than what I could find in any book I looked and I found it of great help when I needed to clarify this stuff. Compared with the treatment in the classical books, such as Feller, Varadhan, Levy,.. I found it really lucid. I ended up writing notes for myself and few others, which are hopefully even more lucid and some may find more satisfactory. Any comments welcome.

I plan on merging the notes with the article in future. For now, my original lives in LaTeX so any edits will be overwritten next time when I run the translator. Jmath666 22:44, 27 March 2007 (UTC)

Merge with Conditional distribution
There does not seem to be a need for separate Conditional distribution article, that concept should be defined here anyway. The current Conditional distribution article is elementary and incorrect anyway. This could be done separately or in conjunction with the proposal above. Jmath666 18:27, 28 March 2007 (UTC)

There is now draft of the merged page. The original still lives in LaTeX and is not ready for public editing, that's why it is in user space. Jmath666 15:50, 29 March 2007 (UTC)

Methink they are quite different objects, so they could stay on separate pages. Conditional distribution may be further developed.User:unregistered user
 * I agree, separate pages would be better, though conditional distribution does need a bit of development.GromXXVII 12:17, 1 November 2007 (UTC)


 * Aren't they two completely different things??? They are to me. I have some sort of understanding what conditional probability is, i.e. Pr(y|x), but I've almost no idea what a conditional pdf p(y|x) might mean, which is why I'm looking it up. I know I'm pig ignorant, but presumably so are 99.9% of Wikipedia's users -- or they wouldn't be consulting this fount of wisdom. Please remember the ordinary users. --84.9.83.26 (talk) 20:46, 14 December 2007 (UTC)

I agree, pages should not be merged. A conditional mean is just one part of the many aspects related to conditional distributions. A distribution is the starting point for all random variables. Then there are conditional distributions. For conditional distributions there are conditional means, variances etc. But I am not a statistician or mathematician either. 12 February 2008

I would rather not merge the pages too. I think it's clearer to have separate, shorter pages that link to each other appropriately. That's the point of WP: Wikify. Cazort (talk) 05:16, 14 February 2009 (UTC)


 * Presently there are quite a few articles that attempt to deal with conditional probability (and expectation), with varying levels of quality:


 * conditioning (probability)
 * conditional probability
 * conditional probability distribution
 * regular conditional probability
 * conditional expectation
 * etc. It seems quite arbitrary which contains what. I think (some of) these should be reorganized, unified, and probably combined and merged into fewer articles. Yes, conditional probability and conditional expectation are very closely related concepts, e.g. through $$\mathop{P}(A|...)=\mathop{E}(\mathbf{1}_A|...)$$. 80.98.239.192 (talk) 13:01, 3 November 2013 (UTC)

Composition
The diagram

&Omega; --X--> R,   &Omega; --Y--> U --E(X|Y)--> R

you draw is not correct, because also E(X|Y): &Omega; --> R. It should look like:

&Omega; --X--> R,   &Omega; --Y--> U --g--> R,   with E(X|Y) = g(Y): &Omega; --> R

Sorry, wasn't logged in.Nijdam 11:51, 25 April 2007 (UTC)

properties
Properties of conditional expectation should be listed in the article. We can find them in any textbook, such as E(X|X)=?, the tower rule, the independence rule, ... Jackzhp (talk) 20:46, 11 March 2008 (UTC)
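
For concreteness, a sketch of the three properties named above in their standard textbook form, assuming X is integrable and $$\mathcal{H}_1 \subseteq \mathcal{H}_2$$ are sub-σ-algebras (the symbols $$\mathcal{H}_1, \mathcal{H}_2, \mathcal{H}$$ are chosen here for illustration); each equality holds almost surely:
 * $$\operatorname{E}(X\mid X) = X$$
 * $$\operatorname{E}\big(\operatorname{E}(X\mid\mathcal{H}_2)\mid\mathcal{H}_1\big) = \operatorname{E}(X\mid\mathcal{H}_1)$$ (tower rule)
 * $$\operatorname{E}(X\mid\mathcal{H}) = \operatorname{E}(X)$$ if X is independent of $$\mathcal{H}$$ (independence rule)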

Relation to estimators
I would like to request someone adding material on using conditional expectation to "improve" estimators (like in the Rao-Blackwell theorem) to this page; I would add the material, except that I feel I don't understand it 100%. Cazort (talk) 17:58, 30 March 2008 (UTC)

Connecting formal definition back to common usage
Regarding recent edits by Eclecticos: The "formal definition" is now a mixture of the formal definition and some discussion (even before the "discussion" subsection). In addition, the discussion is not always correct. About the "dividing by zero" problem in this context, see also Conditioning (probability). Boris Tsirelson (talk) 11:21, 5 January 2010 (UTC)


 * The common notation P[X|Y&isin;S] also has to be formally defined. What you characterize as "discussion before the discussion subsection " was an attempt to do this.  However, as my log comments indicated, I realized that this attempt was incorrect and have removed it.  More discussion of the issue is immediately below. Eclecticos (talk) 03:21, 6 January 2010 (UTC)

The article doesn't explain how we get from the formal $$\scriptstyle \operatorname{E}[X|\mathcal{B}]$$ back to the common notation for conditional expectation as used in the introduction.

Last night I tried to fix that, adding this text to Conditional expectation:


 * Note that $$\scriptstyle \operatorname{E}[X|\mathcal{B}]$$ is simply the name of the conditional expectation function. Given this function, we can compute specific conditional expectations such as E[X|Y=y] &mdash; in general, we define
 * $$\operatorname{E}[X|Y\in S] = \int_{Y^{-1}(S)} \operatorname{E}[X|\mathcal{B}](\omega)\ \operatorname{d}\omega$$
 * provided that $$\scriptstyle Y^{-1}(S) \in \mathcal{B}$$.

Here I was trying to remove the measure of $$\scriptstyle Y^{-1}(S)$$ from the integration, replacing dP(&omega;) with just d&omega;. However, I quickly realized that there was something wrong with that, and so have removed the second sentence. The problem is that I don't think we can integrate directly with respect to d&omega; -- we need some kind of measure for integration. And I don't think we yet have a probability measure on the set $$\scriptstyle Y^{-1}(S)$$; we can't just renormalize the P measure over that set if its original measure under P is 0.

So in fact, how do we define E[X|Y&isin;S]? I am tempted to say that we really should define conditional probability as a ratio, and handle the 0-denominator case by taking a limit. In fact, the Jaynes book quoted in the Borel-Kolmogorov paradox article suggests that we do just that (the relevant pages are on Google Books), with different ways of taking the limit giving different answers. That is also what the conditional probability article says, and it is hinted at by Conditioning (probability) as well.

However, the introduction to the present conditional expectation article motivates the whole conditional expectation trick and formalism as a way of solving the division by 0 problem, presumably without any stinkin' limits. And the Borel-Kolmogorov paradox article also suggests that the resolution to the division by 0 paradox can be found here in this article. So I am confused. Do the introduction and Borel-Kolmogorov article make false promises?


 * Where does that article suggest this? It does not even link here now. 80.98.239.192 (talk) 21:31, 2 November 2013 (UTC)

If we want a limit-free solution to the paradox, I would really have expected a different one. Specifically, I would have expected that we would start from conditional probability and be required to define not just a single probability measure P over the measurable subsets S of the sample space &Omega;, but rather an appropriately consistent family of probability measures $$P_S$$ where each $$P_S$$ is a probability measure over the measurable subsets of S. In particular, we would have some freedom to define $$P_S$$ when $$P_\Omega(S) = 0$$.

Notice that that approach (see Rao (1993) citation below for details) starts from conditional probability. The article starts instead from conditional expectation (as Kolmogorov did). Here's my concern. The article gives us freedom in defining $$\scriptstyle \operatorname{E}[X|\mathcal{B}]$$ on sets of measure 0, but the article seems to allow us to exercise that freedom differently for $$\scriptstyle \operatorname{E}[X|\mathcal{B}]$$ and $$\scriptstyle \operatorname{E}[Z|\mathcal{B}]$$, say, even if Z=X/2. Can't that lead to inconsistencies?

Eclecticos (talk) 03:10, 6 January 2010 (UTC)


 * First of all, there is no hope to define P(A|B) in general, when P(B)=0. This sad fact is the point of the Borel-Kolmogorov paradox. And in particular, there is no hope to define $$E(Y|X\in A)$$ when $$P(X\in A)=0.$$ Boris Tsirelson (talk) 06:39, 6 January 2010 (UTC)


 * Of course p(A|B) can be defined by stipulation, subject to certain requirements. The "Formal Definition" section of the article appears to be stating these requirements.  It says that you can define "a conditional expectation" of a random variable X (not necessarily unique!), conditioned on the sets of "a sub-&sigma;-algebra $$\scriptstyle \mathcal B \subseteq \mathcal A$$".  Nothing in the article says that this algebra has to exclude sets of measure 0.  In fact, on the contrary, the conditional probability article points here with a suggestion that this article will provide a way of defining conditional probabilities that condition on sets of measure 0.


 * My question about this approach is given above: if I want to define two conditional expectations $$\scriptstyle \operatorname{E}[X|\mathcal{B}]$$ and $$\scriptstyle \operatorname{E}[Z|\mathcal{B}]$$, shouldn't I be forced to choose both at once in a way that makes them consistent with each other? If Z=X/2 (this is meaningful since Z and X are functions on the probability space), then I am bothered that the article does not force $$\operatorname{E}[Z|B] = \operatorname{E}[X|B]/2$$ for all $$B \in \mathcal{B}$$ (in particular, it does not for B of measure 0).  Is this a flaw in Kolmogorov's axiomatization, or an error in the article?


 * (Could someone try to answer the immediately preceding question, which is still bothering me?) Eclecticos (talk) 06:23, 23 August 2010 (UTC)


 * The problem is certainly solvable, as I wrote above ("If we want a limit-free solution...").  Namely, one can replace Kolmogorov's axiomatization of probability measures with an axiomatization of conditional probability measures.  In this case, all the conditional probabilities (and conditional expectations!) are defined directly and consistently by the conditional measure from the start.  The authoritative book on this approach, I believe, is Rao (1993), "Conditional Measures and Applications," who attributes the basic ideas to Rényi (1955) although he develops them further.


 * To be clear, the approach is well known. For example, the undergraduate textbook Makinson (2008, pp. 165-166) says that "This approach is popular among philosophers." Eclecticos (talk) 04:07, 16 September 2011 (UTC)


 * By the way, Jaynes (2003) -- who founds probability theory on rather a different basis but (as he says in his preface) ends up agreeing that Kolmogorov's axioms are correct if not necessarily complete -- also takes conditional probability to be fundamental, unlike Kolmogorov. He complains in his Appendix A that "The Kolmogorov axioms make no reference to the notion of conditional probability; indeed, KSP finds this an awkward notion, really unwanted ... In contrast, we considered it obvious from the start that all probabilities referring to the real world are necessarily conditional on the information at hand."  (As noted in the Borel-Kolmogorov paradox article, Jaynes does have his own views on how to commonsensically choose a definition of the conditional probability in this case: "Whenever we have a probability density on one space, and we wish to generate from it one on a subspace of measure 0, the only safe procedure is to pass to an explicitly defined limit by a process like (15.55).")  Eclecticos (talk) 05:16, 30 June 2010 (UTC)


 * About the limiting procedure, its merits and demerits are discussed in Conditioning (probability), see especially "The limiting procedure" there. See also Talk:Regular conditional probability. Boris Tsirelson (talk) 06:39, 6 January 2010 (UTC)


 * See also Conditional probability (not quite well-done, though) and Talk:Conditional probability. Boris Tsirelson (talk) 06:44, 6 January 2010 (UTC)


 * I think the Rao (1993) axiomatization should be mentioned as an alternative in the above articles, no? Eclecticos (talk) 05:17, 30 June 2010 (UTC)


 * If you believe that conditional probability makes sense in all cases, then please consider the following two questions.
 * First. Let U be a random variable distributed uniformly on [0,1]. Find the conditional probability of U=0.4 given that U is a rational number.
 * Second. Let $$U_1,U_2,\dots$$ be independent random variables distributed uniformly on [0,1]. Find the conditional probability of $$U_{10} > 0.5$$ given that $$U_n\to1$$ as n tends to infinity.
 * Boris Tsirelson (talk) 10:28, 1 July 2010 (UTC)


 * Sure, these are natural questions about Rao's axiomatization (please look at it before answering, in section 4.2 of his book on Google Books). Let me focus on your first question since it's simpler.  The answer is that this conditional probability might be undefined.  It is defined only if the set of rationals $${\mathbb Q}$$ is an element of the class of conditions $${\mathcal B}$$.  (Not all sets are conditions, just as not all sets are measurable.)  So I have to ask you to make your question more precise: what do you mean by "distributed uniformly" when your distribution is to be specified as a two-place function $$P(X\mid Y)$$ satisfying Rao's conditions?  The class $${\mathcal B}$$ will be specified as part of your definition of the distribution.  Eclecticos (talk) 06:23, 23 August 2010 (UTC)


 * (For any reasonable definition, the answer to your question is indeed "undefined." Why?  There are many ways to extend the classical uniform distribution (along with the conditional distributions derived from it) to a Rao-style family of conditional distributions.  These extensions differ in the conditional probabilities $$P(X\mid Y)$$ that they assign where Y is a set of measure zero.  They could for example be chosen to reflect different limiting procedures, in the sense of the Borel-Kolmogorov paradox.  However, it is easy to see that any extension that allows you to condition on the countably infinite set $${\mathbb Q}$$ cannot possibly treat the elements of $${\mathbb Q}$$ symmetrically -- so neither of us would be very happy using the name "uniform distribution" for any such extension!  If we really want to allow $${\mathbb Q}$$ in the denominator, we could perhaps get away with it by forbidding singleton sets in the numerator (in other words, choose a smaller sigma-algebra of measurable sets: I suspect this can be made to work out for the sigma-algebra generated by non-null intervals).  But then this is no longer an extension of the classical uniform distribution.  Either way -- regardless of whether we ban {0.4} from the numerator or $${\mathbb Q}$$ from the denominator -- the answer to your question would be that $$P(U=0.4 \mid U \in {\mathbb Q})$$ is not defined.) Eclecticos (talk) 06:23, 23 August 2010 (UTC)


 * "please look at it before answering" — no, sorry, I know it is not very polite, but I did not. Your answer convinces me that the Rao-style theory is not really important progress in the theory of conditioning, and so not worth my time. It is probably a reformulation of what I know in a different language. Of course, this is just my point of view; different people have different ideas of "really important progress". Boris Tsirelson (talk) 11:24, 23 August 2010 (UTC)


 * Yes, the idea is to better formalize what you "know"! Your current (standard) formalization gives rise to the Borel-Kolmogorov paradox; whereas this one doesn't because the conditional probabilities are defined directly with no limiting procedure needed.  I brought up Rao not to claim his notability, but because he appears to solve an apparent problem with the formalization in the current wikipedia article.  I don't know yet whether you agree or disagree that it is a problem, but I gave a concrete example of it.  See above plea "Could someone try to answer ..."?


 * Rao's axiomatization is indeed obvious (and very short) and based on a 1955 proposal of Rényi, whom you surely respect. It permits you to condition on specified sets, including sets of measure zero.  Conditioning on sets of measure zero is necessary if you want to permit counterfactual reasoning, i.e., relativize to a world that you believe doesn't actually exist.  By defining an obvious notion of "conditional measure" and making you work with a single conditional measure, Rao ensures that all the conditional probabilities and conditional expectations within any world (including a world of measure zero) will be consistent with one another (in the sense of satisfying ordinary identities), so that the counterfactual reasoning is consistent (in the sense of logic).


 * cf.: Counterfactual_conditional 80.98.239.192 (talk) 21:31, 2 November 2013 (UTC)


 * My concern stated earlier is that the framework for conditional expectation in the current wikipedia article does allow conditioning on sets of measure zero, but does not require any consistency (not even among conditional expectations with the same condition!). The limiting procedures discussed in other wikipedia articles have a similar problem: defining many conditional probabilities requires choosing many limiting procedures, and if those choices are free and independent, what ensures that the resulting conditional probabilities are consistent?  Eclecticos (talk) 12:50, 25 August 2010 (UTC)


 * Eclecticos: About inconsistency in the case Z=X/2: As I see it, $$\operatorname{E}[X|\mathcal{B}]$$ is defined as an (equivalence) class of $$\mathcal{B}$$-measurable functions (random variables), where the equivalence relation is that two functions agree outside a set of measure 0. The same holds for $$ \operatorname{E}[Z|\mathcal{B}]$$. Any representative of $$ \operatorname{E}[X|\mathcal{B}]$$ divided by 2 and any representative of $$\operatorname{E}[Z|\mathcal{B}]$$ will be equal outside a set of measure 0. Moreover, the class $$\operatorname{E}[X|\mathcal{B}]$$ consists of exactly the functions which are 2 times the functions in the class $$\operatorname{E}[Z|\mathcal{B}]$$. In this sense, $$\operatorname{E}[Z|\mathcal{B}] = \operatorname{E}[X|\mathcal{B}]/2$$. But it is not determined which functions we have to choose from the classes, nor is it required that the two chosen functions satisfy this relation for every ω; it holds only almost surely.


 * You write several times that this article allows conditioning on sets of measure 0, e.g., defining $$\operatorname{E}[X|B]$$. On the contrary! This article does not mention $$\operatorname{E}[X|B]$$ for B of measure 0. (Though other articles, like Conditioning (probability), do allow it for special distributions with a joint density.) It cannot be done for the general case in the Kolmogorov setting, as you also noted when referring to the Borel-Kolmogorov paradox. (By the way, that article does not suggest that the resolution to the division by 0 paradox can be found in this article, as you write.) This one defines only the class of functions $$\operatorname{E}[X|\mathcal{B}]$$, where $$\mathcal{B}$$ is a σ-algebra.


 * Also, I think this consistency has something to do with the regularity in Section 'Definition of conditional probability', which ensures that, e.g., for disjoint A,C ⊆ Ω and for all fixed ω∈Ω, the representatives are chosen such that the linearity $$\operatorname{E}(\mathbf{1}_A + \mathbf{1}_C|\mathcal{B})(\omega) = \operatorname{E}(\mathbf{1}_A|\mathcal{B})(\omega) + \operatorname{E}(\mathbf{1}_C|\mathcal{B})(\omega)$$ holds.


 * Anyway, this article is supposed to be on the standard Kolmogorov axiomatization. There can be another article on conditional probability under Rényi-Rao axiomatization which is referenced from here as an alternative (as you wrote), but this one probably should not be extended to this not-(so)-standard direction. 80.98.239.192 (talk) 21:31, 2 November 2013 (UTC)

I think Section 'Conditioning relative to a subalgebra' has basically almost the same content as Section 'Formal definition', only with different terminology, e.g. M, N and B instead of $$\mathcal{F}, \mathcal{H}$$, and H, and a precise theorem instead of a Discussion. So I do not see what "This version is preferred by probabilists" refers to. These two should probably be merged. I prefer having mathcal letters, but also the formal Theorem.

I do not understand why the notation for the basic σ-algebra and its sub-σ-algebra should change throughout the article. 80.98.239.192 (talk) 21:31, 2 November 2013 (UTC)

relation with Conditional probability
Since conditional probability is a special case of conditional expectation, it will be good to treat the conditional probability in this article, rather than an independent article. Jackzhp (talk) 19:56, 12 March 2011 (UTC)

Different Conditioning
We have defined conditional expectation on a sigma field and on a (measurable) set; the latter one is very messy. Conditioning on a random variable has not been defined yet.

As for conditioning on a set, let's show P(A|B)=P(AB)/P(B) satisfies the conditional expectation definition. Here, the sigma field is $$\left\{ \emptyset,\Omega,B,\bar{B}\right\}$$, so we have to show 4 equalities. Can someone please elaborate on this? Jackzhp (talk) 16:10, 12 March 2011 (UTC)
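
A sketch of the four equalities, assuming $$0 < \operatorname{P}(B) < 1$$ and writing $$\mathcal{G} = \left\{ \emptyset,\Omega,B,\bar{B}\right\}$$ (the names $$\mathcal{G}$$ and Z are introduced here for illustration). Take the candidate
 * $$Z = \frac{\operatorname{P}(A\cap B)}{\operatorname{P}(B)}\,\mathbf{1}_B + \frac{\operatorname{P}(A\cap \bar{B})}{\operatorname{P}(\bar{B})}\,\mathbf{1}_{\bar{B}},$$
which is $$\mathcal{G}$$-measurable because it is constant on B and on $$\bar{B}$$. The defining condition $$\int_G Z\ \operatorname{d}\operatorname{P} = \operatorname{P}(A\cap G)$$ must be checked for each of the four sets G in $$\mathcal{G}$$:
 * $$G=\emptyset$$: both sides are 0.
 * $$G=B$$: $$\int_B Z\ \operatorname{d}\operatorname{P} = \frac{\operatorname{P}(A\cap B)}{\operatorname{P}(B)}\,\operatorname{P}(B) = \operatorname{P}(A\cap B).$$
 * $$G=\bar{B}$$: $$\int_{\bar{B}} Z\ \operatorname{d}\operatorname{P} = \frac{\operatorname{P}(A\cap \bar{B})}{\operatorname{P}(\bar{B})}\,\operatorname{P}(\bar{B}) = \operatorname{P}(A\cap \bar{B}).$$
 * $$G=\Omega$$: $$\int_\Omega Z\ \operatorname{d}\operatorname{P} = \operatorname{P}(A\cap B) + \operatorname{P}(A\cap \bar{B}) = \operatorname{P}(A).$$
So Z is a version of $$\operatorname{E}(\mathbf{1}_A\mid\mathcal{G})$$, and its value on B is the elementary P(A|B).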

Formal definition
Is the formal definition given actually correct? As stated, it says that $$E[X|\Beta] = X$$ a.e. I'd imagine you should explicitly state what the two probability measures are, and clarify that E is a Radon–Nikodym derivative times the random variable. —Preceding unsigned comment added by 128.176.122.34 (talk) 11:52, 8 May 2011 (UTC)

I would also add perhaps: the formal definition of conditional expectation w.r.t. a $$\sigma$$-algebra doesn't mention the $$\sigma$$-algebra defined on the random variable's output. If we assume that the random variable's $$\sigma$$-algebra is the Borel one, then the conditional expectation is unique (right?), and results like $$E(X | H) = X$$ (when X is H-measurable) make sense. However, if we say nothing about the output's $$\sigma$$-algebra, then the conditional expectation can have other values and results like the above aren't necessarily true. So perhaps it would be good to mention this (or mention that texts on stochastic processes and expectation often assume that the random variable is defined with a Borel $$\sigma$$-algebra). See: https://stats.stackexchange.com/questions/495562/how-to-understand-conditional-expectation-w-r-t-sigma-algebra-is-the-conditiona/495667#495667

Error
Regretfully, an incorrect paragraph was introduced by User:3mta3 on 14:55, 14 March 2009, and went unnoticed for almost 6 years. At the end of the section "Calculation" we see:
 * $$ \operatorname{E} (X | Y=y) \operatorname{P}(Y=y) = \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x,Y=y), $$
 * and although this is trivial for individual values of y (since both sides are zero), it should hold for any measurable subset B of the domain of Y that:
 * $$ \int_B \operatorname{E} (X | Y=y) \operatorname{P}(Y=y) \ \operatorname{d}y = \int_B \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x,Y=y) \ \operatorname{d}y. $$

This is nonsense. Just as "both sides are zero" in the former, the integrands are zero in the latter. Thus, we still have 0=0. The author of this paragraph naively believes that integration is able to gather a continuum of zeros into a positive number. I understand his intuition, but no, it does not work this way. This formulation could be acceptable in a Wikipedia of the 18th century, but not now. :-) Boris Tsirelson (talk) 12:13, 15 January 2015 (UTC)
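
 * For comparison, a formulation that should hold (a sketch, assuming X is discrete and integrable, and writing $$\operatorname{P}_Y$$ for the distribution of Y, a symbol not used in the quoted paragraph) integrates against the distribution of Y instead of multiplying by $$\operatorname{P}(Y=y)$$ and integrating in $$\operatorname{d}y$$:
 * $$ \int_B \operatorname{E} (X | Y=y) \ \operatorname{P}_Y(\operatorname{d}y) = \sum_{x \in \mathcal{X}} x \ \operatorname{P}(X=x,Y\in B) $$
 * for every measurable subset B of the range of Y. Both sides equal $$\operatorname{E}(X 1_{\{Y\in B\}})$$, which can be non-zero even when $$\operatorname{P}(Y=y)=0$$ for every y.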

Definition of conditional probability
Should it not be [0,1] instead of (0,1)? After all, A's probability could be zero or one. — Preceding unsigned comment added by Doubaer (talk • contribs) 08:11, 3 February 2015 (UTC)
 * Really, both are nonsense. The indicator is a random variable. That is, measurable w.r.t. the sigma-algebra given on the relevant probability space. Now fixed. Boris Tsirelson (talk) 12:02, 3 February 2015 (UTC)

Integrability forgotten
Several times, conditional expectation is discussed without assuming integrability of the given random variable (while in fact it is essential). Boris Tsirelson (talk) 16:12, 30 December 2015 (UTC)

Agreed. Conditional expectation is defined on integrable random variables. A reference would be Patrick Billingsley's Probability and Measure. --Han (talk) 01:31, 8 January 2016 (UTC)

section 4.2 error?
In section 4 it is stated
 * $$\operatorname{E}(X\mid Y) = \operatorname{E}(X\mid\mathcal{H}) \circ Y $$,

while $$Y:\Omega \to U$$ and $$\operatorname{E}(X\mid\mathcal{H}) : \Omega \to \mathbb{R}^n$$. It seems to me that it should be
 * $$\operatorname{E}(X\mid Y) = \operatorname{E}(X\mid\mathcal{H}) \circ Y^{-1} $$

instead. — Preceding unsigned comment added by 178.37.84.106 (talk) 15:03, 22 June 2017 (UTC)


 * A mess indeed.
 * "Then the random variable $$g(Y)$$, denoted as $$\operatorname{E}(X\mid Y)$$, is a conditional expectation of X given $$Y$$."
 * So, what is this conditional expectation, is it the function $$g$$ or the random variable $$g(Y)$$? If it is $$g(Y)$$, then it is just $$\operatorname{E}(X\mid\mathcal{H})$$ (and not $$ \operatorname{E}(X\mid\mathcal{H}) \circ Y $$ nor $$\operatorname{E}(X\mid\mathcal{H}) \circ Y^{-1} $$). And then, Sections 4.1 and 4.2 deal with the same object, in slightly different notations. Or alternatively, if that conditional expectation is the function $$g,$$ then it is a (formally) different notion, and $$ \operatorname{E}(X\mid\mathcal{H}) = \operatorname{E}(X\mid Y) \circ Y = g \circ Y = g(Y). $$ And of course, all that should be understood up to equivalence (the equivalence being equality almost everywhere). Boris Tsirelson (talk) 19:00, 22 June 2017 (UTC)
 * A third option: define $$\operatorname{E}(X\mid Y=y)$$ as $$g(y)$$ for $$y\in U,$$ then $$ g = \operatorname{E}(X\mid Y=\cdot) = (y\mapsto\operatorname{E}(X\mid Y=y))$$ and $$ \operatorname{E}(X\mid\mathcal{H}) = \operatorname{E}(X\mid Y=\cdot) \circ Y = \operatorname{E}(X\mid Y=Y); $$ this "Y=Y" looks rather ridiculous, but really, the first Y denotes the random variable "as a whole", while the second Y means the function $$Y(\cdot)$$; rather clumsy... Boris Tsirelson (talk) 19:39, 22 June 2017 (UTC)
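
 * The distinction between the function $$g$$ on $$U$$ and the random variable $$g(Y)=g\circ Y$$ on $$\Omega$$ can be illustrated on a finite probability space. The following is only a minimal sketch: the six-point uniform space, the variables `X` and `Y`, and the helper `factor_g` are hypothetical choices made for illustration, not anything taken from the article.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite probability space: Omega = {0,...,5}, uniform measure.
P = {w: Fraction(1, 6) for w in range(6)}

def X(w):
    """Real-valued random variable on Omega (illustrative choice)."""
    return w * w

def Y(w):
    """U-valued random variable on Omega, with U = {"even", "odd"}."""
    return "even" if w % 2 == 0 else "odd"

def factor_g(X, Y, P):
    """Return g: U -> R with g(y) = E(X | Y = y) for every y with P(Y = y) > 0."""
    num = defaultdict(Fraction)  # accumulates E(X; Y = y)
    den = defaultdict(Fraction)  # accumulates P(Y = y)
    for w, p in P.items():
        num[Y(w)] += X(w) * p
        den[Y(w)] += p
    return {y: num[y] / den[y] for y in den if den[y] > 0}

# g is a function on U, not on Omega.
g = factor_g(X, Y, P)

def cond_exp(w):
    """The random variable g(Y) = g o Y, defined on Omega."""
    return g[Y(w)]

print(g)                                  # values of g on U
print([cond_exp(w) for w in P])           # values of g(Y) on Omega
```

 * Here `g` is indexed by points of $$U$$ while `cond_exp` is indexed by points of $$\Omega$$, which is the difference between the two readings of $$\operatorname{E}(X\mid Y)$$ discussed above.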

"The definition of $$\operatorname{E}(X \mid \mathcal{H})$$ may resemble that of $$\operatorname{E}(X \mid H)$$ for an event $$H$$ but these are very different objects. The former is a $$\mathcal{H}$$-measurable function $$\Omega \to \mathbb{R}^n$$, while the latter is an element of $$\mathbb{R}^n$$.  Evaluating the former at $$H$$ yields the latter."

However, $$\operatorname{E}(X \mid \mathcal{H})$$ is a function $$\Omega \to \mathbb{R}^n$$ (which is explicitly stated in this paragraph); how can one evaluate this function at an arbitrary event $$H\in\mathcal{H}$$, when $$\mathcal{H}\neq \Omega$$? — Preceding unsigned comment added by 84.114.30.251 (talk) 21:10, 16 December 2018 (UTC)


 * You are right. If the σ-algebra $$\mathcal{H}$$ is finite (which is the elementary case), then it corresponds to a finite partition of $$\Omega$$ into parts of positive measure, and on every part $$H$$ the function $$\operatorname{E}(X \mid \mathcal{H})$$ is constant (almost surely), and this constant is indeed $$\operatorname{E}(X \mid H).$$ But generally that function is not constant on $$H,$$ and $$\operatorname{E}(X \mid H)$$ is rather its mean (average) value over $$H$$ according to the given probability measure, provided that $$H\in\mathcal{H}$$ and $$P(H)\ne0.$$ Boris Tsirelson (talk) 22:28, 16 December 2018 (UTC)
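
 * Both cases described above can be checked numerically in the finite setting. This is a rough sketch under assumed data: the six-point space, the non-uniform weights, the three-part partition and the helper names `prob` and `mean_on` are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical probability space Omega = {0,...,5} with non-uniform weights.
P = {0: Fraction(1, 12), 1: Fraction(1, 12),
     2: Fraction(1, 6),  3: Fraction(1, 6),
     4: Fraction(1, 4),  5: Fraction(1, 4)}

# Random variable X(w) = w, stored as a table on Omega.
X = {w: Fraction(w) for w in P}

# Finite sigma-algebra generated by this partition into parts of positive measure.
parts = [{0, 1}, {2, 3}, {4, 5}]

def prob(H):
    """P(H) for an event H given as a set of outcomes."""
    return sum(P[w] for w in H)

def mean_on(f, H):
    """E(f | H): the P-weighted average of f over an event H with P(H) != 0."""
    return sum(f[w] * P[w] for w in H) / prob(H)

# E(X | sigma-algebra) as a function on Omega: constant on each part,
# equal there to the elementary conditional expectation given that part.
cond = {w: mean_on(X, part) for part in parts for w in part}

# An event in the sigma-algebra that is a union of two parts.
H = {0, 1, 2, 3}

print(cond)            # constant on each part of the partition
print(mean_on(X, H))   # E(X | H)
print(mean_on(cond, H))  # average of E(X | sigma-algebra) over H
```

 * On a single part the function is constant and equals $$\operatorname{E}(X \mid H)$$; on the union `H` it takes two different values, and $$\operatorname{E}(X \mid H)$$ is recovered only as its average over `H`.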


 * This inconsistency has been resolved. "Evaluating the former at $$H$$ yields the latter." means taking an integral of the former over $$H$$. AVM2019 (talk) 13:54, 23 May 2022 (UTC)