Talk:Expected value/Archive 1

WAY TOO COMPLEX AN EXPLANATION
Deleted text:

Similarly, in computer science, the expected value of X is defined as


 * $$\operatorname{\mathbb{E}}[X] = \sum_i iP(X = i)$$

where X is an algorithm with different, weighted subroutines, and i is a particular algorithm path.

Populus 17:52, 16 Aug 2003 (UTC)

Removed
In general expectation is what is considered the most likely to happen. A less advantageous result gives rise to the emotion of disappointment. If something happens that is not at all expected it is a surprise. See also anticipation.--Jerryseinfeld 01:02, 1 Jan 2005 (UTC)


 * Since this has been removed, why does Disappointment redirect here? Quandaryus 09:18, 24 Jan 2005 (UTC)


 * Fixed the redirect to point to expectation. Ben Cairns 03:46, 2 Feb 2005 (UTC).

American roulette wheel example
The Article reads .. An American roulette wheel has 38 equally likely outcomes. A winning bet placed on a single number pays 35-to-1 (this means that you are paid 35 times your bet and your bet is returned, so you get 36 times your bet). So considering all 38 possible outcomes, the expected value of the profit resulting from a $1 bet on a single number is:

E(X)=(-$1x37/38)+($35x1/38)=-$0.0526. (Your net is −$1 when you lose and $35 when you win.) Therefore one expects, on average, to lose over five cents for every dollar bet, and the expected value of a one dollar bet is $0.9473. In gambling or betting, a game or situation in which the expected value of the profit for the player is zero (no net gain nor loss) is commonly called a "fair game."


 * someone has changed the article. I suggest they revert the changes until the arguments come to a conclusion Sanjiv swarup (talk) 02:28, 25 April 2008 (UTC)

Argument for change The roulette table example has a flaw - it compares apples to oranges. Either you use the "amount pushed across the table" in each term, or you need to use "net change".

In the "amount pushed across the table" case, I agree that the second term is $36 X 1/38. But, in all cases, to play you have to put $1 down FIRST (it just so happens you get your own dollar back if you win). Using that logic, the formula should be (-$1 X 38/38) + (+$36 X 1/38), which computes out to about -$0.0526.

In the "net change" scenario, I agree that the first term is -$1 X 37/38. But, since one dollar of the 36 you get was yours at the beginning of the spin, you only net $35 on a win. Thus the formula would be (-$1 X 37/38) + (+$35 X 1/38), which still yields about -$0.0526. So, one should expect to lose over five cents for every dollar bet. - Anonymous
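Both accountings described above give the same answer, which can be checked numerically. A minimal Python sketch (my own illustration, using exact fractions; not part of the original discussion):

```python
from fractions import Fraction

# "Net change" accounting: lose $1 in 37 of 38 cases, net +$35 in 1 case.
net_change = Fraction(-1) * Fraction(37, 38) + Fraction(35) * Fraction(1, 38)

# "Amount pushed across the table": stake $1 in all 38 cases, receive $36 in 1 case.
pushed = Fraction(-1) * Fraction(38, 38) + Fraction(36) * Fraction(1, 38)

print(net_change, pushed)  # both -1/19, about -0.0526
```

Either way the expectation is -2/38 = -1/19 per dollar bet, so the two bookkeeping conventions agree as long as each is applied consistently.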


 * You are absolutely right - will you fix it or should I? PAR 9 July 2005 04:17 (UTC)

Argument for no-change Skand swarup (talk) 13:13, 24 April 2008 (UTC) If one uses the "amount pushed across the table" in each term, the second term is $35 X 1/38 because one puts $1 down first and gets $36. "Amount pushed across the table" = $(-1+36)=$35. In all cases to play, one has to put $1 down first but "amount pushed across the table" cannot be -$1 in all cases! You will win if you get your number. So, the first term is -$1 X 37/38. Therefore, the terms will remain the same in the "net change" and "amount pushed across the table" scenarios.

American roulette wheel example
The Article reads .. An American roulette wheel has 38 equally likely outcomes. A winning bet placed on a single number pays 35-to-1 (this means that you are paid 35 times your bet and your bet is returned, so you get 36 times your bet). So considering all 38 possible outcomes, the expected value of the profit resulting from a $1 bet on a single number is:

E(X)=(-$1x37/38)+($35x1/38)=-$0.0526. (Your net is −$1 when you lose and $35 when you win.) Therefore one expects, on average, to lose over five cents for every dollar bet, and the expected value of a one dollar bet is $0.9473. In gambling or betting, a game or situation in which the expected value of the profit for the player is zero (no net gain nor loss) is commonly called a "fair game."

'''Isn't there an error in the example? (One)''' Should it not be (-$1 X 37/38) + (+$36 X 1/38) and not (-$1 X 37/38) + (+$35 X 1/38), since you get your $1 back? - Anonymous


 * Answer: No, there is no error in the example.

By Skand swarup (talk) 05:10, 18 April 2008 (UTC) The second term should not be (+$36 X 1/38) because the profit = $35, and not $36. The $1 that one gets back is not a profit.

'''Opinion: Isn't there an error in the example? (Two)''' To my way of thinking, whenever you place a one-dollar bet, that dollar leaves your possession, so the odds of losing a dollar are 38/38. On the other hand, when you win, you receive $36, and the odds of that are 1/38. Unfree (talk) 21:34, 26 April 2008 (UTC)


 * Answer: No, there is no error in the example. Skand swarup (talk)

I agree that whenever one places a one-dollar bet, that dollar leaves your possession. But the odds of LOSING a dollar are 37/38, because in one instance you will get it back plus make a profit. If you say that the odds of LOSING a dollar are 38/38, you are saying that you are certain to lose one dollar whenever you place a bet. Why would anyone place a bet then? We are calculating expected profit, and the profit is $35 when you win, not $36.

changed
I changed the section "Nonnegative variables", which consisted of a representation formula for the expected values of nonnegative random variables, to a subsection called "Representation", in which I give a formula for the general moment of a random variable. Moreover, I removed (in this subsection) the distinction between continuous and discrete random variables, since the formula holds without distinction. gala.martin

Roman vs. blackboard bold
Is there a reason the article switches from using $$\mathrm{E}X\,$$ to $$\mathbb{E}X$$ halfway through, or shall I change them all to roman E's for consistency? TheObtuseAngleOfDoom 21:19, 11 December 2005 (UTC)


 * No reason that I know of. PAR 22:30, 11 December 2005 (UTC)

No reason that I know. I would prefer to change all $$\mathrm{E}X\,$$ to $$\mathbb{E}X$$ as usual in math literature. gala.martin


 * Be bold! It's better to have a single form in the article. --Mgreenbe 22:47, 11 December 2005 (UTC)
 * I would like EX rather than $$\mathbb{E}X$$ as the former is more bearable inline, where it does not need to be a png picture, but rather plain text. Wonder what others prefer. Oleg Alexandrov (talk) 00:32, 12 December 2005 (UTC)

I've gone ahead and been bold, as suggested, switching them all to roman. I also switched the $$\mathbb P$$'s to roman as well. TheObtuseAngleOfDoom 14:53, 12 December 2005 (UTC)
 * Thanks! Oleg Alexandrov (talk) 17:48, 12 December 2005 (UTC)

"Fair game" - Expected Value = 0?
I've always thought that a "fair game" is one in which the expected value is 0 - over many repetitions the player stands to neither gain nor lose anything. I don't quite understand the "half stake" that's in their right now (end of intro paragraph). I'm planning on changing it back to the definition that I had put down, but maybe it's just something that I don't know about expected values so I wanted to make sure. -Tejastheory 17:58, 26 December 2005 (UTC)


 * Yes, the "stake" additions are wrong. The previous wording was not wonderful either, though.  In a simple 2-person game, both players pay a "stake" into a pool, then one of them wins the pool.  If the game is fair, then the expected income is half the total stake (not half of one player's stake as it says now).  That "half" is only for 2-player games. The expected profit (income minus expenditure) is 0, which is true for fair games with any number of players.  We should describe it in terms of profit, without using gambling words like "stake", as that is more general and easier to understand. --Zero 22:41, 26 December 2005 (UTC)

Properties the expected value has not
We cite some properties the expected value 'has not' (functional non-invariance and non-multiplicativity). It is not meaningful to write down the properties a mathematical object has not; otherwise, we would have to write too many... I think it would be better to remove these properties, or to move them to the bottom of the list of properties. This concerns in particular the "functional non-invariance".

gala.martin

I changed the order of the list of properties, as explained above. Gala.martin 18:36, 28 January 2006 (UTC)

Question over notion of "fair game"
The article strikes me as okay - except for end of the 2nd paragraph that goes: In gambling or betting, a game or situation in which the expected value for the player is zero (no net gain nor loss) is called a "fair game."

While this seems to be the convention (I have several references stating similar), the notion is false.

To determine whether a game is fair, the probabilities of the events and the odds offered are insufficient. You also need to consider the betting strategy used.

This can easily be seen in something I call the "fair bet paradox":

THE FAIR BET PARADOX: Imagine Alice and Bob start with $1000 each and both bet "heads" on an unbiased coin. A "fair bet", right? Well, let Alice bet just $1 per toss while Bob bets HALF HIS CURRENT FUNDS. Under this betting strategy, Alice's funds fluctuate around $1000 while Bob SWIFTLY GOES BROKE. True!

See the word doc "the fair bet paradox" downloadable from www.geocities.com/multigrals2000 for more info. The paradox is not a consequence of the gambler's fallacy or Bob's initial lack of adequate funds. You can offer Bob unlimited credit at 0% interest and he'd still go broke. Likewise if you raise the probability of "heads" to a bit above 0.6 (on which Alice would become rich). You can also solve for the betting strategy of betting a random fraction of your funds BUT THE GENERAL CASE SEEMS TO BE UNSOLVED (True?). Good Luck to anyone who solves it.
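The mechanism behind the claimed paradox can be seen without simulation: betting half of current funds multiplies Bob's bankroll by 3/2 on a win and 1/2 on a loss, so his expected log-growth per toss is negative even though each individual bet has zero expected value. A short Python sketch (my own, not part of the original discussion):

```python
import math

# Bob bets half his current funds each toss: a win multiplies his funds
# by 3/2, a loss by 1/2. Per-toss expected log-growth on a fair coin:
drift = 0.5 * math.log(1.5) + 0.5 * math.log(0.5)
print(drift)  # about -0.1438 < 0: Bob's funds shrink geometrically

# Alice bets a flat $1 per toss; her expected profit per toss is zero.
alice_drift = 0.5 * 1 + 0.5 * (-1)

# Break-even head count over 100 tosses: (3/2)^H * (1/2)^(100-H) >= 1
H_min = 100 * math.log(2) / math.log(3)
print(H_min)  # about 63.09, so Bob needs at least 64 heads out of 100
```

This is the standard distinction between additive (arithmetic-mean) and multiplicative (geometric-mean) growth; each bet is fair in expectation, yet the typical outcome of the half-stake strategy is ruin.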

I'd like to edit the main page but don't feel confident to do so. If someone else does so, could you please leave the first two paragraphs as they are and perhaps add an explanatory bit below it (in brackets?) as I'd like to do something on the subject latter and would like to refer to the present material. Okay?

Yours, Daryl Williams (www.geocities.com/multigrals2000)


 * It's not true that Bob will go broke in this game; there is a non-zero probability that Bob will break _the House_ and win whatever amount the House has (be it 1 million, 1 billion, or 1 googol). Unless Bob is betting against someone with infinite resources, like Cthulhu. Albmont 12:02, 9 March 2007 (UTC)

Or if the House and Bob have unlimited credit?

Under reasonable circumstances, that "non-zero probability" is usually very, very small. With the above example (Alice and Bob start with $1000 each, Bob bets half his current funds, etc.) with 100 tosses, Bob needs 64 "heads" or more to break even or win (see note below).

The chance of this occurring is: [100!/64!36! + 100!/65!35! + ... +100!/100!0! ]*(1/2)^100 which = approx 1.9 * 10E-6 or just under 2 chances in a million.

So, under Bob's betting strategy, can the game honestly be considered "fair"? Just 2 Bobs out of 1 million on average breaking even or winning? Fair? Not something I'd accept as a bet. And if you use 1000 tosses or more, the odds get even worse.

THE COMMONLY HELD NOTION THAT 'FAIR ODDS' MEAN 'FAIR GAME' IS FALSE. What's needed is "fair odds" PLUS "appropriate betting strategy" = "fair game"

Many gamblers (and investors?) are robbing themselves even more than necessary due to adopting betting strategies like Bob above. I'd like to do something to perhaps reduce this (if possible)

Okay?

Anyone want to help in this endeavour? If so, contact me.

note 1: the number of heads Bob needs satisfies $$(3/2)^H (1/2)^{100-H} \cdot 1000 \geq 1000$$, i.e. $$H \geq 100 \cdot \ln(2)/\ln(3) \approx 63.0929$$, where H is the minimum number of heads needed.

Daryl Williams 03:24, 5 June 2007 (UTC)


 * So do you have an alternative suggestion on how to define a 'fair game'? iNic (talk) 00:38, 26 March 2008 (UTC)

E[f(X)]
Is there a general table giving E[f(X)] for varying functions f and with conditions for X? For example, I know that there is a closed formula for E[exp(X)] whenever X is normal, I could find it under log-normal distribution, but, if I didn't know it, I would be completely lost trying to find it. Albmont 10:02, 19 December 2006 (UTC)

Do you mean the law of the unconscious statistician? It says that for a discrete random variable X, $$E[g(X)]=\sum_{x\in X}g(x)\cdot P(X=x)$$

For the continuous case, with f the pdf of X, $$E[g(X)]=\int_{x\in X}g(x)\cdot f(x)\, dx$$

Hyperbola 08:48, 12 September 2007 (UTC)
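The discrete form of the law of the unconscious statistician is easy to illustrate; a minimal Python sketch (my own, using a fair die and g(x) = x² as a toy example):

```python
# Discrete LOTUS sketch: E[g(X)] = sum over x of g(x) * P(X = x).
# X is a fair six-sided die, represented by its pmf.
pmf = {x: 1/6 for x in range(1, 7)}

def lotus(g, pmf):
    """Expected value of g(X) computed directly from the pmf of X."""
    return sum(g(x) * p for x, p in pmf.items())

e_x2 = lotus(lambda x: x**2, pmf)
print(e_x2)  # (1+4+9+16+25+36)/6 = 91/6, about 15.1667
```

The point of the law is that one never needs the distribution of g(X) itself; summing g(x) against the pmf of X suffices.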

Assumption missing?
In the "Iterated expectation for discrete random variables" section, isn't the assumption


 * $$\left( \sum\limits_y \operatorname{P}(Y=y|X=x) \right) = 1\, $$

true only if X and Y are defined over the same probability space?

It says so in the article about the law of total expectation.

Helder Ribeiro 20:11, 2 January 2007 (UTC)


 * No. If you sum up the total probability of every event, you have to get 1. Something is going to happen. If the sum of all the events is only 0.9, then there is a 10% chance that Y takes no value at all? That doesn't make sense. Therefore, no matter what spaces things are defined on, the total probability summed over the whole space is always 1. I think that other article is in error. There has to be some joint probability distribution, but X and Y can take values in completely different spaces. - grubber 16:27, 9 March 2007 (UTC)


 * I don't think the question was about what space X and Y take values in, but which probability space they are defined on. The answer is yes, whenever you condition on a random variable or consider a joint probability distribution, the two (or more) random variables MUST be defined on the same probability space (or else the formulas are just nonsense). --67.193.128.233 (talk) 12:40, 10 March 2009 (UTC)

How is the expected value different from the arithmetic mean?
This page says that another term for "expected value" is "mean". I find that dubious - especially since the page mean says that the expected value is sometimes called the "population mean" - which I also find to be dubious. If the expected value is the same thing as a mean, then the pages should be merged. If not, this page should explain the difference. Fresheneesz 01:09, 15 February 2007 (UTC)


 * Expected value and mean are not the same thing. Means are defined on sets, for example the "arithmetic mean of a set of numbers". Expected values are used in stochastic settings, where you take the expected value of a random variable; there is some underlying probability distribution involved in expected values. I'm not familiar with "population mean", but I have a hard time believing that it would be more than just a special case of expected value. You really do need a r.v. in order to take expected values. - grubber 16:20, 9 March 2007 (UTC)


 * If my understanding is correct, expected values are a mathematical concept - it's a function performed on a probability distribution. Means are a statistical concept - population mean being the mean of the entire population, and sample mean being an attempt to discover that population mean (or something approximating it). BC Graham (talk) 22:34, 25 March 2008 (UTC)


 * I understand that Expected Value is different from Mean. Most of the time they have the same value but are completely different. The Expected Value is the most probable value, identified by the main peak in the distribution curve. The Mean is the first moment of the distribution: M = Sum( X . p(x))/sum( p(x)). So what is defined in this article is not really the "Expected Value" but the "Mean".
 * The Expected Value and the Mean are the same only in a symmetric central-peak distribution, like the Gaussian one. ---200.45.253.38 (talk) —Preceding undated comment added 20:30, 6 March 2009 (UTC).

For two stochastic variables X and Y.
Discrete
 * $$E[XY]=\sum\limits_x \sum\limits_y xyf_{X,Y}(x,y) $$

Continuous
 * $$E[XY]=\int_{-\infty}^\infty \int_{-\infty}^\infty xyf_{X,Y}(x,y)\operatorname{d}y \operatorname{d}x $$

90.227.190.26 23:19, 5 April 2007 (UTC)
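The discrete formula above can be made concrete with a small joint pmf; a Python sketch (my own toy example, a dependent pair on {0,1} x {0,1}):

```python
# Discrete case: E[XY] = sum over (x, y) of x * y * f_{X,Y}(x, y).
# Hypothetical joint pmf with dependence between X and Y:
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

e_xy = sum(x * y * p for (x, y), p in joint.items())
print(e_xy)  # only the (1,1) cell contributes: 0.4

# Marginal expectations, for comparison with E[X]*E[Y]:
e_x = sum(x * p for (x, y), p in joint.items())  # 0.5
e_y = sum(y * p for (x, y), p in joint.items())  # 0.5
print(e_xy - e_x * e_y)  # nonzero covariance, since X and Y are dependent
```

For independent X and Y the double sum factors and E[XY] = E[X]E[Y]; the dependent pmf above shows that this is not true in general.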

Dice a poor example
The average roll of a single die was the canonical example when I studied this too, but I feel it is a poor and misleading example. The expected value of 3.5 for a dice roll only makes sense if you are accumulating the sum of each dice roll--or if you are being paid $1 for a 1, $2 for a 2, etc. The pips on dice are usually interpreted as symbols, and adding the pips from consecutive rolls is rather unnatural and distracting. —Preceding unsigned comment added by Roberthoff82 (talk • contribs)


 * I disagree. I think it raises a very important characteristic of expected values at the very beginning of learning them, which is that they are not the intuitive "this value is what you would expect on any given event," but rather a statement about the distribution from which the value is being drawn. It is important to introduce that an expected value does not have to be a possible value.BC Graham (talk) 22:39, 25 March 2008 (UTC)


 * Clearly, "value" is being used in two senses: firstly, the "value," or number of dots, appearing on the die, and secondly, the "mean (arithmetic average) of the values" which are likely to appear in the long run. In one sense, the expected value is impossible, but in the other, not only is it quite possible, it's a mathematical certainty! Unfree (talk) 21:18, 26 April 2008 (UTC)

Subscripting
I guess it might be useful to subscript the $$\operatorname{E}$$ where necessary. Surely, this


 * $$\operatorname{E_Y} \left( \operatorname{E_X}(X|Y) \right)$$

is easier to read than this


 * $$\operatorname{E} \left( \operatorname{E}(X|Y) \right)$$ —Preceding unsigned comment added by 137.132.250.8 (talk) 13:57, 26 February 2008 (UTC)


 * What does the vertical line (bar, pipe) stand for? Unfree (talk) 21:46, 26 April 2008 (UTC)


 * It is conditional expectation. --MarSch (talk) 11:38, 27 April 2008 (UTC)

Roman vs. Italic
Although notation of course vary, most textbooks and papers, at least in statistics, seem to use italics for the expected value symbol, that is, $$E(X)$$ rather than $$\operatorname{E}(X)$$. The rule I have learned is to use italics for all single-letter mathematical symbols (including operators) and roman for symbols consisting of two or more letters (including variables and functions). But I guess practices differ between displines? Other wikipedia articles appear to use both forms. Jt68 (talk) 11:22, 8 July 2008 (UTC)


 * According to Manual_of_Style_(mathematics) both roman and italics are acceptable for things like $$\operatorname{d}x$$ and I guess also $$\operatorname{E}(x)$$ as long as it is used consistently within an article. --Jt68 (talk) 21:47, 12 July 2008 (UTC)


 * Where does the rule of writing symbols consisting of more than one letter in roman come from? I am seriously looking for good references on mathematical notation, since it is nothing less than a formal language with its own rules. And why do we see both E[X] and E(X)? I don't like very much the latter notation since E(X) is not a function of X but rather of the values taken by X. —Preceding unsigned comment added by Flavio Guitian (talk • contribs) 22:18, 4 December 2008 (UTC)

Value in expected value
I undid the changes by User:Jabowery as I don't see how the die example is an "example of what is not meant by value in expected value". While it may not be useful to compute the expected value of a dice roll, this is not the point here; we are only trying to clarify what is meant by the mathematical definition of expected value. It is perfectly valid to compute the expected value of any random variable; the variable does not have to be a measure of utility. Jt68 (talk) 13:00, 11 July 2008 (UTC)

Proposition to add origin of the theory
I propose to add the following in a separate chapter:

Blaise Pascal was challenged by a friend, Antoine Gombaud (self-acclaimed "Chevalier de Méré" and writer), with a gambling problem. The problem was that of two players who want to end a game early and, given the current circumstances of the game, want to divide the stakes fairly, based on the chance each has of winning the game from that point. How should they find this "fair amount"? In 1654, Pascal corresponded with Pierre de Fermat on the subject of gambling. And it is in the discussion of this problem that the foundations of the mathematical theory of probabilities were laid and the notion of expected value introduced.

--->PLEASE let me know (right here or on my talk page) if this is would not be ok otherwise I plan to add it in a few days<---

Phdb (talk) 14:00, 24 February 2009 (UTC)


 * Yes please add. In fact, for quite some time the concept of expectation was more fundamental than the concept of probability in the theory. Fermat and Pascal never mentioned the word "probability" in their correspondence, for example. The probability concept then eventually emerged from the concept of expectation and replaced it as the fundamental concept of the theory. iNic (talk) 01:03, 26 February 2009 (UTC)


 * Laplace used the term “hope” or “mathematical hope” to denote the concept of expected value (see, ch.6):
 * … This advantage in the theory of chance is the product of the sum hoped for by the probability of obtaining it; it is the partial sum which ought to result when we do not wish to run the risks of the event in supposing that the division is made proportional to the probabilities.  This division is the only equitable one when all strange circumstances are eliminated; because an equal degree of probability gives an equal right for the sum hoped for.  We will call this advantage mathematical hope.
 * I wonder what name was used by Pascal (if any), and where did “expectation” come from?  …  st pasha  »  19:37, 24 September 2009 (UTC)

Strange


$$\operatorname{E}({\rm Roll\ With\ 6\ Sided\ Die}) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5$$

I'm not very familiar with statistics, and possibly this is why I can't see any sense in counting the numbers written on the sides of the die. What if the sides of the die were assigned symbols without a defined alphanumerical order? What would the "expected value" be? --85.207.59.18 (talk) 12:14, 11 September 2009 (UTC)
 * In that case you can't really speak of a "value" of a certain side. If there's no value to a side, it's impossible to speak of an expectation value of throwing the die. The concept would be meaningless. Gabbe (talk) 15:52, 22 January 2010 (UTC)
 * Of course, you could assign values to the sides. The sides of a coin, for example, do not have numerical values. But if you assigned the value "-1" to heads and "1" to tails you would get the expected value

$$\operatorname{E}({\rm Flipping\ a\ coin}) = \frac{-1 + 1}{2} = 0$$
 * Similarly, if you instead gave heads the value "0" and tails the value "1" you would get

$$\operatorname{E}({\rm Flipping\ a\ coin}) = \frac{0 + 1}{2} = \frac{1}{2}$$
 * and so forth. Gabbe (talk) 20:34, 22 January 2010 (UTC)
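Gabbe's point, that the expectation exists only once numbers are assigned to the outcomes, can be condensed into a small Python sketch (my own illustrative helper, not part of the original discussion):

```python
def expected_value(values, probs):
    # E[X] = sum of (assigned value) * (probability); a minimal sketch.
    return sum(v * p for v, p in zip(values, probs))

fair_coin = [0.5, 0.5]
print(expected_value([-1, 1], fair_coin))  # heads = -1, tails = 1: gives 0.0
print(expected_value([0, 1], fair_coin))   # heads = 0, tails = 1: gives 0.5
# Without an assignment of numbers to outcomes there is nothing to average.
```

The same outcomes with different value assignments give different expectations, which is exactly why a die whose faces carry unordered symbols has no expected value until values are chosen.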

Upgrading the article
At present (Feb 2010) the article is rated as only "Start", yet supposedly has "Top" priority and is on the frequent-viewing lists. Some initial comments on where things fall short are:
 * The lead fails to say why "expected value" is important, either generally or regarding its importance as underlying statistical inference.
 * There is a lack of references, both at a general-reader level and for the more sophisticated stuff.
 * There is some duplication, which is not necessarily a problem, but it would be if there were cross-referencing of this.
 * There is a poor ordering of material, in terms of sophistication, with elementary level stuff interspersed.

I guess others will have other thoughts. Particularly for this topic, it should be a high priority to retain a good exposition that is accessible at an elementary level, for which there is good start already in the article. Melcombe (talk) 13:17, 25 February 2010 (UTC)

Proposition for alternative proof of $$\mathbb E(X) = \int_0^\infty \mathbb P(X>x) dx$$
I tried to add the proof below which I believe to be correct (except for a minor typo which is now changed). This was undone because "it does not work for certain heavy-tailed distribution such as Pareto (α < 1)". Can someone elaborate?

Alternative proof: Using integration by parts

$$\mathbb E(X) = \int_0^\infty (-x)(-f_X(x))\;dx = \left[ -x(1 - F(x)) \right]_0^\infty + \int_0^\infty (1 - F(x))\;dx$$ and the bracket vanishes because $$1-F(x) = o(1/x)$$ as $$x \to \infty$$. —Preceding unsigned comment added by 160.39.51.111 (talk) 02:50, 13 May 2011 (UTC)


 * Actually the end of section 1.4 seems in agreement, so I am reinstating my changes —Preceding unsigned comment added by 160.39.51.111 (talk) 02:54, 13 May 2011 (UTC)


 * I have removed it again. The "proof" is invalid as it explicitly relies on the assumption $$1-F(x) = o(1/x) $$ as $$x \to \infty$$ which does not hold for all cdfs (e.g. Pareto as said above). You might try reversing the argument and doing an integration by parts, starting with the "result", which might then be shown to be equivalent to the formula involving the density. PS, please sign your posts on talk pages. JA(000)Davidson (talk) 09:40, 13 May 2011 (UTC)


 * Let's try to sort this out: I claim that whenever X nonnegative has an expectation, then $$1 - F(x) = o(1/x)$$  (Pareto distribution when alpha < 1 doesn't even have an expectation, so this is not a valid counter-example)
 * Proof: Assuming X has density function f, we have for any $$ c> 0 $$

$$\mathbb E(X) = \int_0^\infty xf(x)dx \geq \int_0^c xf(x)dx + c\int_c^\infty f(x)dx$$
 * Recognizing $$\bar F(c) = \int_c^\infty f(x)dx $$ and rearranging terms:

$$0 \leq c\bar F(c) \leq \mathbb E(X) - \int_0^c x f(x)dx \to 0 \text{ as } c \to \infty$$
 * as claimed.
 * Are we all in agreement, or am I missing something again? Phaedo1732 (talk) 19:05, 13 May 2011 (UTC)
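The identity under discussion, $$\mathbb E(X) = \int_0^\infty \mathbb P(X>x)\,dx$$ for nonnegative X, can also be checked numerically for a concrete distribution. A Python sketch (my own; Exponential with rate 2, whose survival function is exp(-2x) and whose true mean is 1/2):

```python
import math

# Numerical check of E[X] = integral of P(X > x) dx over [0, infinity)
# for X ~ Exponential(rate = 2), where P(X > x) = exp(-2x) and E[X] = 1/2.
rate = 2.0
dx = 1e-4

# Left Riemann sum of the survival function, truncated where the tail
# is numerically negligible (x = 20 gives exp(-40), far below precision).
integral = sum(math.exp(-rate * (k * dx)) * dx for k in range(int(20 / dx)))
print(integral)  # approximately 0.5
```

A numerical check is of course no substitute for the proof, but it is a quick sanity test when editing formulas of this kind.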


 * Regardless of the validity of the proof, is an alternative proof a strong addition to the page? C RETOG 8(t/c) 19:12, 13 May 2011 (UTC)


 * I think so, because the current proof is more like a trick than a generic method, whereas the alternative proof could be generalized (as shown in Section 1.4). I also think the point of an encyclopedia is to give more information rather than less.  Phaedo1732 (talk) 00:49, 14 May 2011 (UTC)


 * See 6 of WP:NOTTEXTBOOK, and WP:MSM. This doesn't seem to be a place that needs a proof at all. What is needed is a proper citation for the result, and a proper statement of the result and its generalisation to other lower bounds. (I.e., the result could be used as an alternative definition of "expected value", but are the definitions entirely equivalent?) JA(000)Davidson (talk) 08:28, 16 May 2011 (UTC)


 * Clearly the previous editor of that section thought a proof should be given. If anyone comes up with a good citation, I am all for it. Phaedo1732 (talk) 15:31, 16 May 2011 (UTC)

"Expected value of a function" seems to be misplaced
Does the text starting with "The expected value of an arbitrary function of ..." really belong to the definition of the expectation, or would it be better to move it to Properties, between 3.6 and 3.7, and give it a new section (with which title?)? I am not entirely sure, but I think one can derive the expected value of a function of a random variable without the need for an explicit definition. After all, the function of a random variable is a random variable again; given that random variables are (measurable) functions themselves, it should be possible to construct $E(g(X))$ just from the general definition of $E$. Any thoughts? Grumpfel (talk) 21:54, 29 November 2011 (UTC)


 * I agree. Boris Tsirelson (talk) 06:33, 30 January 2012 (UTC)

Expectation of the number of positive events
If there is a probability p that a certain event will happen, and there are N such events, then the expectation of the number of events is $$pN$$, even when the events are dependent. I think this is a useful application of the sum-of-expectations formula. --Erel Segal (talk) 14:39, 6 May 2012 (UTC)
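This use of linearity of expectation holds even for dependent events, which a small exhaustive check makes vivid. A Python sketch (my own toy example: two draws without replacement from a deck of 2 red and 2 black cards, so the draws are clearly dependent):

```python
from fractions import Fraction
from itertools import permutations

# Count red cards among the first 2 draws from a shuffled R,R,B,B deck.
# Each draw is red with probability p = 1/2, so by linearity the expected
# count is p*N = (1/2)*2 = 1, despite the dependence between draws.
deck = ['R', 'R', 'B', 'B']
orders = list(permutations(deck))  # all 24 equally likely orderings
expected_reds = Fraction(sum(o[:2].count('R') for o in orders), len(orders))
print(expected_reds)  # 1
```

The exhaustive average agrees with pN; no independence assumption was needed anywhere.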

notation
In the section on iterated expectations and the law of total expectation, the lower-case x is used to refer to particular values of the random variable denoted by capital X, so that for example
 * $$ \sum_{x=2}^3 x \Pr(X=x) = 2\cdot\Pr(X=2) + 3\cdot\Pr(X=3). \, $$

Then I found notation that looks like this:
 * $$ \operatorname{E}_X(x) \, $$

Now what in the world would that be equal to in the case where x = 3?? It would be
 * $$ \operatorname{E}_X(3), \, $$

but what is that??? This notation makes no sense, and I got rid of it. Michael Hardy (talk) 21:33, 13 August 2012 (UTC)

Incorrect example?
Example 2 in the definition section doesn't take into account the $1 wager. — Preceding unsigned comment added by Gregchaz (talk • contribs) 22:23, 17 November 2012 (UTC)


 * Isn't it factored into the $35 payout? —C.Fred (talk) 22:25, 17 November 2012 (UTC)

Formulas for special cases - Non-negative discrete
In the example at the bottom, I think the sum should be from i=1, and equal (1/p)-1. For instance, if p=1, you get heads every time, so since they so carefully explained that this means that X=0, the sum should work out to 0; hence (1/p)-1 rather than (1/p).


 * Well, imagine that p = 1 but YOU don't know. Then you'll toss the coin, and of course it gives heads at the first try. Other way of explaining: suppose p = 0.9999, and you know it. But then you are not absolutely sure that you will get heads, and you have to toss, with a very high probability of success at the first try. Bdmy (talk) 07:27, 1 June 2013 (UTC)


 * The OP is correct -- Bdmy has missed the fact that getting heads on the first try with certainty or near certainty is, in the notation of the article's example, X=0, not X=1. There are several ways to see this: (1) The formula derived above the example says that the sum goes from one to infinity, not zero to infinity. Or, consider (2) if p=1/2, the possible sequences are H (X=0 with probability 1/2); TH (X=1 with probability 1/4); TTH (X=2 with probability 1/8); etc. So the expected value is 0 times 1/2 plus 1 times 1/4 plus 2 times 1/8 plus ... = 0 + 1/4 + 2/8 + 3/16 + 4/32 + ....  = 1 = (1/p) - 1.  Or, consider (3) the OP's example with p=1 is correct -- you will certainly get a success on the first try, so X=0 with certainty, so E(X) = 0 = (1/p) - 1.  I'll correct it in the article.  Duoduoduo (talk) 18:37, 1 June 2013 (UTC)
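The corrected value (1/p) - 1 can be checked by truncating the series for E[X], where X counts the tails before the first head. A Python sketch (my own, not part of the original discussion):

```python
def expected_failures(p, terms=10000):
    # Truncated series sum over i >= 1 of i * (1-p)**i * p,
    # which should approach (1/p) - 1 for 0 < p <= 1.
    return sum(i * (1 - p)**i * p for i in range(1, terms))

print(expected_failures(0.5))  # approaches 1.0 = 1/(0.5) - 1
print(expected_failures(1.0))  # 0.0: heads is certain on the first toss, X = 0
```

The p = 1 case is exactly the OP's point: success on the first try means X = 0, so the expectation must be (1/p) - 1 = 0 rather than 1/p.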

Reason for deleting sub-section "Terminology"
I'm deleting the subsection "Terminology" of the section "Definition" for the following reasons. The section reads


 * Terminology


 * When one speaks of the "expected price", "expected height", etc. one often means the expected value of a random variable that is a price, a height, etc. However, the "value" in expected value is more general than price or winnings. For example a game played to try to obtain the cost of a life saving operation would assign a high value where the winnings are above the required amount, but the value may be very low or zero for lesser amounts.
 * When one speaks of the "expected number of attempts needed to get one successful attempt", one might conservatively approximate it as the reciprocal of the probability of success for such an attempt. Cf. expected value of the geometric distribution.

The first sentence is a pointless tautology. The remainder of the first paragraph doesn't make a bit of sense, but maybe it is attempting to make the obvious point that sometimes "value" means a dollar value and sometimes not. This is obvious and not useful, even if it were well-expressed. The second paragraph doesn't have anything to do with either "Definition" or "Terminology", and in any event it is wrong (the "approximate" value it gives is actually exact). Duoduoduo (talk) 14:51, 2 June 2013 (UTC)

Multivariate formula
The following formula has been added for the expected value of a multivariate random variable:


 * $$   \operatorname{E}[X] = \int_{-\infty}^\infty\cdots \int_{-\infty}^\infty X(x_1,\cdots,x_n)~f(x_1,\cdots,x_n)~dx_1\cdots dx_n .  $$

First, I don't understand what calculation is called for by the formula. Why do we have a multiple integral? It seems to me that since the left side of the equation is an n-dimensional vector, the right side should also be an n-dimensional vector in which each element i has a single integral of $$x_i f(x_1,\dots,x_n)dx_i.$$

Second, I don't understand the subsequent sentence


 * Note that, for the univariate cases, the random variables X are taken as the identity functions over different sets of reals.

What different sets of reals? And in what way is X in the general case not based on the identity function -- is $$X(x_1,\cdots,x_n)$$ intended to mean something other than simply the vector $$(x_1,\cdots,x_n)$$ ? Duoduoduo (talk) 17:56, 12 September 2013 (UTC)


 * I agree, that is a mess. First, I guess that the formula
 * $$   \operatorname{E}[X] = \int_{-\infty}^\infty\cdots \int_{-\infty}^\infty (x_1,\cdots,x_n)~f(x_1,\cdots,x_n)~dx_1\cdots dx_n .  $$
 * was really meant. Second, I guess, the author of this text is one of these numerous people that believe that, dealing with an n-dim random vector, we should take the probability space equal to Rn, the probability measure equal to the distribution of the random vector, and yes, $$X(x_1,\cdots,x_n)=(x_1,\cdots,x_n)$$. (Probably because they have no other idea.) I am afraid that they can support this by some (more or less reliable) sources. Boris Tsirelson (talk) 18:17, 12 September 2013 (UTC)


 * Should it be reverted? Duoduoduo (talk) 19:40, 12 September 2013 (UTC)


 * Maybe. Or maybe partially deleted and partially reformulated? Boris Tsirelson (talk) 21:07, 12 September 2013 (UTC)


 * I'll leave it up to you -- you're more familiar with this material than I am. Duoduoduo (talk) 22:56, 12 September 2013 (UTC)


 * I wrote this formula. (I have a Ph.D. in Applied Mathematics, though I can admit that I am wrong if someone proves it.) I make the link between the general form and the uni-variate form. This formula is what I meant. This link can help to legitimate the statement http://mathworld.wolfram.com/ExpectationValue.html . Note that if this line is not there it is hard to make the link between the general form and the uni-variate form.


 * The formula in Wolfram is OK; but why is yours different?
 * Before deciding whether or not your formula should be here we should decide whether or not it is correct.
 * In Wolfram one considers expectation of a function f of n random variables that have a joint density P. In contrast, you write "multivariate random variable $$X(x_1,\cdots,x_n)$$ admits a probability density function $$f(x_1,\cdots,x_n)$$". What could it mean? A (scalar) function of n random variables is a one-dimensional random variable, and its density (if exists) is a function of one variable. The random vector $$(x_1,\cdots,x_n)$$ is a multivariate random variable and its density (if exists) is a function of n variables. What could you mean by X? Boris Tsirelson (talk) 13:59, 2 October 2013 (UTC)


 * You say "In Wolfram one considers expectation of a function f of n random variables that have a joint density P". I do not agree. Indeed, they don't say that the $$x_1,\cdots,x_n$$ are RANDOM variables. I would say that their definition is more prudent. More specifically, in this article, I consider in the definition of the multivariate case that the vector $$x_1,\cdots,x_n$$ belongs to the sample space, whereas $$X(x_1,\cdots,x_n)$$ is a random variable which is actually a function of this variable in the sample space.
 * Notice that (for simplicity) in the case of univariate functions, the variables of your sample space are equal to the observed random variable. I.e., roll one die, you see 5, then the random variable returns 5; this is simple.
 * Let us now consider a multivariate random variable: Choose one longitude, one latitude, and a date (these are the variables of the sample space). Let us now measure something, e.g. atmospheric pressure or temperature or simply the sum of longitude and latitude (even if it does not make much sense); these are multivariate random variables. You just observe numbers and build your statistic like you would do in the section general definition.

(Unindent) Ah, yes, this is what I was afraid of; see above where I wrote: "the author of this text is one of these numerous people that believe that, dealing with an n-dim random vector, we should take the probability space equal to Rn, the probability measure equal to the distribution of the random vector, and yes, $$X(x_1,\cdots,x_n)=(x_1,\cdots,x_n)$$. (Probably because they have no other idea.)"

The problem is that (a) this is not the mainstream definition of a random variable (in your words it is rather the prudent definition, though I do not understand what the prudence is; as for me, the standard definition is more prudent); and (b) this "your" definition does not appear in Wikipedia (as far as I know). Really, I am not quite protesting against it. But for now the reader will be puzzled unless he/she reads your comment here on the talk page. In order to do it correctly you should first introduce "your" approach in other articles (first of all, "Random variable") and only then use it here, with the needed explanation. And of course, for succeeding with this project you need reliable sources. Boris Tsirelson (talk) 16:40, 2 October 2013 (UTC)

And please do not forget to sign your messages with four tildes: ~. :-) Boris Tsirelson (talk) 16:44, 2 October 2013 (UTC)


 * 1° Just to make things clear, I am not talking about random vectors.
 * 2° For me the definition of a random variable is the same as in the section "Measure-theoretic definition" of the article "Random Variable" and is what is actually used in the section "General definition" of the article Expected value.
 * 3° What I am trying here is to fill the gap between the "univariate cases" and the "General definition". "Univariate cases" are simplified cases of the "General definition". It was not easy to see at first, so I am trying to fill the gap. My contribution is simply to consider the "General definition" with $$\Omega=\mathbb{R}^n$$ and then say that, if $$n=1$$, then for simplicity one often considers $$X(x_1)=x_1$$ as done in the univariate cases.
 * 212.63.234.4 (talk) 11:38, 3 October 2013 (UTC)


 * But your formula is not parallel to the univariate case (as it is presented for now):
 * "If the probability distribution of X admits a probability density function f(x), then the expected value can be computed as
 * $$   \operatorname{E}[X] = \int_{-\infty}^\infty x f(x)\, dx .  $$"
 * You see, nothing special is assumed about the probability space; it is left arbitrary (as usual), and does not matter. What matters is the distribution. Not at all "$$X(x_1)=x_1$$". If you want to make it parallel, you should first add your-style formulation to the univariate case: "It is always possible to use the change-of-variable formula in order to pass from an arbitrary probability space to the special case where (you know what) without changing the distribution (and therefore the expectation as well)", something like that. Also your terminology... what you call a multivariate random variable is what I would call a univariate random variable defined on the n-dimensional probability space (you know which). What about sources for your terminology? Boris Tsirelson (talk) 12:46, 3 October 2013 (UTC)


 * 1° (Just to be aware of what we talk about) How would you formally define a "Univariate random variable"? Note that this term is not in the article "Random variable".
 * 2° Don't you agree that there is a gap that needs to be filled between the univariate definitions and the general definition? I totally agree if someone helps me to improve my possibly inadequate modification.
 * 3° As far as terminology is concerned, here is a reference for bivariate (multivariate is a similar extension) http://books.google.be/books?id=lvF19OwEFekC&lpg=PA29&ots=UNfSi10t3l&dq=%22Univariate%20continuous%20random%20variable%22&pg=PA29#v=onepage&q=%22Univariate%20continuous%20random%20variable%22&f=false
 * 212.63.234.4 (talk) 14:25, 3 October 2013 (UTC)


 * Ironically, the book you pointed to confirms my view and not yours! There I read (page 29): "bivariate continuous random variable is a variable that takes a continuum of values on the plane according to the rule determined by a joint density function defined over the plane. The rule is that the probability that a bivariate random variable falls into any region on the plane is equal..."
 * Exactly so! (a) Nothing special is assumed about the probability space; moreover, the probability space is not mentioned. Only the distribution matters. (b) it is exactly what I called a random vector (since a point of the plane is usually identified with a pair of real numbers, as well as a 2-dim vector). I do not insist on the word "vector"; but note: bivariate means values are two-dimensional (values! not the points of the probability space, but rather their images under the measurable map from the probability space to the plane). Accordingly, expectation of a bivariate random variable is a vector (well, a point of the plane), not a number! And the formula "$$X(x_1)=x_1$$" is neither written nor meant.
 * How would I define formally a "Univariate random variable"? As a measurable map from the given probability space to the real line, of course. Boris Tsirelson (talk) 18:34, 3 October 2013 (UTC)
 * Surely it would be nice, to improve the article. But please, on the basis of reliable sources, not reinterpreted and mixed with your original research. Boris Tsirelson (talk) 18:50, 3 October 2013 (UTC)


 * Actually, I do agree that my contribution is not good enough. If you see any way to make it right, do not hesitate to transform it. Otherwise, just remove it. Thank you for your patience and involvement in this discussion.212.63.234.4 (talk) 11:32, 7 October 2013 (UTC)


 * OK, I've moved it to a place of more appropriate context, and adapted it a little to that place. Happy editing. Boris Tsirelson (talk) 14:43, 7 October 2013 (UTC)

Misleading
"The expected value is in general not a typical value that the random variable can take on. It is often helpful to interpret the expected value of a random variable as the long-run average value of the variable over many independent repetitions of an experiment."

So the expected value is the mean for repeated experiments (why not just say so?), and yet you explicitly tell me that it is "in general not a typical value that the random variable can take on". The normal distribution begs to disagree. Regardless of theoretical justifications in multimodal cases, this is simply bizarre. More jargon != smarter theoreticians. Doug (talk) 18:33, 21 October 2010 (UTC)


 * What is the problem? The expected value is in general not a typical value. In the special case of the normal distribution it really is, who says otherwise? In the asymmetric unimodal case it is different from the mode. For a discrete distribution it is (in general) not a possible value at all. Boris Tsirelson (talk) 19:22, 21 October 2010 (UTC)


 * (why not just say so?) — because this is the statement of the law of large numbers — that when the expected value exists, the long-run average will converge almost surely to the expected value. If you define the expected value as the long-run average, then this theorem becomes circularly-dependent. Also, for some random variables it is not possible to imagine that they can be repeated many times over (say, a random variable that a person dies tomorrow). Expected value is a mathematical construct which exists regardless of the possibility to repeat the experiment.  //  st pasha  »  02:21, 22 October 2010 (UTC)


 * From the article: «formally, the expected value is a weighted average of all possible values.». A formal definition should refer to a particular definition of weight: probability.  As it happens, the Wikipedia article Weighted arithmetic mean refers to a "weighted average" as a "weighted mean".  "Mean" is both more precise than the ambiguous "average", and less confusing.  The Wikipedia article on the Law of large numbers links to average, which again links to mean, median and mode.  Our current article talks about average, but then stresses that it does not refer to a typical, nor even an actual, value—so as to eliminate the other definitions of "average" than "mean".  It would be much simpler to just say "mean".
 * Either just say "mean", or use "mean" when referring to probability distributions, and "expected value" when referring to random variables. That's not standard, though.  — Preceding unsigned comment added by SvartMan (talk • contribs) 00:04, 3 March 2014 (UTC)

Upgrading the article
The article seems pretty clearly to have satisfied the criteria at least for C-class quality; I'd say it looks more like B-class at this point. I'm re-rating it to C-class, and I'd love to hear thoughts on the article's current quality. -Bryanrutherford0 (talk) 03:31, 18 July 2013 (UTC)

I agree with the B class rating, and am changing it accordingly. At least for math, there is a B+ rating that could be applied if there were more references. Brirush (talk) 03:18, 10 November 2014 (UTC)

Simple generalization of the cumulative function integral
Currently the article has the integral


 * $$\operatorname{E}(X) = \int_0^\infty P(X\ge x)\;dx$$

for non-negative random variables X. However, the non-negativity restriction is easily removed, resulting in


 * $$\operatorname{E}(X) = -\!\int_{-\infty}^0 P(X\le x)\;dx+\int_0^\infty P(X\ge x)\;dx.$$

Should we give the more general form, too? -- Coffee2theorems (talk) 22:33, 25 November 2011 (UTC)
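
For what it's worth, the general form checks out on a simple concrete case; a Python sketch for X uniform on [-1, 2], where E(X) = 1/2 (the helper names are ad hoc, and P(X ≥ x) = 1 − P(X ≤ x) since the distribution is continuous):

```python
def cdf(x):
    # P(X <= x) for X uniform on [-1, 2]
    return min(max((x + 1) / 3, 0.0), 1.0)

def integrate(f, a, b, n=10_000):
    # midpoint rule on [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

neg = integrate(cdf, -1.0, 0.0)                    # P(X<=x) vanishes below -1
pos = integrate(lambda x: 1.0 - cdf(x), 0.0, 2.0)  # P(X>=x) vanishes above 2
assert abs(-neg + pos - 0.5) < 1e-9                # -1/6 + 2/3 = 1/2
```

Note the minus sign on the first integral, as discussed below.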


 * But do not forget the minus sign before the first integral. Boris Tsirelson (talk) 15:47, 26 November 2011 (UTC)
 * Oops. Fixed. Anyhow, do you think it would be a useful addition? -- Coffee2theorems (talk) 19:29, 4 December 2011 (UTC)
 * Yes, why not. I always present it in my courses.
 * And by the way, did you see in "general definition" these formulas:
 * $$\operatorname{E}(g(X)) = \int_a^\infty g(x) \, \mathrm{d} \operatorname{P}(X \le x)= g(a)+ \int_a^\infty g'(x)\operatorname{P}(X > x) \, \mathrm{d} x$$ if $$\operatorname{P}(g(X) \ge g(a))=1$$,
 * $$\operatorname{E}(g(X)) = \int_{-\infty}^a g(x) \, \mathrm{d} \operatorname{P}(X \le x)= g(a)- \int_{-\infty}^a g'(x)\operatorname{P}(X \le x) \, \mathrm{d} x$$ if $$\operatorname{P}(g(X) \le g(a))=1$$.
 * I doubt it is true under just this condition. Boris Tsirelson (talk) 07:26, 5 December 2011 (UTC)
 * Moreover, the last formula is ridiculous:
 * $$ \operatorname{E}(|X|) = \int_{0}^{\infty} \lbrace 1-F(t) \rbrace \, \operatorname{d}t,$$
 * if Pr[X ≥ 0] = 1, where F is the cumulative distribution function of X.
 * Who needs the absolute value of X assuming that X is non-negative? Boris Tsirelson (talk) 07:30, 5 December 2011 (UTC)
 * And here is a reference for this: (Papoulis, Athanasios, and S. Unnikrishna Pillai. "Chapter 5-3 Mean and Variance." Probability, Random Variables, and Stochastic Processes. Tata McGraw-Hill Education, 2002.) This book derives this form from a frequency interpretation, which should make some people happy. Its form is slightly different, as your derivation counts the mass at zero twice (only an issue for discrete and mixed distributions):
 * $$\operatorname{E}(X) = \int_{0}^{\infty} 1-F(x)\;dx - \int_{-\infty}^{0}F(x)\;dx$$
 * Best regards; Mouse7mouse9


 * Sure. This "your" formula is correct in all cases (including discrete and mixed distributions).
 * And please sign your messages (on talk pages) with four tildes: ~ . Boris Tsirelson (talk) 06:28, 17 June 2015 (UTC)

Intuitively?
What is intuitively supposed to mean in this instance? Hackwrench (talk) 01:15, 24 October 2015 (UTC)


 * There's only one use of "intuitively" in the article, so you must refer to the first sentence:


 * In probability theory, the expected value of a random variable is intuitively the long-run average value of repetitions of the experiment it represents.


 * It's "intuitively", because this isn't how it is defined for a variety of reasons. Wikipedia articles usually start with a definition of the thing they're about, so some word is needed to convey that what looks like a definition and is in place of a definition actually isn't a definition. The actual definition is currently at the end of the second paragraph:


 * The expected value of a random variable is the integral of the random variable with respect to its probability measure.


 * I actually opened the intro with this once upon a time, but some people weren't happy with it, even though the less technical definitions immediately followed and the next paragraph explained the interpretation as a long-run average. Go figure.


 * It's also "intuitively", because this tends to be how one thinks about expected value in statistics. "It's just like the mean, except for an infinite number of things, using the obvious limit at infinity." It's also couched in empirical statistical terms like "long-run", "repetitions", "experiments". The concrete (and real-world relatable) tends to be more intuitive than the abstract. -- Coffee2theorems (talk) 19:07, 11 December 2015 (UTC)

Article needs a re-write (toned down from: WTF is this $hit)
It's.... there is some stuff in the talk page about it being crap and I won't flaunt unprovable credentials around but it's fucking wank. Scrolling one-third down (I grant you the TOC is large) we have finally covered summations, infinite summations and CRVs, great! WHERE'S THE MEASURE THEORY side, these are the same concept! I searched for "probability measure"; it only occurs once in the second paragraph, which looks more copied and pasted than anything, with the last sentence being a note saying what I just said, that the finite, countably infinite and uncountable cases are special cases. Anyway the general definition is only given once; right after that someone just sticks +/- infinity as the integral limits and thinks they have it covered. WAY DOWN AGAIN we have "it's useful in quantum mechanics". Like 5/6ths down we hit "expectation of a matrix" which is a definition, then down to "special" cases. Sometimes things are beyond repair. I think the entire thing needs to be archived and re-written. Also if it is "B" class, I'd hate to see a "C" class.
 * I have written better things when not on ritalin and immediately after a cocaine-in-solution enema.


 * Wow! We are used to complaints that our mathematical articles are too advanced, that is, rather inaccessible (just look at the first item above, and "Frequently Asked Questions" on the top of Wikipedia talk:WikiProject Mathematics: "Are Wikipedia's mathematics articles targeted at professional mathematicians?", "Why is it so difficult to learn mathematics from Wikipedia articles?" etc etc). But it is the first time that I see a (quite passionate) complaint that a mathematical article is not enough advanced, that is, too accessible! Regretfully, this complaint is not signed; someone may suspect that it is written by a mathematician, and the ostentatiously vulgar language is intended to mask this. Boris Tsirelson (talk) 18:43, 17 March 2016 (UTC)
 * Why would one mask being a mathematician? I am just extremely disappointed with the disjoint mess the article is. I was just wondering something and there's just so much crap to wade through and there's not even very much on what should be the proper definition! I get making it accessible, I've been dealing with expectation since A-levels, but this article would help neither me back then nor now. I would re-write it myself but let's be honest, Jesus' mathematician cousin would get his attempt reverted and then some tit who never ventures into mathematics (let alone that part of Wikipedia) but with (larger number) of edits under her belt would be like "here are some rules, reverting!" and nothing would change. I hope that you go back to the article and think "Whoa, that ADHD thing was perfect, it looks like it was written by a crack-using ADHD sufferer in a room with a device that made different sounds at random intervals" 90.199.52.141 (talk) 19:28, 17 March 2016 (UTC) (signed ;-) )
 * You sound rather immature. I'll attempt to take you seriously anyway.
 * Beginning with a too-general formulation obstructs readers in three ways. First, they may not understand the general formulation.  Remember that very few people (in an absolute sense) ever hear of the Lebesgue integral, and fewer still understand it.  For them, it is no help to say that all expected values can be put in a unified framework because that unified framework is incomprehensible to them.  (If it is obvious to you, then good; but remember that you are exceptional.)  Second, most readers are interested in applying expected values to some situation they have at hand.  That situation is usually discrete (gambling, opinion surveys, some medical experiments) or univariate real (most real-world measurements).  Proceeding from the general formulation to a formulation that can be applied in practice requires a little bit of effort.  While it is a good exercise to derive specific formulas from the general one, Wikipedia is not a teaching tool.  It is a reference, and a good reference includes formulas for important special cases.  Third, people's understanding proceeds from special to general.  Even young children can grasp discrete probability theory; nobody without significant experience grasps the Lebesgue integral.  Even a reader whose goal is generality for its own sake cannot reach maximum generality without grasping special cases.  A reader who does not yet understand the general theory benefits from seeing special cases, like the discrete and univariate real cases, worked out in detail.  Once they are both understood, the general case is more approachable.
 * For these reasons, I think the approach taken in the article is, in broad outline, excellent. That does not make the article perfect, but I would have a difficult time improving on the outline of its initial sections.  Ozob (talk) 01:03, 18 March 2016 (UTC)
 * I do not understand the motivation for defining expected value as one type of average value alone. That would suggest that there is no expectation of a value for the Cauchy distribution, whereas the censored mean of the middle 24% of the distribution is asymptotically more efficient than the median as a measure of location. That means, logically, that one either expands the concept of expected value, or one defines some other measure of location that is sometimes the expected value, and sometimes not. Let us take the example of the beta distribution: when it has a single peak (is not U-shaped), the median is a better measure of "tendency" than the mean. Now if we do not mean "tendency" when we are describing "expectation" then we wind up with semantic gibberish. I do not see a way out of this quagmire and ask for help on this. CarlWesolowski (talk) 17:37, 27 June 2016 (UTC)
 * I am not quite understanding you. Yes, definitely, there is no expectation value for the Cauchy distribution. In such a heavy-tail case, yes, median is much better than expectation... if you want to determine the location parameter. But if you want to predict the sample mean in a sample of 1000 values (not censored!), then the median is cheating: no, that mean will not be close to it (and the expectation says the "sad" truth: you really cannot predict the sample mean). Boris Tsirelson (talk) 21:05, 27 June 2016 (UTC)

Thank-you for responding. Part of my problem was completely not understanding what statisticians do to English. Now I do a bit better. Expectation is not always a valid measure of location for a random variable. As a pure mathematical fact, if we are talking about the Cauchy distribution as a continuous function, and not as a random variable, then it has a "mean", but obviously as a Cauchy distributed random variable it has no such thing. Proof: Take the x CD(x) integral of the continuous function from the median minus k to plus k; that always exists and is both a censored mean and the median identically. Now, let k be as large as desired. It makes no difference to the Cauchy distribution as a continuous function what one calls the peak. It only makes a difference when a random variable is Cauchy distributed. A fine point to be sure. But when you talk in shorthand and say a Cauchy distribution has no expected value you leave out words like "Cauchy distributed random variable", which for me was a hurdle. CarlWesolowski (talk) 04:14, 9 September 2016 (UTC)
 * I see. Yes, expectation (=mean) may be thought of probabilistically or analytically. For me, the probabilistic approach is the first, since it is the source of motivation. But even analytically, if we define expectation as the integral of x times the density, then we see an improper integral, and it diverges (for Cauchy distribution, I mean). However, it has Cauchy principal value, and this is what you mean (as far as I understand). Still, I would not like such terminology as "expectation of a random variable" and "expectation of its distribution" defined to be nonequivalent. Boris Tsirelson (talk) 04:59, 9 September 2016 (UTC)
 * I think that the probabilistic and analytic interpretations are the same. It is only when one replaces the integral used in defining expectation with the Cauchy principal value that one runs into confusion.  This problem is more acute if one uses Riemann integration instead of Lebesgue integration; then all integrals defined on R must be defined as improper integrals and the Cauchy principal value arises accidentally.  Ozob (talk) 03:17, 10 September 2016 (UTC)
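
To illustrate the divergence concretely: the positive half of the defining integral for the standard Cauchy density has the closed form $$\int_0^k x/(\pi(1+x^2))\,dx = \ln(1+k^2)/(2\pi)$$, which grows without bound as k grows, even though every symmetric truncation (the principal value) is 0. A small Python sketch (the name `half_integral` is ad hoc):

```python
import math

def half_integral(k):
    # Closed form of the positive half of the defining integral:
    # ∫_0^k x / (pi (1 + x^2)) dx = ln(1 + k^2) / (2 pi)
    return math.log(1 + k * k) / (2 * math.pi)

# The positive part alone grows without bound, so the improper
# integral defining E(X) diverges; only the principal value is 0.
assert half_integral(10) < half_integral(100) < half_integral(10_000)
assert half_integral(10_000) > 1.0
```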

Wrong formula in "General definition"
It was
 * $$\operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x) \, \mathrm{d} \mathrm{P}(X \le x)=$$$$\begin{cases} g(a)+ \int_a^\infty g'(x)\mathrm{P}(X > x) \, \mathrm{d} x & \mathrm{if}\ \mathrm{P}(g(X) \ge g(a))=1 \\ g(b) - \int_{-\infty}^b g'(x)\mathrm{P}(X \le x) \, \mathrm{d} x & \mathrm{if}\ \mathrm{P}(g(X) \le g(b))=1 \end{cases}$$

and then (after an edit by User:Rememberpearl)
 * $$\operatorname{E}[g(X)] = \int_{-\infty}^\infty g(x) \, \mathrm{d} \mathrm{P}(X \le x)=$$$$\begin{cases} g(a)\mathrm{P}(X \ge a)+ \int_a^\infty g'(x)\mathrm{P}(X > x) \, \mathrm{d} x & \mathrm{if}\ \mathrm{P}(g(X) \ge g(a))=1 \\ g(b)\mathrm{P}(X \le b) - \int_{-\infty}^b g'(x)\mathrm{P}(X \le x) \, \mathrm{d} x & \mathrm{if}\ \mathrm{P}(g(X) \le g(b))=1 \end{cases}$$

but both versions are evidently wrong. Indeed, the values of the function g for x<a matter in the left-hand side, but do not matter in the right-hand side. Probably, more assumptions on g are needed. Boris Tsirelson (talk) 17:15, 26 September 2016 (UTC) Moreover, I wrote it already in 2011, see above. Boris Tsirelson (talk) 17:21, 26 September 2016 (UTC)

Now correct, thanks to User:Hgesummer. Boris Tsirelson (talk) 06:03, 14 October 2016 (UTC)

"Unconscious" or "subconscious"  statistician?
Maybe somebody can clarify this rather trivial point. Many American sources refer to the Law of The Unconscious Statistician (or LoTUS). However, the reputable British probabilists G. Grimmett and D. Welsh refer to it as the Law of the subconscious statistician (see Grimmett, Welsh: Probability. An Introduction, 2nd Edition, Oxford University Press, 2014.)

To me, as an adjective, the word "subconscious" seems more appropriate, even though the acronym LoTSS does not sound as appealing as LoTUS. (The odds that any unconscious statistician would compute any expectation are very low. Moreover, if you googled the term "unconscious person" you would be sent to pages explaining how to administer first aid.) Does anybody know if there is general agreement on this terminology, or whether this is one of those discrepancies between British English and American English?

---

In response:

On LOTUS: The term appears to have been coined by Sheldon Ross (it has been around many years now - I believe I first heard it about 30 years ago), and contains a deliberate pun; the substitution of "subconscious" for "unconscious" obliterates the joke. Glenbarnett (talk) 11:37, 28 September 2017 (UTC)

Basic properties section is bloated and no longer so basic
The section on basic properties was once actually useful to point beginning students to who were coming to grips with basic issues like linearity of expectation.

Now the "Basic properties" section is brim full of stuff that is utterly useless to the beginner - it's not even in a useful order. This is not a mathematics text. It's not an article on "all the things I managed to prove when I learned about expectation". It's an article that should be as accessible as possible to a fairly general audience.

The section is now frankly worse than useless as a reference. The entire section should simply be rolled back to before the addition of all the extra crud. Take it back a year and keep it clean. If you must add 50 properties that 99% of people reading the article will never use even once, put all that crud in a new section called "Further properties" or something.

This used to be a half-decent article. Now I'm embarrassed to link to it. Glenbarnett (talk) 11:30, 28 September 2017 (UTC)


 * What is the problem? You feel you know which properties are basic and which are "further". Just start the "further" section and divide the properties. Does anyone object? Boris Tsirelson (talk) 12:16, 28 September 2017 (UTC)


 * Hi Glenbarnett,


 * 1. Exactly which basic properties are not "useful" for beginners, in your opinion? Basic properties are basic in the sense that they derive from the corresponding property of the Lebesgue integral. It might be worth adding a sentence explaining this, to set the expectations right.


 * 2. I, personally, don't like the "Non-multiplicativity" section: IMHO, it's not very informative, and its content is redundant. Should we get rid of it?


 * 3. Regarding "all the things I managed to prove", I'm sure you've noticed that not every property has been proved but only those whose proof shows something important about the field. Methodology, that is.


 * 4. The article is already "as accessible as possible to a fairly general audience". No one wants to make things harder than they ought to be.


 * Cheers. StrokeOfMidnight (talk) 23:45, 28 September 2017 (UTC)

Proving that X=0 (a.s.) when E|X|=0
I changed the proof because the one based on M's inequality has a logical gap: the first sentence ("For non-negative random ...") is false. We prove that P(X>a)=0, for every a>0, but don't transition to P(X≠0)=0, which is what needs to be proven. By the way, it is not at all harder to prove this fact directly, so why even bother with M's inequality. StrokeOfMidnight (talk) 15:52, 2 October 2017 (UTC)
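
For the record, the transition that the Markov-inequality argument omits can be sketched in one line (continuity of measure from below):

```latex
% For every n, Markov gives P(|X| > 1/n) <= n E|X| = 0.  Then
\[
\operatorname{P}(X \neq 0)
  = \operatorname{P}\Bigl(\,\bigcup_{n=1}^{\infty}\{|X| > \tfrac1n\}\Bigr)
  = \lim_{n\to\infty} \operatorname{P}\bigl(|X| > \tfrac1n\bigr) = 0 .
\]
```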


 * Well, I do not insist; do it if you insist, but do not be astonished if someone else objects. For a mathematician maybe the direct proof is a bit more illuminating, but for others (the majority!) it is unnecessarily longer (I feel so). (And more generally, your writings tend to smell of advanced math, which irritates the majority; we are not on a professional math wiki like EoM, and expectation is of interest for many non-mathematicians.) As for the problem with a>0, you surely see how to correct this error readily; this is not a reason to avoid Markov's inequality. Boris Tsirelson (talk) 20:28, 2 October 2017 (UTC)


 * First, your points are well taken. I think, the simplest way not "to irritate the majority" is to make proofs "collapsible".  I will look into that, but if I'm too busy, someone else can do that too.  Second, the only reason I don't want to use Markov's inequality is that it doesn't make this particular proof shorter. Surely, in different circumstances, this inequality would be indispensable. StrokeOfMidnight (talk) 21:07, 2 October 2017 (UTC)


 * Update. So, I've made two proofs (incl. the contentious one) hidden by default. Will this, in your opinion, address the majority crowd? If so, what else should be hidden? StrokeOfMidnight (talk) 21:39, 2 October 2017 (UTC)


 * Somewhat better. However, proofs are generally unwelcome here (and by the way, on EoM as well). If in doubt, ask WT:MATH. A proof is included only if there is a special reason to make an exception for this proof. If you want to write a less encyclopedic, more textbook-ish text, you may try Wikiversity (WV). Yes, it is much less visited than WP. However, it is possible to provide a link from a WP article to a relevant WV article (if the WP community does not object, of course); this option is rarely used, but here is a recent example: the WP article "Representation theory of the Lorentz group" contains (at the end of the lead) a link to the WV article "Representation theory of the Lorentz group". Boris Tsirelson (talk) 05:49, 3 October 2017 (UTC)