Wikipedia:Reference desk/Archives/Mathematics/2010 September 3

= September 3 =

Standard deviation
Hi all! In physics we're doing a bit of stats and I noticed that in the standard deviation formula they divide by N-1 rather than just N. I asked my teacher and he said he didn't get it either, and told me to look it up on Wikipedia or something like that, so here I am. I tried looking at your articles Standard_deviation and Bessel's correction, but that didn't really help because I don't have a university-level stats background :/ Can someone who does explain why you divide by N-1, in simpler terms? I'm OK with (and even expect you to) dumb the concept down a little --cc —Preceding unsigned comment added by 76.229.208.208 (talk) 01:58, 3 September 2010 (UTC)
 * As I understand it, the N-1 comes in because you are trying to estimate the actual standard deviation from sample data. If you put N in the denominator, it turns out that the estimate will, on average, be too low. So a correction factor is built into the formula so that the estimate averages to the actual value if the experiment is repeated many times. With the correction factor included, it works out the same as using N-1 in the denominator instead of N. It has been noted here before, though, that if your sample is small enough for this to actually make a difference, then your sample size is too small. --RDBury (talk) 03:46, 3 September 2010 (UTC)


 * See the Wikipedia article on unbiased estimator, which has the explanation you're looking for. --173.49.14.153 (talk) 04:20, 3 September 2010 (UTC)


 * If you knew the population (actual) mean rather than estimating it, and used that to get the squared differences, then N would be correct. However, using the sample (estimated) mean makes the sum of the squared differences slightly smaller. In fact, the sum of the squared differences from the population mean equals the sum of the squared differences from the sample mean plus N times the square of the difference between the population mean and the sample mean. This itself gives you an estimate of the probable difference between the population and sample means, so the working in the article just uses this to get an estimate of the sum of squared differences from the population mean. A finicky point is that only the expression without the square root is unbiased; the estimated standard deviation obtained by taking the square root is biased, but I would worry even less about that than about using N instead of N-1 in the denominator. Dmcq (talk) 07:57, 3 September 2010 (UTC)

Maybe it won't hurt to mention also that unbiasedness may be slightly over-rated, at least by non-statisticians. See my paper on this: "An Illuminating Counterexample", American Mathematical Monthly, Vol. 110, No. 3 (March, 2003), pp. 234–238. Michael Hardy (talk) 18:47, 4 September 2010 (UTC)
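The bias RDBury and Dmcq describe is easy to see numerically. Here is a small Python sketch (mine, not part of the discussion; the choice of a standard normal and a sample size of 5 is arbitrary) that repeatedly draws samples from a distribution with known variance and averages the /N and /(N-1) variance estimates:

```python
import random

random.seed(0)

TRUE_VAR = 1.0   # variance of the standard normal we sample from
N = 5            # deliberately small sample, so the bias is visible
TRIALS = 50_000

biased_sum = 0.0    # accumulates sum-of-squares / N
unbiased_sum = 0.0  # accumulates sum-of-squares / (N - 1), Bessel's correction

for _ in range(TRIALS):
    sample = [random.gauss(0, 1) for _ in range(N)]
    mean = sum(sample) / N
    ss = sum((x - mean) ** 2 for x in sample)
    biased_sum += ss / N
    unbiased_sum += ss / (N - 1)

biased = biased_sum / TRIALS
unbiased = unbiased_sum / TRIALS
print(f"true variance:           {TRUE_VAR}")
print(f"average /N estimate:     {biased:.4f}")    # about (N-1)/N = 0.8, clearly low
print(f"average /(N-1) estimate: {unbiased:.4f}")  # close to 1.0
```

The /N estimate averages to (N-1)/N times the true variance, which is exactly the gap Bessel's correction removes; with N = 5 the shortfall is 20%, large enough to see even in a quick simulation.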

Random variables
Hello mathematicians! Can you please help me solve this? It's not homework, it's actually work work. Say $$S$$ is the amount of money I make per "event" and $$E$$ is the number of events per year. Let's also say that $$S$$ has a lognormal distribution and $$E$$ has a Poisson distribution (the parameters for $$S$$ can be estimated from some data, and let's assume that the parameter for $$E$$ is known).

A) Then the total money I make from these events in one year is $$P = SE$$. Is there an analytic distribution function for $$P$$?

B) Will the following Monte Carlo methods work to determine a distribution for $$P$$:
 * 1) sample a random value from $$E$$, say $$e$$, then sample $$e$$ values of $$S$$ and add them up - repeat this many times; or
 * 2) sample a random value from $$E$$, say $$e$$, and sample a random value of $$S$$, say $$s$$, and then use $$es$$ - and repeat this many times.

What is the difference between these two methods? What other possible numerical methods can I use to determine $$P$$ ? Thanks very much. --Mudupie (talk) 17:32, 3 September 2010 (UTC)
 * I'll assume that the events don't all make the same amount of money, but rather that each makes an independent contribution drawn from some distribution. Then $$P\neq SE$$. In fact there isn't even an S, there are iid random variables $$S_1,\ S_2,\ \cdots, S_E$$, and $$P=\sum_iS_i\;\!$$. So it's clear that you can't sample the distribution of P with method 2 - you'll get a different distribution which has a much higher variance. You can use method 1, though.
 * You may know that if X and Y are iid then $$\mathbb{V}[X+Y]=\mathbb{V}[X]+\mathbb{V}[Y]=2\mathbb{V}[X]$$ while $$\mathbb{V}[X+X]=\mathbb{V}[2X]=4\mathbb{V}[X]$$. If it seems that E being random makes a difference, think what happens when $$\lambda$$ is large - then E is roughly constant.
 * If finding the expectation and variance of the distribution suffices, you have $$\mathbb{E}[P]=\mathbb{E}[E]\mathbb{E}[S]$$, and if I'm not mistaken $$\mathbb{V}[P] = \mathbb{E}[E^2]\mathbb{E}[S]^2+\mathbb{E}[E]\mathbb{V}[S]-\mathbb{E}[E]^2\mathbb{E}[S]^2$$. This holds no matter what the distributions of E and S are, as long as everything is independent. -- Meni Rosenfeld (talk) 18:56, 4 September 2010 (UTC)
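The difference between the two sampling schemes can be checked numerically. Below is a standalone Python sketch (mine, not part of the thread; the parameters λ = 3, μ = 0, σ = 0.5 are invented purely for illustration) comparing method 1, a genuine compound Poisson sum, with method 2, the product $$es$$:

```python
import math
import random

random.seed(1)

LAM, MU, SIGMA = 3.0, 0.0, 0.5   # assumed parameters, illustration only
TRIALS = 40_000

def draw_poisson(lam):
    """Poisson sample via Knuth's multiplication algorithm (fine for small lambda)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

method1 = []  # sum of e independent lognormal draws (the correct model)
method2 = []  # e times a single lognormal draw
for _ in range(TRIALS):
    e = draw_poisson(LAM)
    method1.append(sum(random.lognormvariate(MU, SIGMA) for _ in range(e)))
    method2.append(e * random.lognormvariate(MU, SIGMA))

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

m1, v1 = mean_var(method1)
m2, v2 = mean_var(method2)

# Analytic moments: E[P] = lam*E[S]; Var[P] = lam*E[S^2] for a compound Poisson sum.
es = math.exp(MU + SIGMA**2 / 2)       # E[S] for a lognormal
es2 = math.exp(2 * MU + 2 * SIGMA**2)  # E[S^2]
print(f"method 1: mean {m1:.3f} (theory {LAM*es:.3f}), var {v1:.3f} (theory {LAM*es2:.3f})")
print(f"method 2: mean {m2:.3f}, var {v2:.3f}  <- inflated, as predicted above")
```

Both methods reproduce the mean $$\lambda\mathbb{E}[S]$$, but method 2's variance comes out markedly larger than the compound Poisson variance, which is the distortion described in the first bullet.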

Thanks very much Meni! That was very useful information. I have one follow up question for now. I'm trying to understand how to derive the expectation of P. I guess the following equation holds but I don't understand why: $$\mathbb{E}[S_1 + ... + S_E] = \mathbb{E}[S_1 + ... + S_\lambda]$$, where λ is just the expectation of E. I "get" that it makes sense but I don't know the actual theoretic reason. Can you please explain? --Mudupie (talk) 23:09, 4 September 2010 (UTC)
 * $$\mathbb{E}[S_1 + ... + S_\lambda]$$ only makes sense when λ is an integer, so it's not useful to talk about it. What I did is to write $$P_k = \sum_{i=1}^kS_i$$ and $$P=P_E=\sum_{k=0}^{\infty}I(E=k)P_k$$. Then finding $$\mathbb{E}[P]$$ is just some algebraic manipulations. -- Meni Rosenfeld (talk) 11:20, 5 September 2010 (UTC)
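For the record, the "algebraic manipulations" mentioned above can be spelled out with the law of total expectation (this derivation is an editor's addition, not part of the original thread): conditioning on $$E=k$$ and using linearity of expectation,

$$\mathbb{E}[P]=\sum_{k=0}^{\infty}\Pr(E=k)\,\mathbb{E}[P_k]=\sum_{k=0}^{\infty}\Pr(E=k)\,k\,\mathbb{E}[S]=\mathbb{E}[S]\sum_{k=0}^{\infty}k\,\Pr(E=k)=\mathbb{E}[E]\,\mathbb{E}[S]=\lambda\,\mathbb{E}[S],$$

where the first step requires $$E$$ to be independent of the $$S_i$$, and the second uses $$\mathbb{E}[P_k]=k\,\mathbb{E}[S]$$ since the $$S_i$$ are iid.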
 * Thanks again mate! I managed to arrive at the expression for E[P] using your approach. I'll try to do the variance one as well and come back here if I get stuck. --Mudupie (talk) 09:41, 6 September 2010 (UTC)

Formula images
On every maths page on Wikipedia I notice the formulae are images, not text. How do you create these? On a Mac? Thanks for any replies. 86.147.12.111 (talk) 18:05, 3 September 2010 (UTC)


 * See Help:Displaying a formula. —Bkell (talk) 18:27, 3 September 2010 (UTC)


 * Thank you. 86.147.12.111 (talk) 19:42, 3 September 2010 (UTC)

Also, when you see a page with such formulas, if you click on "edit", you'll see how they are created. Michael Hardy (talk) 18:51, 4 September 2010 (UTC)
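To make that concrete: the formulas are typed as TeX between <math> tags in a page's wikitext, and the MediaWiki server renders each one as an image, so no special software is needed on a Mac or anywhere else. For example, putting

```
<math>\sqrt{x^2 + y^2}</math>
```

in a page produces an image of the square-root expression. The Help:Displaying a formula page linked above documents the full syntax.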

Homogeneous polynomials
The symmetric degree 4 homogeneous polynomial in two variables, $$x^4 + x^3y + x^2y^2 + xy^3 + y^4$$, can be written $$(x^5-y^5)(x-y)^{-1}$$ for $$x\ne y$$. What is the analogous expression for the symmetric degree 4 homogeneous polynomial in 3 variables: $$x^4 + x^3y + x^3z + x^2y^2 + x^2yz + x^2z^2 + xy^3 + xy^2z + xyz^2 + xz^3 + y^4 + y^3z + y^2z^2 + yz^3 + z^4$$? Bo Jacoby (talk) 22:28, 3 September 2010 (UTC).
 * First, just to be consistent with the terminology, these are called the complete homogeneous symmetric polynomials. The expression you're looking for follows from the properties of Schur polynomials.
 * $$ s_{4} (x, y, z) = \frac{1}{\Delta} \; \det \left[ \begin{matrix} x^6 & y^6 & z^6 \\ x & y & z \\ 1 & 1 & 1 \end{matrix} \right] $$
 * which turns out to be the complete symmetric polynomial. Here Δ is the product of the differences (x−y)(x−z)(y−z).--RDBury (talk) 04:33, 4 September 2010 (UTC)
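The determinant identity is easy to sanity-check numerically. Here is a small Python sketch (an editor's addition, not from the thread) that evaluates both sides at a few points in exact rational arithmetic, expanding the 3×3 determinant along its first row:

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def h4(x, y, z):
    """Complete homogeneous symmetric polynomial of degree 4 in x, y, z:
    the sum of all 15 degree-4 monomials, one per multiset of variables."""
    return sum(a * b * c * d
               for a, b, c, d in combinations_with_replacement((x, y, z), 4))

def schur(x, y, z):
    """det[[x^6, y^6, z^6], [x, y, z], [1, 1, 1]] divided by
    Delta = (x - y)(x - z)(y - z), as in the answer above."""
    num = x**6 * (y - z) - y**6 * (x - z) + z**6 * (x - y)
    return num / ((x - y) * (x - z) * (y - z))

# Exact check at a few points with pairwise-distinct coordinates.
for pt in [(2, 3, 5), (1, 4, 9), (-2, 7, 3)]:
    x, y, z = map(Fraction, pt)
    assert h4(x, y, z) == schur(x, y, z)
print("identity verified at all test points")
```

Using Fraction keeps the division exact, so agreement at these points is not a floating-point coincidence; equality at enough points in general position pins down the polynomial identity.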
 * Thank you very much! Bo Jacoby (talk) 06:10, 4 September 2010 (UTC).
 * No problem but please be civil. —Preceding unsigned comment added by 114.72.252.111 (talk • contribs)
 * It is plainly obvious from the edit history that User:Bo Jacoby did not make the uncivil comment you are referring to. I have removed the IP's offending comment. -- Kinu t/c 05:19, 5 September 2010 (UTC)