User:Rb88guy/sandbox

Introduction
This is a rough draft and needs a lot of work.

Helmert's distribution of $s_n$
The distribution of the sample standard deviation $s_n$ was derived by Helmert and is given by

$$s_n \sim \frac{n^{(n-1)/2}}{2^{(n-3)/2}\,\Gamma\left(\frac{n-1}{2}\right)\,\sigma^{n-1}}\; s_n^{\,n-2}\,\exp\left[-\frac{n\,s_n^2}{2\sigma^2}\right]$$

where $n$ is the sample size, drawn from a normally and independently distributed (NID) population whose true standard deviation is $\sigma$. The statistic $s_n$ is computed from the observations $x_1, \ldots, x_n$, with sample mean $\bar{x}$, as

$$s_n = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

as opposed to the statistic $s_{n-1}$ as defined above, in which the divisor under the square root is $n-1$. It can be shown that the expected value (mean) of this distribution is

$${\rm E}\left[s_n\right] = \sigma\sqrt{\frac{2\pi}{n}}\;\frac{1}{B\left(\frac{n-1}{2},\frac{1}{2}\right)}$$

where $B$ is the beta function. Using an identity for the beta and gamma functions,

$$B(z,w) = \frac{\Gamma(z)\,\Gamma(w)}{\Gamma(z+w)}$$

with $z = (n-1)/2$, $w = 1/2$, and $\Gamma(1/2) = \sqrt{\pi}$, it follows that

$${\rm E}\left[s_n\right] = \sigma\sqrt{\frac{2}{n}}\;\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} = \sigma\,c_2$$
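As a numerical check on the expressions above, the following minimal Python sketch (standard library only) evaluates the Helmert PDF and integrates it with a simple midpoint rule. The helper names `helmert_pdf`, `c2`, and `integrate`, the choice $n = 5$, $\sigma = 2$, and the integration settings are illustrative assumptions, not part of the original derivation. Both the PDF and $c_2$ are evaluated through `math.lgamma` so that large $n$ does not overflow.

```python
import math

def helmert_pdf(s, n, sigma):
    """Helmert PDF of s_n (divisor-n sample standard deviation) at s > 0."""
    log_norm = (0.5 * (n - 1) * math.log(n)
                - 0.5 * (n - 3) * math.log(2)
                - math.lgamma(0.5 * (n - 1))
                - (n - 1) * math.log(sigma))
    return math.exp(log_norm + (n - 2) * math.log(s) - n * s * s / (2 * sigma ** 2))

def c2(n):
    """c_2 = sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma."""
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def integrate(f, a, b, steps=20000):
    """Composite midpoint rule on [a, b]; midpoints never touch s = 0."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

n, sigma = 5, 2.0
upper = 10 * sigma  # the density is negligible beyond this point
total = integrate(lambda s: helmert_pdf(s, n, sigma), 0.0, upper)
mean = integrate(lambda s: s * helmert_pdf(s, n, sigma), 0.0, upper)
print(round(total, 6), round(mean, 6), round(sigma * c2(n), 6))
```

If the derivation is right, the first printed value should be very close to 1 and the last two should agree.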

The symbol $c_2$ is used in statistical quality control. More generally, the $r$th moment of this PDF is

$${\rm E}\left[s_n^{\,r}\right] = \sigma^r\left(\frac{2}{n}\right)^{r/2}\frac{\Gamma\left(\frac{n+r-1}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}$$
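A short sketch of the moment formula, again assuming plain Python; `helmert_moment` is a hypothetical helper name. Setting $r = 2$ should recover the familiar result ${\rm E}[s_n^2] = \sigma^2 (n-1)/n$ for the divisor-$n$ variance, and $r = 1$ should give $\sigma c_2$, which makes two convenient sanity checks.

```python
import math

def helmert_moment(r, n, sigma=1.0):
    """E[s_n^r] = sigma^r (2/n)^(r/2) Gamma((n+r-1)/2) / Gamma((n-1)/2)."""
    log_ratio = math.lgamma((n + r - 1) / 2) - math.lgamma((n - 1) / 2)
    return sigma ** r * (2.0 / n) ** (r / 2) * math.exp(log_ratio)

n, sigma = 10, 3.0
# r = 2 should match the known mean of the divisor-n sample variance.
print(helmert_moment(2, n, sigma), sigma ** 2 * (n - 1) / n)
# r = 1 gives the mean, sigma * c_2.
print(helmert_moment(1, n, sigma))
```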

Using series expansions, it can be shown that an approximate value for $c_2$ can be obtained from

$$c_2 \approx 1 - \frac{3}{4n} - \frac{7}{32n^2} - \cdots$$
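To see how quickly the truncated series converges, the sketch below compares it with the exact gamma-function value for a few sample sizes. The sizes chosen are arbitrary and the function names are illustrative.

```python
import math

def c2_exact(n):
    # Exact value from the gamma-function expression.
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def c2_series(n):
    # First three terms of the series quoted above.
    return 1 - 3 / (4 * n) - 7 / (32 * n ** 2)

for n in (2, 5, 10, 30, 100):
    exact, approx = c2_exact(n), c2_series(n)
    print(f"n={n:3d}  exact={exact:.6f}  series={approx:.6f}  diff={exact - approx:.2e}")
```

The difference shrinks rapidly with $n$; the series is least accurate for the smallest samples.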

Distribution of normalized $s_n$
It is useful to have the PDF of the ratio of $s_n$ to $\sigma$ so that plots, for example, are scale-independent. This amounts to a simple change of variable in the Helmert distribution. Since $\sigma$ is a constant, it is straightforward to show that

$$\frac{s_n}{\sigma} \sim \frac{n^{(n-1)/2}}{2^{(n-3)/2}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{s_n}{\sigma}\right)^{n-2}\exp\left[-\frac{n}{2}\left(\frac{s_n}{\sigma}\right)^2\right]$$

and the expected value (mean) of this PDF is

$${\rm E}\left[\frac{s_n}{\sigma}\right] = c_2$$
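The change of variable can be checked numerically: if $u = s_n/\sigma$, then $f_U(u) = \sigma\,f_S(\sigma u)$, and the result should not depend on $\sigma$. This hedged Python sketch uses hypothetical helpers `helmert_pdf` (the PDF of $s_n$) and `normalized_pdf` (the PDF of $s_n/\sigma$), with $n = 6$ and $u = 0.9$ chosen arbitrarily.

```python
import math

def helmert_pdf(s, n, sigma):
    # Helmert PDF of s_n, evaluated in log space for numerical stability.
    log_f = (0.5 * (n - 1) * math.log(n) - 0.5 * (n - 3) * math.log(2)
             - math.lgamma(0.5 * (n - 1)) - (n - 1) * math.log(sigma)
             + (n - 2) * math.log(s) - n * s * s / (2 * sigma ** 2))
    return math.exp(log_f)

def normalized_pdf(u, n):
    # PDF of u = s_n / sigma; note that sigma does not appear.
    log_f = (0.5 * (n - 1) * math.log(n) - 0.5 * (n - 3) * math.log(2)
             - math.lgamma(0.5 * (n - 1))
             + (n - 2) * math.log(u) - 0.5 * n * u * u)
    return math.exp(log_f)

n, u = 6, 0.9
for sigma in (0.5, 1.0, 7.0):
    # Change of variable: f_U(u) = f_S(sigma * u) * |ds/du| = f_S(sigma * u) * sigma.
    print(sigma, helmert_pdf(sigma * u, n, sigma) * sigma, normalized_pdf(u, n))
```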

To illustrate this PDF, consider Figure 1 (the figures are in a gallery at the bottom of the article). It shows the Helmert PDF (solid line) and a histogram of 10,000 sampled $s_n$ values, both normalized by the known standard deviation of the NID population. The vertical dashed line, just visible next to the solid vertical line marking $c_2$, is the observed mean of these $s_n$ values. (The circles plotted on this figure are addressed below.) The histogram agrees well with the PDF, and the observed mean agrees well with $c_2$.
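A simulation in the spirit of Figure 1 can be sketched with Python's standard library; `statistics.pstdev` uses the divisor $n$, so it computes $s_n$. The sample size $n = 5$, $\sigma = 2$, the random seed, and the function name `simulate_normalized_sn` are assumptions for illustration; the settings actually used for Figure 1 are not stated here.

```python
import math
import random
import statistics

def c2(n):
    # c_2 from the gamma-function expression, via log-gamma.
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def simulate_normalized_sn(n, sigma, trials, seed=0):
    """Draw `trials` samples of size n from N(0, sigma^2); return s_n / sigma for each."""
    rng = random.Random(seed)
    return [statistics.pstdev([rng.gauss(0.0, sigma) for _ in range(n)]) / sigma
            for _ in range(trials)]

n, sigma = 5, 2.0
values = simulate_normalized_sn(n, sigma, trials=10_000)
print(f"observed mean = {statistics.fmean(values):.4f}, c2 = {c2(n):.4f}")
```

The list `values` could be passed to any histogram routine for comparison with the normalized PDF.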

Figure 2 shows how the PDF of the normalized $s_n$ changes as the sample size increases. The $c_2$ values, which are the means of the respective PDFs, are indicated. (The $c_2$ for $n = 2$ is the leftmost thin vertical line.)

Distribution of normalized $s_{n-1}$
Since

$$s_{n-1}^{2} = s_n^2\left(\frac{n}{n-1}\right) \quad\Rightarrow\quad s_{n-1} = s_n\sqrt{\frac{n}{n-1}}$$

it follows that

$$\frac{s_{n-1}}{\sigma} = s_n\left(\frac{1}{\sigma}\sqrt{\frac{n}{n-1}}\right)$$

and everything in the parentheses is a constant. Returning to the Helmert PDF and again using the change-of-variable calculations, the result is

$$\frac{s_{n-1}}{\sigma} \sim \frac{(n-1)^{(n-1)/2}}{2^{(n-3)/2}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{s_{n-1}}{\sigma}\right)^{n-2}\exp\left[-\frac{n-1}{2}\left(\frac{s_{n-1}}{\sigma}\right)^2\right]$$
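The relation between the two statistics can be confirmed on any data set. In Python's `statistics` module, `pstdev` divides by $n$ (giving $s_n$) and `stdev` divides by $n-1$ (giving $s_{n-1}$); the sample values below are arbitrary.

```python
import math
import statistics

data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]  # arbitrary illustrative sample
n = len(data)
s_n = statistics.pstdev(data)          # divisor n
s_n_minus_1 = statistics.stdev(data)   # divisor n - 1
# s_{n-1} should equal s_n * sqrt(n / (n - 1)).
print(s_n_minus_1, s_n * math.sqrt(n / (n - 1)))
```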

The expected value is

$${\rm E}\left[s_{n-1}\right] = \sigma\sqrt{\frac{2}{n-1}}\;\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} = \sigma\,c_4 \quad\Rightarrow\quad {\rm E}\left[\frac{s_{n-1}}{\sigma}\right] = c_4$$

where $c_4$ again is a statistical quality control symbol; its series approximation is

$$c_4 \approx 1 - \frac{1}{4n} - \frac{7}{32n^2} - \cdots$$
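The following sketch, assuming plain Python and hypothetical helper names, integrates the normalized $s_{n-1}$ PDF numerically to confirm that its mean is $c_4$, and compares $c_4$ with its truncated series. The choice $n = 8$ and the midpoint-rule settings are arbitrary.

```python
import math

def c4_exact(n):
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def c4_series(n):
    # First three terms of the series quoted above.
    return 1 - 1 / (4 * n) - 7 / (32 * n ** 2)

def normalized_sn1_pdf(u, n):
    # PDF of s_{n-1} / sigma, evaluated in log space.
    m = n - 1
    log_f = (0.5 * m * math.log(m) - 0.5 * (n - 3) * math.log(2)
             - math.lgamma(0.5 * m) + (n - 2) * math.log(u) - 0.5 * m * u * u)
    return math.exp(log_f)

n, steps, upper = 8, 20_000, 10.0
h = upper / steps
# Midpoint rule for E[u] = integral of u * f(u) over (0, upper).
mean = h * sum((i + 0.5) * h * normalized_sn1_pdf((i + 0.5) * h, n) for i in range(steps))
print(f"numerical mean = {mean:.6f}, c4 = {c4_exact(n):.6f}, series = {c4_series(n):.6f}")
```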

Simulation results for the $s_{n-1}$ case are shown in Figures 3 and 4.

Relation of the Helmert distribution to the chi distribution
The chi PDF is

$$\chi \sim \frac{1}{2^{k/2 - 1}\,\Gamma\left(\frac{k}{2}\right)}\,\chi^{k-1}\exp\left[-\frac{\chi^2}{2}\right]$$

where $k$ is the number of degrees of freedom. Taking $k = n - 1$, making the substitution

$$\chi = \sqrt{n}\,\frac{s_n}{\sigma}$$

and using the change-of-variable calculations once again gives

$$\frac{s_n}{\sigma} \sim \frac{1}{2^{(n-3)/2}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\sqrt{n}\,\frac{s_n}{\sigma}\right)^{n-2}\exp\left[-\frac{1}{2}\left(\sqrt{n}\,\frac{s_n}{\sigma}\right)^2\right]\left(\sqrt{n}\right)$$

which, since $\left(\sqrt{n}\right)^{n-2}\sqrt{n} = n^{(n-1)/2}$, reduces to the previously found Helmert PDF for a normalized $s_n$:

$$\frac{s_n}{\sigma} \sim \frac{n^{(n-1)/2}}{2^{(n-3)/2}\,\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{s_n}{\sigma}\right)^{n-2}\exp\left[-\frac{n}{2}\left(\frac{s_n}{\sigma}\right)^2\right]$$
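The reduction can also be checked pointwise. This minimal sketch (hypothetical helper names, $n = 7$ chosen arbitrarily) evaluates the chi PDF with $k = n - 1$ at $\chi = \sqrt{n}\,u$, multiplies by the Jacobian $\sqrt{n}$, and compares the result with the Helmert PDF of $u = s_n/\sigma$.

```python
import math

def chi_pdf(x, k):
    """Chi PDF with k degrees of freedom, evaluated in log space."""
    log_f = ((k - 1) * math.log(x) - 0.5 * x * x
             - (0.5 * k - 1) * math.log(2) - math.lgamma(0.5 * k))
    return math.exp(log_f)

def helmert_normalized_pdf(u, n):
    """Helmert PDF of u = s_n / sigma."""
    log_f = (0.5 * (n - 1) * math.log(n) - 0.5 * (n - 3) * math.log(2)
             - math.lgamma(0.5 * (n - 1)) + (n - 2) * math.log(u) - 0.5 * n * u * u)
    return math.exp(log_f)

n = 7
for u in (0.3, 0.8, 1.2, 2.0):
    # chi = sqrt(n) * u with k = n - 1; the Jacobian d(chi)/du is sqrt(n).
    via_chi = chi_pdf(math.sqrt(n) * u, n - 1) * math.sqrt(n)
    print(u, via_chi, helmert_normalized_pdf(u, n))
```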

A similar process for $s_{n-1}$, using the substitution

$$\chi = \sqrt{n-1}\,\frac{s_{n-1}}{\sigma}$$

can be shown to reproduce the Helmert PDF for the normalized $s_{n-1}$. The circles on the histogram plots in the figures are computed from these chi-distribution expressions.
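The same pointwise check for $s_{n-1}$, under the same assumptions (plain Python, illustrative helper names, $n = 7$), uses $\chi = \sqrt{n-1}\,u$ and the Jacobian $\sqrt{n-1}$, and compares against the normalized $s_{n-1}$ PDF on a small grid of $u$ values.

```python
import math

def chi_pdf(x, k):
    # Chi PDF with k degrees of freedom.
    log_f = ((k - 1) * math.log(x) - 0.5 * x * x
             - (0.5 * k - 1) * math.log(2) - math.lgamma(0.5 * k))
    return math.exp(log_f)

def helmert_normalized_sn1_pdf(u, n):
    # Helmert PDF of u = s_{n-1} / sigma.
    m = n - 1
    log_f = (0.5 * m * math.log(m) - 0.5 * (n - 3) * math.log(2)
             - math.lgamma(0.5 * m) + (n - 2) * math.log(u) - 0.5 * m * u * u)
    return math.exp(log_f)

n = 7
grid = [0.25 * i for i in range(1, 13)]
# Pairs of (chi-based value, Helmert value); they should agree to rounding error.
pairs = [(chi_pdf(math.sqrt(n - 1) * u, n - 1) * math.sqrt(n - 1),
          helmert_normalized_sn1_pdf(u, n)) for u in grid]
print(max(abs(a - b) for a, b in pairs))
```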

Summary
The bias-correction constants are defined as

$$c_2 \equiv \sqrt{\frac{2}{n}}\;\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \qquad\qquad c_4 \equiv \sqrt{\frac{2}{n-1}}\;\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}$$

so that

$$c_2 = \sqrt{\frac{n-1}{n}}\,c_4$$

Although the series approximations

$$c_2 \approx 1 - \frac{3}{4n} - \frac{7}{32n^2} - \cdots \qquad\qquad c_4 \approx 1 - \frac{1}{4n} - \frac{7}{32n^2} - \cdots$$

are useful, modern software can calculate these correction factors directly from the gamma functions. Figure 5 shows the behavior of these factors as a function of sample size.
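A minimal sketch of the direct calculation, assuming Python's `math` module; the function names `c2` and `c4` are illustrative. Working with `math.lgamma` and exponentiating the difference avoids overflow, because `math.gamma(n / 2)` on its own exceeds the double-precision range once $n$ reaches a few hundred.

```python
import math

def c2(n):
    """c_2 = sqrt(2/n) * Gamma(n/2) / Gamma((n-1)/2), computed via log-gamma."""
    return math.sqrt(2.0 / n) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def c4(n):
    """c_4 = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), computed via log-gamma."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

for n in (2, 3, 5, 10, 25, 100, 1000):
    print(f"{n:5d}  c2={c2(n):.6f}  c4={c4(n):.6f}")
```

For $n = 2$ these reduce to $c_2 = 1/\sqrt{\pi}$ and $c_4 = \sqrt{2/\pi}$, and both factors approach 1 as $n$ grows.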

Finally, to obtain an unbiased estimate of the population standard deviation for NID data, use either

$$\hat\sigma = \frac{s_n}{c_2} \qquad {\rm or} \qquad \hat\sigma = \frac{s_{n-1}}{c_4}$$
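Both estimators give the same number, because $s_n/c_2$ and $s_{n-1}/c_4$ are algebraically identical given $c_2 = \sqrt{(n-1)/n}\,c_4$ and $s_{n-1} = s_n\sqrt{n/(n-1)}$. The hedged sketch below applies the second form to an arbitrary illustrative sample and checks it against the first; `unbiased_sigma` is a hypothetical helper name.

```python
import math
import statistics

def c4(n):
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def unbiased_sigma(data):
    """Estimate sigma for NID data as s_{n-1} / c_4."""
    return statistics.stdev(data) / c4(len(data))

data = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9]  # arbitrary illustrative sample
n = len(data)
c2 = math.sqrt((n - 1) / n) * c4(n)
# s_{n-1} / c_4 and s_n / c_2 should print the same value.
print(unbiased_sigma(data), statistics.pstdev(data) / c2)
```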