Talk:Rankit

Early comments
In my last edit summary, I should have said it's a normal probability plot regardless of whether the underlying distribution is normal. Rankits are based on a normal distribution. Normal probability plots are used (among other things) to diagnose non-normality! Michael Hardy 23:49, 10 February 2006 (UTC)

This topic seems extremely obscure. Even the reference link gives nothing on what a rankit is. A search on Google turns up extremely little. Even information on Mr. Bliss is very scarce. So, the question is: is this something that people actually use? It seems that its utility is very, very low relative to the Q-Q plot. Can anyone demonstrate real utilization of it? I'd like to suggest removing this topic as an idea that not only never caught on, but is basically useless.


 * It is certainly something that virtually all statisticians use all the time. Normal probability plots are standard fare. The individual numbers may often be called "expected normal order statistics" or the like, rather than "rankits". In the graduate program in statistics at the University of Minnesota, use of the term is widespread, perhaps because one of the professors studied under Chester Bliss. I think having a short term rather than a long descriptive phrase is useful. Michael Hardy 21:02, 3 November 2006 (UTC)

Article title?
Should the article title be Rankit or Normal probability plot ? What do others think? DFH 18:53, 29 January 2007 (UTC)
 * Rankit just scored 40,100 hits on Google
 * "Normal probability plot" scored 93,900 hits
 * Well, I recently needed to use this page. I searched for a probability plot, and saw that Normal probability plot redirected here (I created a redirect for Probability plot to this page too, for the time being). I think the appropriate page name should be "normal probability plot" rather than "rankit", simply because I have never heard of the term rankit in any statistics course I have taken, while I have used normal probability plots extensively. Jason Smith 05:57, 21 February 2007 (UTC)

I think a point in favor of "rankit" is that it's a simpler idea than "normal probability plot". One uses rankits in the construction of normal probability plots. Michael Hardy 23:30, 23 February 2007 (UTC)
 * I have put articles at Probability plot and Normal probability plot based on the public domain counterparts at NIST. Does anyone feel like merging Rankit into Probability plot? Btyner (talk) 15:48, 16 February 2008 (UTC)

Chester Bliss
Here's a biographical reference: DFH 19:13, 29 January 2007 (UTC)
 * "Chester Ittner Bliss, 1899-1979", William G. Cochran, David J. Finney, in Biometrics, Vol. 35, No. 4 (Dec., 1979), pp. 715-717.

Is the Q-Q plot superior to the P-P plot?
The article says that one can plot the Q-Q plot with quantiles of any other distribution. Why is this not possible for the P-P plot? One should think that this is possible for the P-P plot also. Vivek 08:29, 29 May 2007 (UTC)

Expected values of the resulting order statistics
In the article the following expected values of the order statistics (n=6) are shown:
 * $$-1.2816,\ \ -0.64335,\ \  -0.20189,\ \  0.20189,\ \  0.64335,\ \  1.2816\,.$$

When I calculate the numbers I get slightly different values. I calculated them using numerical integration of the probability density function of the order statistics. I checked them using a small Monte Carlo simulation (10 million trials). The numbers are:
 * $$-1.2672,\ \ -0.641755,\ \  -0.201557,\ \  0.201557,\ \  0.641755,\ \  1.2672\,.$$

Are the current numbers in the article an approximation? Maybe they should be changed to the more accurate numbers.

jasper (talk) 13:17, 27 December 2007 (UTC)


 * I'll look into this. Michael Hardy (talk) 18:51, 27 December 2007 (UTC)


 * ...OK, I've looked at it a bit and I'm suspecting a software bug may have been involved in getting the numbers I put in the article. Michael Hardy (talk) 21:37, 27 December 2007 (UTC)

I ran a very small Monte Carlo simulation (44,000 trials) and it was enough to convince me that the numbers proposed by "jasper" are clearly much closer to the truth than what was there already. I've edited the article accordingly. Michael Hardy (talk) 21:50, 29 December 2007 (UTC)
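A Monte Carlo check of the kind described above can be sketched in a few lines. The function name, seed, and trial count are arbitrary choices for illustration, not what jasper or Michael Hardy actually ran.

```python
import random

def mc_expected_order_stats(n, trials, seed=0):
    """Monte Carlo estimate of E[X_(i)], i = 1..n, for n iid standard normal draws."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    totals = [0.0] * n
    for _ in range(trials):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        for i, x in enumerate(sample):
            totals[i] += x  # accumulate the i-th smallest value
    return [t / trials for t in totals]

estimates = mc_expected_order_stats(6, 100_000)
print([round(e, 3) for e in estimates])
```

With 100,000 trials the standard error for the extreme order statistic is roughly 0.002, small enough to separate -1.2672 from -1.2816.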

BobJordanB (talk) 07:44, 28 March 2011 (UTC) I did a check on where the -1.2816 sequence came from. It comes from a common rule for generating rankits. The 6 numbers given are z values calculated from the probabilities $$p=(i-0.375)/(n+1-2*0.375)$$. This is a common rankit approximation and can be attributed to Blom in a 1958 paper "Statistical Estimates and Transformed Beta-Variables" published by John Wiley. So they can be calculated (for example, in Excel) as $$=NORMSINV((i-0.375)/(n+1-2*0.375))$$, in this case with $$n=6$$ and $$i=1,2,3,4,5,6$$. The general formula is $$p=(i-k)/(n+1-2*k)$$. The k value of 0.375 is considered a 'good' approximation, but it is common to see many others. I have often used k=0.5. Excel uses k=0 to give p=i/(n+1), and another common one is k=0.3.

I'll try and prepare some stuff for the front page on this.
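Blom's rule is easy to reproduce. The sketch below (the function name is hypothetical) applies the inverse of the standard normal CDF, the equivalent of Excel's NORMSINV; NORMSDIST is the forward CDF and would not return z values. With n = 6 and k = 0.375 it recovers the sequence that was originally in the article.

```python
from statistics import NormalDist

def plotting_position_rankits(n, k):
    """z_i = Phi^{-1}((i - k) / (n + 1 - 2k)) for i = 1..n.

    Uses the inverse normal CDF (Excel's NORMSINV), not the CDF itself.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - k) / (n + 1 - 2 * k)) for i in range(1, n + 1)]

blom = plotting_position_rankits(6, 0.375)
print([round(z, 4) for z in blom])
# → [-1.2816, -0.6433, -0.2019, 0.2019, 0.6433, 1.2816]
```

For i = 1 the probability is 0.625 / 6.25 = 0.1, and the 10th percentile of the standard normal is -1.2816, which is where the first value comes from.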

Gcap1 (talk) 18:07, 30 December 2019 (UTC) I calculated the values using the method given here: https://math.la.asu.edu/~diane/Fall_2013/STP_231/231_Section4_4problem.pdf AND I (independently) used the NormProbPlot on my TI-84 CE. In both cases, I got this:
 * $$-1.382994,\ \ -0.6744898,\ \ -0.210428,\ \ 0.210428,\ \ 0.6744898,\ \ 1.382994\,.$$
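These figures agree, to the digits shown, with the plotting-position rule for k = 0.5, i.e. p = (i - 0.5)/n. That suggests (an inference, not something stated in the comment) that the linked worksheet and the calculator both use that rule rather than the exact expected values. A quick check:

```python
from statistics import NormalDist

n = 6
# Plotting-position rule with k = 0.5: p_i = (i - 0.5) / n.
hazen = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
print([round(z, 6) for z in hazen])
```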


 * Small correction: Blom 1958 is a book, not a paper. --Gwern (contribs) 01:17 1 August 2017 (GMT)

How are the rankits calculated?
Nowhere on the page is it explained how one can calculate the expected order statistics, and it should be. 71.64.105.56 (talk) 23:55, 26 September 2009 (UTC)
 * Good point. Some crude methods are obvious, but efficient methods that you would want to use in practice are more work to discover.  This is certainly out there in the literature somewhere. Michael Hardy (talk) 20:13, 5 April 2010 (UTC)

BobJordanB (talk) 09:37, 28 March 2011 (UTC) I suggest the following - just a little nervous to put it up front just yet!

Values for the Rankits
The rankits can be estimated using a number of formulas, although all are approximations to the exact values, for example the sorts of values discussed above.

The key here is the word expected, i.e. 'expected values of the normal order statistics'.

That expected value is the mean, and there is no simple closed-form formula for it.

The exact formula involves an integral of x times the normal density times powers of the cumulative normal distribution and of its complement.

According to Teichroew, it goes something like this:


 * $$E(x_j;N) = {N! \over (j-1)! (N-j)!} B(j-1,N-j)$$,

where


 * $$B(m,n)= \int_{-\infty}^{\infty} x f(x) F(x)^m(1-F(x))^n  dx$$,

and where $$f(x)$$ and $$F(x)$$ are the standard normal probability density function and cumulative distribution function, respectively.
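The integral has no closed form, but the integrand is smooth and decays quickly, so straightforward numerical quadrature works. A minimal sketch, assuming Simpson's rule on the truncated interval [-10, 10] is accurate enough; the truncation limits, step count, and function name are arbitrary choices, not taken from Teichroew.

```python
import math
from statistics import NormalDist

def expected_normal_order_stat(j, n, lo=-10.0, hi=10.0, steps=4000):
    """E(x_j; N) = N!/((j-1)!(N-j)!) * B(j-1, N-j), with B evaluated by Simpson's rule.

    The integrand is x f(x) F(x)^(j-1) (1 - F(x))^(N-j), where f and F are the
    standard normal pdf and cdf; the infinite range is truncated to [lo, hi].
    """
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    nd = NormalDist()
    coef = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))

    def integrand(x):
        F = nd.cdf(x)
        return x * nd.pdf(x) * F ** (j - 1) * (1.0 - F) ** (n - j)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for m in range(1, steps):
        weight = 4 if m % 2 else 2  # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * integrand(lo + m * h)
    return coef * total * h / 3.0

exact6 = [expected_normal_order_stat(j, 6) for j in range(1, 7)]
print([round(e, 4) for e in exact6])
```

For N = 6 this should agree with the values jasper obtained by integration earlier in this thread (-1.2672, -0.641755, -0.201557 and their mirror images).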

These have been calculated and tabulated in a number of places; one example, for N=1 to 20, is Teichroew.


 * More can be found on this inside Wikipedia and in other sources using a search on 'expected values of the Normal order statistics'

It is relatively easy to calculate the medians of these order statistics (i.e. not the expected or mean values), and a formula involving the inverse beta distribution is commonly used; for example, in Excel write
 * $$=NORMSINV(BETAINV(0.5,i,n+1-i))$$

where the constant 0.5 forces the median value, $$n$$ is the number of order statistics being calculated, and $$i$$ is the particular one. So $$i = 1,2,3,...,n$$. From that inverse beta approach one can also calculate the positions of the various percentiles by changing the 0.5 value.
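Python's standard library has no inverse beta function, so the sketch below finds the beta quantile by bisection. It relies on the identity that, for positive integers a and b, the Beta(a, b) CDF at p equals P(Binomial(a + b - 1, p) >= a); here a = i and b = n + 1 - i are always integers. All function names are hypothetical stand-ins for BETAINV and NORMSINV, not a library API.

```python
import math
from statistics import NormalDist

def beta_cdf_int(p, a, b):
    """CDF of Beta(a, b) at p for positive integers a, b.

    Uses I_p(a, b) = P(Binomial(a + b - 1, p) >= a).
    """
    m = a + b - 1
    return sum(math.comb(m, r) * p**r * (1.0 - p) ** (m - r) for r in range(a, m + 1))

def beta_inv_int(q, a, b, tol=1e-12):
    """Inverse of beta_cdf_int by bisection (a stand-in for Excel's BETAINV)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def median_rankits(n, q=0.5):
    """Phi^{-1}(BETAINV(q, i, n + 1 - i)) for i = 1..n; q = 0.5 gives the medians."""
    nd = NormalDist()
    return [nd.inv_cdf(beta_inv_int(q, i, n + 1 - i)) for i in range(1, n + 1)]

print([round(z, 4) for z in median_rankits(6)])
```

Because the normal quantile function is monotone, mapping the median of the i-th uniform order statistic through it gives the median of the i-th normal order statistic; changing q gives other percentiles in the same way.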

There are a number of approximations to the expected values of the normal order statistics of the form


 * $$=NORMSDIST((i-k)/(n+1-2*k))$$

where
 * $$i$$ and $$n$$ are the same as before and $$k$$ is a constant that takes on values between 0 and 0.5.

Are you sure you don't mean NORMSINV in the equation above? — Preceding unsigned comment added by 206.173.46.67 (talk) 20:10, 16 November 2011 (UTC)

Examples are:
 * Blom, who sets $$k=0.375$$, said to be a 'good approximation'.
 * Hazen and others, who use $$k=0.5$$, a good choice also suggested by Gilchrist.
 * Tukey, who suggested $$k=1/3$$.
 * Weibull, who suggested $$k=0$$, which is used in Excel's percentile function.
 * Benard, who suggested $$k=0.3$$.
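To see how these constants compare, the sketch below measures each rule's worst-case error for n = 6 against the integrated values jasper quoted earlier in this thread. The dictionary and function names are hypothetical. In this one small case Blom's k = 0.375 comes out closest, with Tukey's 1/3 a near second, which fits its description as a good approximation; other n may rank the rules differently.

```python
from statistics import NormalDist

# k constants from the list above.
PLOTTING_K = {"Weibull": 0.0, "Benard": 0.3, "Tukey": 1 / 3, "Blom": 0.375, "Hazen": 0.5}

# Expected values for n = 6 obtained by numerical integration (jasper, above).
EXACT_N6 = [-1.2672, -0.641755, -0.201557, 0.201557, 0.641755, 1.2672]

def approx_rankits(n, k):
    """Phi^{-1}((i - k) / (n + 1 - 2k)) for i = 1..n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - k) / (n + 1 - 2 * k)) for i in range(1, n + 1)]

def max_abs_error(k):
    """Largest |approximation - exact| over the six order statistics."""
    approx = approx_rankits(len(EXACT_N6), k)
    return max(abs(a - e) for a, e in zip(approx, EXACT_N6))

errors = {name: max_abs_error(k) for name, k in PLOTTING_K.items()}
for name, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name:8s} k={PLOTTING_K[name]:.3f}  max |error| = {err:.4f}")
```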

I have tended to use $$P=(i-0.5)/n$$ in $$NORMSINV(P)$$ as a good all-round and simple form.

And when n becomes large, the choice of $$k$$ becomes a little academic.
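One way to see this is to compare two rules as n grows. The sketch below (hypothetical helper names) reports the largest absolute gap between Blom (k = 0.375) and Hazen (k = 0.5) rankits. The gap does shrink, though only slowly at the extreme order statistics, where it is largest; relative to the spread of the rankits it shrinks faster.

```python
from statistics import NormalDist

def rankits(n, k):
    """Phi^{-1}((i - k) / (n + 1 - 2k)) for i = 1..n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - k) / (n + 1 - 2 * k)) for i in range(1, n + 1)]

def max_gap(n, k1=0.375, k2=0.5):
    """Largest absolute difference between two plotting-position rules for size n."""
    return max(abs(a - b) for a, b in zip(rankits(n, k1), rankits(n, k2)))

for n in (10, 100, 1000, 10000):
    print(n, round(max_gap(n), 4))
```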

More can be found in an excellent book by Gilchrist.