User:StevenJYang/Quartile

Discrete Distributions
For discrete distributions, there is no universal agreement on selecting the quartile values.

Method 1

 * 1) Use the median to divide the ordered data set into two halves.
 * 2) * If there is an odd number of data points in the original ordered data set, do not include the median (the central value in the ordered list) in either half.
 * 3) * If there is an even number of data points in the original ordered data set, split this data set exactly in half.
 * 4) The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

This rule is employed by the TI-83 calculator boxplot and "1-Var Stats" functions.
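The steps of Method 1 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `quartiles_method1` is hypothetical, and the sketch assumes at least two data points so that both halves are non-empty.

```python
from statistics import median

def quartiles_method1(data):
    """Return (Q1, Q2, Q3) using Method 1 (median excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For odd n, skip the central value; for even n, split exactly in half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

print(quartiles_method1([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (15, 40, 43)
```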

Method 2

 * 1) Use the median to divide the ordered data set into two halves.
 * 2) * If there is an odd number of data points in the original ordered data set, include the median (the central value in the ordered list) in both halves.
 * 3) * If there is an even number of data points in the original ordered data set, split this data set exactly in half.
 * 4) The lower quartile value is the median of the lower half of the data. The upper quartile value is the median of the upper half of the data.

The values found by this method are also known as "Tukey's hinges"; see also midhinge.
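A comparable sketch for Method 2 differs only in how the halves are formed when the number of data points is odd. The function name `quartiles_method2` is hypothetical, and the input is assumed to be non-empty.

```python
from statistics import median

def quartiles_method2(data):
    """Return (Q1, Q2, Q3) using Method 2 (median included in both halves when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # For odd n, the central value xs[half] belongs to both halves.
    lower = xs[:half + 1] if n % 2 else xs[:half]
    upper = xs[half:]
    return median(lower), median(xs), median(upper)

print(quartiles_method2([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (25.5, 40, 42.5)
```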

Method 3

 * 1) If there is an even number of data points, then Method 3 gives the same result as either method above.
 * 2) If there are (4n+1) data points, then the lower quartile is 25% of the nth data value plus 75% of the (n+1)th data value; the upper quartile is 75% of the (3n+1)th data value plus 25% of the (3n+2)th data value.
 * 3) If there are (4n+3) data points, then the lower quartile is 75% of the (n+1)th data value plus 25% of the (n+2)th data value; the upper quartile is 25% of the (3n+2)th data value plus 75% of the (3n+3)th data value.
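The weighted averages above translate fairly directly into code once the 1-based positions are shifted to 0-based indices. The sketch below is one possible reading. The name `quartiles_method3` is hypothetical, the even case falls back to splitting the data exactly in half, and a single data point is rejected because the rule for (4n+1) points has no "nth value" when n = 0.

```python
from statistics import median

def quartiles_method3(data):
    """Return (Q1, Q2, Q3) using Method 3 (weighted averages for odd sample sizes)."""
    xs = sorted(data)
    size = len(xs)
    if size < 2:
        raise ValueError("need at least two data points")
    n, r = divmod(size, 4)
    if r == 1:    # 4n+1 points; 1-based position j is xs[j - 1]
        q1 = 0.25 * xs[n - 1] + 0.75 * xs[n]
        q3 = 0.75 * xs[3 * n] + 0.25 * xs[3 * n + 1]
    elif r == 3:  # 4n+3 points
        q1 = 0.75 * xs[n] + 0.25 * xs[n + 1]
        q3 = 0.25 * xs[3 * n + 1] + 0.75 * xs[3 * n + 2]
    else:         # even number of points: same as Methods 1 and 2
        half = size // 2
        q1, q3 = median(xs[:half]), median(xs[half:])
    return q1, median(xs), q3

print(quartiles_method3([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]))  # (20.25, 40, 42.75)
```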

Method 4
Given an ordered dataset $$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$$, we can interpolate between data points to find the $$p$$th empirical quantile by placing $$x_{(i)}$$ at the $$i/(n+1)$$ quantile. If we denote the integer part of a number $$a$$ by $$[a]$$, then the empirical quantile function is given by,

$$q(p) = x_{(k)} + \alpha(x_{(k+1)} - x_{(k)})$$,

where $$k = [p(n+1)]$$ and $$\alpha = p(n+1) - [p(n+1)]$$, so that $$0 \le \alpha < 1$$.

To find the first, second, and third quartiles of the dataset we would evaluate $$q(0.25)$$, $$q(0.5)$$, and $$q(0.75)$$ respectively.
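The interpolation can be sketched as below, taking $$\alpha$$ as the fractional part of $$p(n+1)$$ so that it lies between 0 and 1. The function name `empirical_quantile` is hypothetical. Clamping to the smallest or largest value when $$k$$ falls outside the data is an assumption of this sketch, since the text does not say how to handle those cases.

```python
import math

def empirical_quantile(data, p):
    """Interpolated p-th quantile, with x_(i) placed at the i/(n+1) quantile."""
    xs = sorted(data)
    n = len(xs)
    h = p * (n + 1)
    k = math.floor(h)   # integer part of p(n+1)
    alpha = h - k       # fractional part of p(n+1)
    # Assumption (not from the text): clamp when k is outside 1..n-1.
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    # x_(k) is xs[k - 1] in 0-based indexing.
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])

data = [7, 15, 36, 39, 40, 41]
print([empirical_quantile(data, p) for p in (0.25, 0.5, 0.75)])  # [13.0, 37.5, 40.25]
```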

Example 1
Ordered Data Set: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49

Example 2
Ordered Data Set: 7, 15, 36, 39, 40, 41

As there is an even number of data points, the first three methods give the same results.
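Method 4 interpolates instead of splitting the data into halves. As an illustration, its calculation for Example 2 ($$n = 6$$) can be worked through as follows, taking $$\alpha$$ as the fractional part of $$p(n+1)$$:

 * First quartile: $$p(n+1) = 0.25 \times 7 = 1.75$$, so $$k = 1$$ and $$\alpha = 0.75$$, giving $$q(0.25) = 7 + 0.75(15 - 7) = 13$$.
 * Second quartile: $$p(n+1) = 0.5 \times 7 = 3.5$$, so $$k = 3$$ and $$\alpha = 0.5$$, giving $$q(0.5) = 36 + 0.5(39 - 36) = 37.5$$.
 * Third quartile: $$p(n+1) = 0.75 \times 7 = 5.25$$, so $$k = 5$$ and $$\alpha = 0.25$$, giving $$q(0.75) = 40 + 0.25(41 - 40) = 40.25$$.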

Continuous Probability Distributions
If we define a continuous probability distribution as $$P(X)$$, where $$X$$ is a real-valued random variable, its cumulative distribution function (CDF) is given by,

$$F_X(x) = P(X \leq x)$$.

The CDF gives the probability that the random variable $$X$$ is less than or equal to the value $$x$$. Therefore, the first quartile is the value of $$x$$ when $$F_X(x) = 0.25$$, the second quartile is $$x$$ when $$F_X(x) = 0.5$$, and the third quartile is $$x$$ when $$F_X(x) = 0.75$$. The values of $$x$$ can be found with the quantile function $$Q(p)$$, where $$p = 0.25$$ for the first quartile, $$p = 0.5$$ for the second quartile, and $$p = 0.75$$ for the third quartile. The quantile function is the inverse of the cumulative distribution function if the cumulative distribution function is continuous and strictly monotonically increasing.
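As a concrete illustration, the exponential distribution (chosen here only as an example; it is not discussed above) has the CDF $$F_X(x) = 1 - e^{-\lambda x}$$ for $$x \geq 0$$, which can be inverted in closed form to give $$Q(p) = -\ln(1 - p)/\lambda$$. The sketch below uses hypothetical function names and checks that applying the CDF to each quartile recovers $$p$$.

```python
import math

def exponential_cdf(x, rate):
    """CDF of an exponential distribution with the given rate (lambda)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def exponential_quantile(p, rate):
    """Quantile function Q(p), the inverse of the CDF on 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p) / rate

rate = 2.0
for p in (0.25, 0.5, 0.75):
    x = exponential_quantile(p, rate)
    # Applying the CDF to the quartile should recover p up to rounding.
    print(p, x, exponential_cdf(x, rate))
```

For this distribution the second quartile (the median) is $$\ln(2)/\lambda$$.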