User:HB-JOY/sandbox

A series of hourly temperatures was measured throughout the day in degrees Fahrenheit. The ordered set is: 57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81.

A box plot of the data can be generated by calculating the five relevant values: minimum, maximum, median, first quartile, and third quartile.

The minimum is the smallest number of the set. In this case, the minimum day temperature is 57°F.

The maximum is the largest number of the set. In this case, the maximum day temperature is 81°F.

The median is the "middle" number of the ordered set. This means that half of the elements lie at or below the median and half lie at or above it. With 24 values, the median is the average of the 12th and 13th values, which are both 70°F, so the median of this ordered set is 70°F.

The first quartile value is the number that marks one quarter of the ordered set. In other words, about 25% of the elements are less than the first quartile and about 75% of the elements are greater. The first quartile value can easily be determined by finding the "middle" number between the minimum and the median, that is, the median of the lower half of the ordered set. For the hourly temperatures, the "middle" number of the 12 values from 57°F to 70°F is 66°F.

The third quartile value is the number that marks three quarters of the ordered set. In other words, about 75% of the elements are less than the third quartile and about 25% of the elements are greater. The third quartile value can easily be determined by finding the "middle" number between the median and the maximum, that is, the median of the upper half of the ordered set. For the hourly temperatures, the "middle" number of the 12 values from 70°F to 81°F is 75°F.
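The five values described above can be reproduced with a short script. This is a minimal sketch that assumes the "median of each half" rule for the quartiles; other quartile conventions exist and can give slightly different results on other data. The function name `five_number_summary` is chosen here for illustration only.

```python
from statistics import median

temps = [57, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
         70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81]


def five_number_summary(values):
    """Return min, Q1, median, Q3 and max using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of values, the median itself is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }


summary = five_number_summary(temps)
print(summary)
```

For the hourly temperatures this gives a minimum of 57, a first quartile of 66, a median of 70, a third quartile of 75 and a maximum of 81, matching the values worked out by hand.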

The interquartile range, or IQR, can be calculated:

$$IQR = Q3 - Q1 = q_n(0.75) - q_n(0.25)=75^\circ F-66^\circ F=9^\circ F$$

Hence, $$1.5\,IQR=1.5\times 9^\circ F=13.5^\circ F$$.

1.5IQR above the third quartile is:

$$Q3+1.5IQR=75^\circ F+13.5^\circ F=88.5^\circ F$$.

1.5IQR below the first quartile is:

$$Q1-1.5IQR=66^\circ F-13.5^\circ F=52.5^\circ F$$.

The upper whisker of the box plot is the smaller of two numbers: the maximum or 1.5IQR above the third quartile. Here, the maximum is 81°F and 1.5IQR above the third quartile is 88.5°F. Therefore, the upper whisker is drawn at the value of the maximum, 81°F.

Similarly, the lower whisker of the box plot is the greater of two numbers: the minimum or 1.5IQR below the first quartile. Here, the minimum is 57°F and 1.5IQR below the first quartile is 52.5°F. Therefore, the lower whisker is drawn at the value of the minimum, 57°F.
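The whisker rule described above can be sketched in a few lines. This is an illustration only: the function name `whisker_limits` and the parameter `k` (the 1.5 multiplier) are assumptions made for this example, not part of any library.

```python
def whisker_limits(q1, q3, minimum, maximum, k=1.5):
    """Apply the rule from the text: each whisker stops at the data extreme
    or at k * IQR beyond the quartile, whichever is closer to the box."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    lower_whisker = max(minimum, lower_limit)
    upper_whisker = min(maximum, upper_limit)
    return iqr, lower_limit, upper_limit, lower_whisker, upper_whisker


# Values from the hourly temperature example.
result = whisker_limits(q1=66, q3=75, minimum=57, maximum=81)
print(result)
```

With the temperature data, the IQR is 9, the limits are 52.5 and 88.5, and the whiskers fall on the minimum (57) and the maximum (81).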

Example with outliers
The example above has no outliers. The following example has outliers:

The ordered set is: 23, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 90.

In this example, since only the first and the last numbers have been changed, the median, Q1 and Q3 remain the same.

In this case, the maximum is 90°F and 1.5IQR above the third quartile is 88.5°F. The maximum is greater than the third quartile plus 1.5IQR, so the maximum is an outlier. Therefore, the upper whisker is drawn at 1.5IQR above the third quartile, 88.5°F.

Similarly, the minimum is 23°F and 1.5IQR below the first quartile is 52.5°F. The minimum is smaller than the first quartile minus 1.5IQR, so the minimum is also an outlier. Therefore, the lower whisker is drawn at 1.5IQR below the first quartile, 52.5°F.
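The outlier check for this second data set can be sketched as follows. The quartiles are taken from the text (Q1 = 66, Q3 = 75), and the variable names are chosen for illustration. As an aside, some plotting conventions end each whisker at the most extreme observation that is still inside the limits rather than at the limit itself; the last two lines compute those values for comparison.

```python
temps_with_outliers = [23, 57, 57, 58, 63, 66, 66, 67, 67, 68, 69, 70,
                       70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 90]

q1, q3 = 66, 75                # quartiles, unchanged from the first example
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr   # 52.5
upper_limit = q3 + 1.5 * iqr   # 88.5

# Any value outside the two limits is treated as an outlier.
outliers = [t for t in temps_with_outliers
            if t < lower_limit or t > upper_limit]

# Most extreme observations that are still inside the limits.
inside = [t for t in temps_with_outliers
          if lower_limit <= t <= upper_limit]
lowest_inside, highest_inside = min(inside), max(inside)

print(outliers, lowest_inside, highest_inside)
```

Here the outliers are 23 and 90, and the most extreme values inside the limits are 57 and 79.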

Visualization:

The shape of the box plot can indicate the skewness of the dataset.

When the median is in the middle of the box, and the whiskers are about the same length on both sides of the box, then the distribution is symmetric.

When the median is closer to the bottom (or the left) of the box, and the whisker is shorter on the lower end of the box, then the distribution is positively skewed (skewed right).

When the median is closer to the top (or the right) of the box, and the whisker is shorter on the upper end of the box, then the distribution is negatively skewed (skewed left).
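The three rules above can be written as a rough heuristic. This is only a sketch: the function `describe_skew`, the tolerance `tol` used to decide when two lengths are "about the same", and the "inconclusive" result for mixed signals are all assumptions introduced for this example.

```python
def describe_skew(q1, med, q3, lower_whisker, upper_whisker, tol=0.1):
    """Classify the shape of a box plot using the rules from the text."""
    lower_box = med - q1               # box length below the median
    upper_box = q3 - med               # box length above the median
    lower_tail = q1 - lower_whisker    # lower whisker length
    upper_tail = upper_whisker - q3    # upper whisker length

    def compare(a, b):
        # 0 if the lengths are about equal, -1 if a is shorter, 1 if a is longer.
        scale = max(a, b)
        if scale == 0 or abs(a - b) <= tol * scale:
            return 0
        return -1 if a < b else 1

    box = compare(lower_box, upper_box)
    tail = compare(lower_tail, upper_tail)
    if box == 0 and tail == 0:
        return "symmetric"
    if box < 0 and tail < 0:
        return "positively skewed"
    if box > 0 and tail > 0:
        return "negatively skewed"
    return "inconclusive"


# Hourly temperatures from the first example.
shape = describe_skew(q1=66, med=70, q3=75, lower_whisker=57, upper_whisker=81)
print(shape)
```

For the hourly temperatures the median sits slightly nearer the bottom of the box while the lower whisker is the longer one, so the two signals disagree and this sketch reports the shape as inconclusive.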