User:Jx2022/Box plot/Shg7D1 Peer Review

General info

 * Whose work are you reviewing?

Jx2022


 * Link to draft you're reviewing

Box plot

 * Link to the current version of the article (if it exists)

Box plot

Evaluate the drafted changes
The biggest issue I see in this article is the confusion over what the whiskers represent in the general explanation of a box and whisker plot. The explanation mixes two conventions (using the absolute minimum and maximum of the data, or using the smallest and largest values within 1.5 × IQR of the quartiles), but it does not explain which convention is more common, or even make clear that it is describing two different conventions. There is also no citation supporting the claim that the first interpretation is a widely recognized convention for drawing box plots. Either a citation should be added, or (perhaps with the support of other editors on the article's Talk page) this interpretation should be removed altogether. Here are a few points to look at:


 * “Box plots received their name from the box in the middle, and from the plot that they are.” The last part is completely circular and should be clarified or removed.
 * “The lowest point is the minimum of the data set and the highest point is the maximum of the data set” is misleading because of outliers: when outliers are plotted separately, the ends of the whiskers are not the minimum and maximum of the data set.
 * Figure 3 could benefit from a more precise caption (assuming that figures 2 and 3 are kept as is).
 * If using the absolute minimum and maximum for the whiskers is a valid but rarely used convention, it may be better placed in the section on variations of the box and whisker plot.
 * The article mentions that “Some box plots include an additional character to represent the mean.” The article might benefit from an example diagram showing what this looks like.
 * There is some disconnect between the Examples and Elements subsections. The Examples subsection takes the min and max to be the smallest and largest data points within 1.5 × IQR of the quartiles, but the Elements subsection simply lists this as one type of box plot and does not say which convention is preferred. Depending on how common each convention is, either the Elements subsection should state that the IQR-based convention is the more common default, or the Examples subsection should state which convention it uses.
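To make the difference between the two whisker conventions concrete (perhaps as the basis for a worked example in the article), here is a minimal Python sketch. The function name `whisker_ends` is hypothetical, and computing quartiles with the standard library's "inclusive" method is my assumption; the draft may use a different quartile method, which can shift the fences slightly. It needs at least two data points.

```python
import statistics

def whisker_ends(data, convention="tukey"):
    """Return (low_whisker, high_whisker, outliers) for one box plot.

    convention="minmax": whiskers reach the absolute minimum and maximum.
    convention="tukey": whiskers reach the most extreme data points lying
    within 1.5 * IQR of the quartiles; anything beyond is an outlier.
    """
    xs = sorted(data)
    if convention == "minmax":
        return xs[0], xs[-1], []
    # Assumption: quartiles via the "inclusive" method (methods differ by source).
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    return inside[0], inside[-1], outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(whisker_ends(data, "minmax"))  # (1, 30, [])
print(whisker_ends(data, "tukey"))   # (1, 9, [30])
```

With the same data, the two conventions draw different whiskers (the upper one ends at 30 in one case and at 9 in the other), which is why the article should say which convention each figure and example uses.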

Additionally, several statements in the “Elements” subsection tell the reader which convention to use (e.g. “Any data not in the whiskers should be plotted using a dot…”). Without citations, these statements read as matters of opinion, and Wikipedia is meant to be a neutral source of information. They should probably be rephrased along the lines of “it is conventional to” or “almost always,” with appropriate citations supporting these claims about convention (assuming that they are true).

The other shortcomings I see call for general edits for grammar and clarity. Here are some points to consider:


 * The term “outlier” is never defined in the “Elements” section, even though the definitions of min and max both use it. It would help to give a brief definition of an outlier near the definitions of min and max.
 * The “Examples without Outliers” subsection uses the phrase “this means that there are exactly 50% of elements less than the median,” which could be stated more clearly as “this means that exactly 50% of elements are less than the median” or similar. The same wording appears again in the definition of quartiles.
 * In the section on the general equation for computing empirical quantiles, it would help to explain that $$x_{(k)}$$ denotes the k-th smallest data point, i.e. the data sorted in ascending order (so if $$i < j$$ then $$x_{(i)} \le x_{(j)}$$, with equality possible when values repeat).
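A short illustration of the $$x_{(k)}$$ notation might also help readers. This minimal Python sketch uses a hypothetical helper, `order_statistic` (not taken from the draft), to show that $$x_{(k)}$$ is simply the k-th value after sorting, and that with repeated values the ordering is only non-decreasing rather than strictly increasing.

```python
def order_statistic(data, k):
    """Return x_(k), the k-th smallest value of data (k is 1-indexed)."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    return sorted(data)[k - 1]

data = [7, 3, 9, 3, 5]
ordered = [order_statistic(data, k) for k in range(1, len(data) + 1)]
print(ordered)  # [3, 3, 5, 7, 9]

# Order statistics never decrease: i < j implies x_(i) <= x_(j).
# With ties (the two 3s) the inequality is not strict.
assert all(ordered[i] <= ordered[i + 1] for i in range(len(ordered) - 1))
```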