Talk:Analysis of variance/Archive 1

Mergers?
A variety of mergers has been proposed. Please add to the discussion at Talk:Linear regression.


 * Simplified overview of Anova http://www.statisticssolutions.com/ANOVA.htm

An automated Wikipedia link suggester has some possible wiki link suggestions for the Analysis_of_variance article:
 * Can link degrees of freedom: ...hbox{Error}} + SS_{\hbox{Treatments}}</math> The number of degrees of freedom (abbreviated df) can be partitioned in a similar way an... (link to section)

Additionally, there are some other articles which may be able to link to this one (also known as "backlinks"):
 * In Chi-squared test, can backlink analysis of variance: ...en approximately chi-squared tests; e.g., F-tests in the analysis of variance and t-tests are likelihood-ratio tests, but the test st...
 * In Conjoint analysis (in marketing), can backlink analysis of variance: ...ike SPSS or SAS. == Analysis== The computer uses monotonic analysis of variance or linear programming techniques to create utility function...

Notes: The article text has not been changed in any way; some of these suggestions may be wrong, some may be right. -- LinkBot 11:28, 1 Dec 2004 (UTC)

Frankly, I don't think this page is very clear. It would be nice to add a few words about the types of problems ANOVA is applied to. The second bullet describing classes ("Random-effects models assume that the data describe a hierarchy of different populations whose differences are constrained by the hierarchy.") is hard to understand for the non-expert. The meaning of SS is not explained (I guess it means "sum of squares"?). Is there anybody with a background in statistics who could improve upon this article? --Agd 20:28, 9 November 2005 (UTC)

Agree that the page is not clear. Needs major revision. 212.179.209.45 01:01, 16 January 2006 (UTC)

Agreed this page lacks many anova tests. I was going to use this as a reference before I got my stats book for class, and to my dismay there is not a single formula. NormDor 11:56, 17 March 2006 (UTC)

I came here after reading the biographical entry on R. A. Fisher. I agree with the foregoing comments that this page doesn't do a good job of explaining what Analysis of variance is about, or used for, that is useful for the numerate reader who is not already familiar with the topic. 198.161.198.74 (talk) 19:40, 15 January 2010 (UTC)

"Example of one-way ANOVA"?
The sections labeled "Example of [...]" are not actually examples of the ANOVA procedure, they are examples of study designs on which analysis by ANOVA might be useful. A fine distinction, but an important one I think. Should revise either the section headings or the sections' contents.
 * I updated the language, and it should reflect what you noted. Chris53516 13:53, 15 August 2006 (UTC)

Sorry, but this page is really completely useless. It's a bit like an "About" box for ANOVA: it gives you a hint about what it might be. But without any actual worked examples, showing you how to get the results you need and then how to interpret those results, nothing useful is learned. Barfly42 15:24, 17 December 2006 (UTC)

ANOVA definition
I changed the definition of ANOVA (the first paragraph). The old definition was not general enough. ANOVA can be used to compare several distributions, but that is just one example of its applications. Some work should be devoted to changing the definitions of fixed and random effects. Gideon Fell 10:33, 14 March 2007 (UTC)

In the examples section it says: "one-way ANOVA with repeated measures". Shouldn't it be "two-way ANOVA", since the experiment of using the same subjects with repeated measurements matches the description of "two-way ANOVA" from the Overview section? —Preceding unsigned comment added by 190.64.28.235 (talk) 15:21, 27 October 2009 (UTC)

Factorial ANOVA
According to "Statistics for experimenters" by Box, Hunter and Hunter the use of ANOVA makes no sense for factorial experiments (section 5.10 "Misuse of the ANOVA for 2^k factorial experiments"). This appears to be because each combination of factors only has a single degree of freedom, so the value actually calculated is equivalent to referring to a t table. This seems to be a common misconception. —The preceding unsigned comment was added by 129.215.37.22 (talk) 20:46, 14 March 2007 (UTC).


 * Please provide a precise reference, and exact statement, since this cannot be correct.
 * It is standard statistical practice, recommended by Box Hunter Hunter (following Fisher, for example), to have replicates! With replicates, some error degrees of freedom exist. (Perhaps they were referring to extremely expensive experiments, with no replications?) Kiefer.Wolfowitz (talk) 21:14, 19 January 2010 (UTC)


 * I also don't see how this could be correct. The relationship between the t-test and the F-test in groups of size 2 is clear, but, for example, relying only on a sequence of t-tests ignores the potential for interactions.  I don't have access to the reference, but this confusing (and unsubstantiated) claim has been around for more than half a year; perhaps it should be removed?  Pools.KarmaHorn (talk) 18:28, 2 March 2010 (UTC)
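The two-group relationship between the t-test and the F-test mentioned above can be checked numerically. The following is a minimal Python sketch, assuming a one-way layout and the equal-variance (pooled) two-sample t-test; the function names and data values are arbitrary and only for illustration. With two groups the treatment line of the ANOVA has a single degree of freedom and F equals t squared; this by itself does not settle the factorial question raised above.

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic: MS_between / MS_within."""
    values = [y for g in groups for y in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) estimate."""
    na, nb = len(a), len(b)
    ss = sum((y - mean(a)) ** 2 for y in a) + sum((y - mean(b)) ** 2 for y in b)
    pooled_var = ss / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

# Arbitrary illustrative data for two groups
a = [4.1, 5.0, 5.9, 6.2]
b = [6.8, 7.1, 8.0, 7.4, 6.9]
f = one_way_f([a, b])
t = pooled_t(a, b)
print(f, t ** 2)  # equal up to floating-point rounding
```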

There is a link to "Factorial ANOVA" in the 'Overview' section, which links back to the ANOVA page (this same article!). This is stupid. Either the link should be removed, or a "Factorial ANOVA" page should be added, or it should be an anchor going to a "Factorial ANOVA" section. —Preceding unsigned comment added by Mr.maddamsetti (talk • contribs) 20:10, 23 March 2010 (UTC)

ANOVA table
Could do with at least one example of an ANOVA table here, either with numbers in or showing notation for sums of squares, mean squares etc. Maybe also worth mentioning skeleton ANOVA tables, i.e. showing with entries only for df. I may add these myself at some point... Qwfp (talk) 18:18, 19 January 2008 (UTC)
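A skeleton one-way ANOVA table in notation might look as follows, assuming k treatment groups and N observations in total (k and N are symbols chosen here for illustration), with A for treatment, E for error and T for total. The degrees of freedom partition in the same way as the sums of squares: N - 1 = (k - 1) + (N - k).

$$\begin{array}{lcccc} \text{Source} & \text{df} & \text{SS} & \text{MS} & F \\ \hline \text{Treatment (A)} & k-1 & SS_{A} & MS_{A} = SS_{A}/(k-1) & MS_{A}/MS_{E} \\ \text{Error (E)} & N-k & SS_{E} & MS_{E} = SS_{E}/(N-k) & \\ \text{Total (T)} & N-1 & SS_{T} & & \end{array}$$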

ANOVA and visualization
It might be well worth the effort to add a section describing visualization techniques for ANOVA, through plots such as boxplots and others. I am willing to have a go at it, but I don't know who is responsible for this article and don't want to step on anyone's toes... (P.S.: I am currently doing my second degree in biostatistics) Talgalili (talk) 11:49, 1 December 2008 (UTC)


 * I think this could be a very good idea (if the visualization is appropriate of course) Bgeelhoed (talk) 09:57, 4 December 2008 (UTC)
 * Seconded. Finereach (talk) 11:01, 4 January 2009 (UTC)

Inconsistent notation
In the article there are four different notations, which unnecessarily confuse the reader:

$$\begin{matrix} SS_{treatment} & SS_{treat}\\ SSTR & SS_{Treatments}\\ \end{matrix} \,$$

Which notation should Wikipedia / the editors opt for?

Might I suggest: $$\begin{matrix} SS_{A} & SS_{T} & SS_{E} \end{matrix} $$

where

A is treatment (factor A)

T is total

E is error

What are your thoughts on this?

Ostracon (talk) 15:57, 16 August 2009 (UTC)


 * I don't know enough to comment on that, but the article should also link to or provide definitions for what the different sum-of-squares sums mean. Someone who doesn't have any idea of what SS_{treatment} means, say, should be able to get a definition within at most a click. --24.17.142.210 (talk) 17:55, 1 July 2010 (UTC)
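For a one-way layout with n_i observations y_{ij} in treatment group i, group means \bar{y}_{i} and grand mean \bar{y}, the sums of squares in the A/T/E notation proposed above are usually defined as follows, and they satisfy SS_T = SS_A + SS_E:

$$SS_{T} = \sum_{i}\sum_{j} (y_{ij} - \bar{y})^2, \quad SS_{A} = \sum_{i} n_{i} (\bar{y}_{i} - \bar{y})^2, \quad SS_{E} = \sum_{i}\sum_{j} (y_{ij} - \bar{y}_{i})^2$$

A minimal Python sketch that computes the three quantities for a small made-up data set and checks the partition (the function name is hypothetical):

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SS_A, SS_E, SS_T) for a one-way layout given as a list of groups."""
    values = [y for g in groups for y in g]
    grand_mean = mean(values)
    # SS_A: treatment (between-group) sum of squares
    ss_a = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SS_E: error (within-group) sum of squares
    ss_e = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # SS_T: total sum of squares about the grand mean
    ss_t = sum((y - grand_mean) ** 2 for y in values)
    return ss_a, ss_e, ss_t

groups = [[3, 5, 4], [6, 8, 7], [9, 11, 10]]
ss_a, ss_e, ss_t = sums_of_squares(groups)
print(ss_a, ss_e, ss_t)  # 54 6 60, and 54 + 6 == 60
```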

Assumptions section has errors
ANOVA assumes neither of the following:
 * independence, because the randomization distribution has a covariance-symmetric (CS) covariance matrix with a small negative correlation between different replications (see Chapter 2 Section 14 of Bailey or Chapter 6 of Hinkelmann and Kempthorne);
 * normality (for the same reason: the randomization distribution allows inference, as emphasized by Charles S. Peirce and Ronald A. Fisher).

What is true is that the p-values of the randomization test of the ANOVA null-hypothesis are well approximated by the p-values of the F test using the F-distribution (Chapter 6, Hinkelmann & Kempthorne).

(Of course, it is easier to teach the mechanics of ANOVA testing by assuming a so-called "normal" linear model and using the F-distribution.)

Therefore, the "Assumptions" section needs revision, imho. Kiefer.Wolfowitz (talk) 17:10, 24 November 2009 (UTC)


 * Seeing no objections, I wrote a short discussion of randomized experiments and anova. Kiefer.Wolfowitz (talk) 17:57, 4 January 2010 (UTC)


 * When I have more energy, I shall plan to put the "textbook normal-model" approach first and then follow with the randomization (design-based) approach second. I think that this will be easier for non-statisticians, particularly since the randomization-based analysis is explained in textbooks but apparently not in Wikipedia. Kiefer.Wolfowitz (talk) 00:49, 5 January 2010 (UTC)


 * I reordered the two "approaches" (models first, then randomization tests). It would be useful to mention permutation tests as good (See Lehmann's TSH, for a theorem recommended by Paul Rosenbaum, which I don't have in front of me) for data coming from possibly non-randomized sources. Kiefer.Wolfowitz (talk) 15:26, 17 January 2010 (UTC)
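A minimal Python sketch of the randomization test of the one-way ANOVA null hypothesis discussed above, assuming a completely randomized one-way design: the observed F statistic is compared with its distribution over random re-assignments of the pooled responses to groups of the original sizes. The function names are hypothetical, and Monte Carlo sampling of assignments is used here instead of full enumeration.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F = (SS_A / (k - 1)) / (SS_E / (N - k))."""
    values = [y for g in groups for y in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_a = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_e = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_a / (k - 1)) / (ss_e / (n - k))

def randomization_p_value(groups, n_resamples=10_000, seed=0):
    """Monte Carlo randomization test of the one-way ANOVA null hypothesis.

    Pooled responses are re-assigned at random to groups of the original
    sizes; the p-value is the share of re-assignments whose F statistic is
    at least as large as the observed one.
    """
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    observed = f_statistic(groups)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        resampled, start = [], 0
        for size in sizes:
            resampled.append(pooled[start:start + size])
            start += size
        # Small relative tolerance so ties with the observed F are not lost to rounding
        if f_statistic(resampled) >= observed * (1 - 1e-12):
            hits += 1
    # Counting the observed assignment itself keeps the p-value above zero
    return observed, (hits + 1) / (n_resamples + 1)

groups = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
f_obs, p = randomization_p_value(groups, n_resamples=2_000, seed=1)
print(f_obs, p)
```

If SciPy is available, `scipy.stats.f.sf(f_obs, k - 1, N - k)` gives the corresponding F-distribution p-value, which can be set beside the randomization p-value to see how closely the two agree for a given data set.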

Attributing work on the Anova on Ranks
There is a well known saying in mathematics and statistics: "A mathematician is only given credit for his discoveries that his colleagues agree to give him." Quoting an expository article by Seaman a decade after the discovery work of Sawilowsky on the rank transform is not only unfair, but is typical of the shallow scholarship that is becoming legendary on Wiki. So, I've decided to jump right in and set the record straight, although by now I know that scholarship and Wiki warriors rarely peacefully coexist.

I also took the opportunity to move the references misplaced in the middle of the article to the end, and put them in alpha order. —Preceding unsigned comment added by 141.217.105.21 (talk) 15:09, 4 January 2010 (UTC)


 * Two comments, on substance and style.


 * First, on substance: Rank methods were used long ago for ANOVA, e.g., by H. B. Mann, Kruskal and Wallis, and Milton Friedman. This article should not attribute such methods "first in 1981" to Conover & Iman!
 * Is it wise to cite so many articles of Sawilowsky, while neglecting Lehmann, Hodges, Hajek, etc.? IMHO, it is generally better to direct Wikipedia readers first to standard books by researchers (roughly in order of increasing difficulty):
 * Hollander, Wolfe. Nonparametric Statistical Methods. (Reliable cookbook).
 * Hettmansperger & McKean. Robust Nonparametric Statistical Methods.
 * Erich Lehmann. Nonparametrics: Statistical Methods Based on Ranks.
 * Hajek, Sidak, Sen. Theory of Rank Tests. ??
 * P.K. Sen, Madan Puri. varia.??
 * I don't find Seaman listed in the index in Hollander & Wolfe, Hajek, or Hettmansperger. Hettmansperger & McKean have nice things to say about Sawilowsky's 1989-1990 work (e.g. p. 204). Unpublished simulation studies should rarely or never be cited as "proving" things when there are mathematical proofs of asymptotic properties of tests (coupled with serious simulation studies published in good journals, which are needed for finite-sample behavior). (I have heard that Conover & Iman is a good textbook.)


 * Second comment, on style. Please refrain from making disparaging remarks about the Wikipedia project and the "scholarship" of the editors associated with this page. Please continue to suggest improvements or discuss problems on the Talk page here; please direct criticisms of Wikipedia to the appropriate fora elsewhere. Such contributions are most welcome.


 * Thank you for your consideration. Kiefer.Wolfowitz (talk) 16:28, 4 January 2010 (UTC)


 * I think what would be helpful is if you took a look at the Conover and Iman 1981 "Bridge" article. They go to great lengths to differentiate the different types of ranking procedures. Although there are some rank transformations that result in a known statistic, e.g., Wilcoxon Rank Sum, there are many others that do not result in a known statistic, and chief among them for purposes of the current discussion is the ANOVA. With regard to the rank transform, Iman (Conover's student) was the first to examine it, and this took place in the mid-1970s. Iman went on to become the President of the American Statistical Association.


 * In exactly the wiki warrior spirit I mentioned above, feel free to eliminate any or all references to Sawilowsky; after all, until now his work wasn't mentioned! However, after you read the famous "Bridge" article, you will discover that Lehmann, Hodges, Hajek, et al., although world-renowned and major contributors to nonparametrics in general, and ranking (and aligned ranks) procedures in particular, had zero contributions to the hundreds of articles on the rank transform in ANOVA in general, and in terms of interaction effects in particular.


 * I'm not sure which "unpublished" simulation studies (plural) you are referring to. The first study to show contrary results to Iman and Conover was Sawilowsky's dissertation; however, the primary results were subsequently published in (1) Communications in Statistics (1987), (2) Journal of Educational Statistics (1989), and (3) Review of Educational Research (1990). The first is a standard stat journal, JES is the premier stat journal of the American Educational Research Association, and RER is the premier synthesis journal in the social and behavioral sciences. If an encyclopedia has any interest in chronology, the dissertation (1985) could be mentioned.


 * I added G. L. Thompson's asymptotic article that begins with the inadmissibility of interactions on the RT ANOVA. She, among others, subsequently published numerous asymptotic studies confirming Sawilowsky's MC results. Thompson (JASA, 1991, p. 410) attributed the discovery of the failure of the rank transform in ANOVA to Sawilowsky's (et al.) Monte Carlo work, as did Akritas (JASA, 1991, p. 410), both calling him and his colleagues "careful data analysts". 141.217.105.21 (talk) 18:36, 5 January 2010 (UTC)


 * I thank the other editors for responding to discussion here and in particular making thoughtful changes to the article, which clarify things greatly.
 * That said, I do think that the recent edits have rendered this article too negative towards standard methods of rank transformations. For example, in 2004, the review journal Statistical Science (of the Institute of Mathematical Statistics) had a special issue on nonparametric statistics, in which many authors discussed rank-based methods. Because of the previously mentioned textbooks and such review essays, I do believe that this article should have a first (positive) paragraph about the usefulness of rank-based methods as a general heuristic, whose properties seem to work best in simple designs.
 * Then let us keep a warning that for complicated designs (e.g. factorial-treatment designs), caution should be exercised (as the other editors have documented, at least to my satisfaction); here, let us keep the (updated and improved) text currently available.
 * Would that be acceptable?
 * Finally, please refrain from labeling me as a "wiki warrior" for actions I've never committed in editing articles (or even suggested on a Talk page), e.g., removing references to Sakilowsky. (On the contrary, I referenced a favorable discussion of Sakilowsky's work!)
 * Thank you. Kiefer.Wolfowitz (talk) 23:22, 6 January 2010 (UTC)
 * Your labeling of other editors as "wiki warriors" is particularly inappropriate given your recent writing of an article defining wiki warriors: Each of your definitions directly attacks the intentions of the editor, not behavior, and is therefore against the Wikipedia policy of "assuming good faith". Kiefer.Wolfowitz (talk) 23:50, 6 January 2010 (UTC)


 * I went to the Sawilowsky page. I don't know why you have misspelled the name (I have a name that is hard to spell so I'm sensitive to it), nor do I know why you self-plagiarized your comments (en.wikipedia.org/wiki/Plagiarism) by repeating them here. I agreed with your point there (as most probably will) and agree with it here, but you really don't need to promote your point on multiple pages.


 * As to the substance, my reading is that the ranking procedure is known to work for some famous statistics and there was nothing negative to report about them. To support your suggestion, can more of the famous statistics for which it works be cited? But it bothers me that it works only for one type of t test and not the other, and even I don't call the t a complicated test! Does it work with multiple regression? So, I don't quite yet understand what is so favorable about this procedure that it should be presented as a positive general heuristic. Can you explain?


 * I think other technical experts (which I am not) should weigh in on this. 68.43.236.244 (talk) 04:50, 7 January 2010 (UTC)


 * Hi Kiefer.Wolfowitz. Sorry for this addition, and for being long-winded. I went back to the ANOVA page and now I really don't get your concern. There are three paragraphs on ranking, of which the first two are 100% positive (I would say they are too positive!). It is only the third paragraph on the subject that talks about it not working. That raises the legitimate question of why the material in the third paragraph should be hidden. I for sure would want to know, after working for an hour to get the answer, whether my stats really work or not! 68.43.236.244 (talk) 05:00, 7 January 2010 (UTC)

Factorial experiments: Rank and anova
Hettmansperger and McKean's book "Robust Nonparametric Statistical Methods" states that the "R transform" works well, even for small samples (McKean and Seavers), pages 254-258. Distinguishing the R-transform from the rank-transform is difficult for the public. Maybe the article should discuss the R-transform (following Hettmansperger & McKean, the best authority known to this amateur) first and foremost. Then the article can continue to discuss the rank transform, and mention that its use seems to have been deprecated for some time. Would that be agreeable? Sincerely, Kiefer.Wolfowitz (talk) 21:35, 19 January 2010 (UTC)


 * The aligned ranks procedure of H&M is marginally better than the Blair-Sawilowsky Fawcett-Salter procedure, only because the latter has minor Type I error inflation (e.g., a nominal alpha of .05 may in some cases lead to an actual Type I error rate of .055, which also likely accounts for why it is more powerful than the former) (Headrick, T., & Sawilowsky, S., January, 1999, The best test for interaction in factorial ANOVA and ANCOVA. Statistics Symposium on Selected Topics in Nonparametric Statistics. Gainesville, FL). Until there are algorithms available in statistical packages, H&M remains difficult to compute.
 * The main point is that doing as you propose will require a major rewrite, because aligned procedures are often complicated to conduct, and indeed have no relation to the pure rank transform, to which the named nonparametric statistics are equivalent (e.g., Spearman's rho, Wilcoxon Rank-Sum/Mann-Whitney U, etc.).
 * Furthermore, the jury is still out on aligned ranks methods (including H&M, Puri and Sen, etc.), with many layouts yet to be studied and shown to be valid. Edstat (talk) 18:12, 31 January 2010 (UTC)


 * Thanks for your helpful and informative answer, which satisfies me that the status quo is good enough. Kiefer.Wolfowitz (talk) 20:51, 31 January 2010 (UTC)

SAS recommendations
I looked in the recent SAS Linear Models and Mixed Linear Models books, and they contain no references to rank (as far as I can see). Would an editor please either provide a current reference or delete/modify the statements about SAS? Again, Statistical Science in 2004 had a lot of papers on rank-based procedures in its special issue on nonparametrics, so it doesn't seem useful to include a reference to rank-based methods in the 1980s. Kiefer.Wolfowitz (talk) 05:04, 14 January 2010 (UTC)


 * I have no problem if some editor wants to review ALL SAS documentation to see if they have rescinded their recommendation. However, hiding history is decidedly un-encyclopedic. The point in the current text is authors in prestigious statistical outlets (JASA, AS, etc.) recommended this procedure, and software companies (SAS is an "e.g.") followed suit. This caused untold destruction in the analysis of data, for those who know what a Type I error of 1 means. I can't imagine why someone would want this to happen again!


 * Having a lot of papers on rank-based procedures that don't address the specific issue at hand moots a 2004 date vs. 1980s. I raise the question: what is the desire (bolding a subtitle?) to cover up the history of the failure of this statistic? 141.217.105.193 (talk) 13:11, 15 January 2010 (UTC)


 * Well, well, well. In the SAS/STAT 9.2 user's guide, 2008, p. 291, Intro to nonparametric analysis: "Many nonparametric methods analyze the ranks of a variable rather than the original values. Procedures such as PROC NPAR1WAY calculate the ranks for you and then perform appropriate nonparametric tests. However, there are some situations in which you use a procedure such as PROC RANK to calculate ranks and then use another procedure to perform the appropriate test. See the section “Obtaining Ranks” on page 297 for details."


 * And from page 297: "The primary procedure for obtaining ranks is the RANK procedure in Base SAS software. Note that the PRINQUAL and TRANSREG procedures also provide rank transformations. With all three of these procedures, you can create an output data set and use it as input to another SAS/STAT procedure or to the IML procedure. For more information, see the chapter “The RANK Procedure” in the Base SAS Procedures Guide. Also see Chapter 70, “The PRINQUAL Procedure,” and Chapter 90, “The TRANSREG Procedure.” In addition, you can specify SCORES=RANK in the TABLES statement in the FREQ procedure. PROC FREQ then uses ranks to perform the analyses requested and generates nonparametric analyses. For more discussion of the rank transform, see Iman and Conover (1979); Conover and Iman (1981); Hora and Conover (1984); Iman, Hora, and Conover (1984); Hora and Iman (1988); and Iman (1988)."


 * So it seems to me that SAS originally made their recommendation in 1985 and 1987 based on recommendations in such fine publications as JASA and the AS, and continue to do so in 2008! Hmmm. I wonder which of SAS's quoted papers were reviewed in Mathematical Reviews and are listed on MathSciNet? 141.217.105.193 (talk) 13:30, 15 January 2010 (UTC)


 * My point was that SAS's complementary documentation for its modules on linear models --- for Linear Models and for Linear Mixed Models --- doesn't seem to mention rank-transformations. Our article is on Anova, not on nonparametrics.
 * I thank the other editor(s) for updating the reference about SAS's module on nonparametric methods, which continues to recommend the rank-transform.
 * Regarding R-transforms, which are rank-based methods (but not rank transforms), see above. Such methods were featured not only in Statistical Science 2004 but also in the nonparametric/robust article(s) in JASA's 2000 series of short surveys of statistics. Kiefer.Wolfowitz (talk) 15:22, 17 January 2010 (UTC) (UPDATED to distinguish rank-transforms from R-transforms. Kiefer.Wolfowitz (talk) 16:26, 20 January 2010 (UTC))

Effect size measures section
This section is confusing two different effect size measures when it states the following:

"The generally-accepted regression benchmark for effect size comes from (Cohen, 1992; 1988): 0.20 is a minimal solution (but significant in social science research); 0.50 is a medium effect; anything equal to or greater than 0.80 is a large effect size (Keppel & Wickens, 2004; Cohen, 1992)."

"Nevertheless, alternative rules of thumb have emerged in certain disciplines: Small = 0.01; medium = 0.06; large = 0.14 (Kittler, Menard & Phillips, 2007)."

The first paragraph refers to rule of thumb guidelines for categorizing Cohen's d. The second paragraph refers to rule of thumb guidelines for categorizing eta-squared.

68.54.107.114 (talk) 02:17, 11 January 2010 (UTC)AmateurStatistician
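Cohen's d is a standardized difference between two means, whereas eta-squared is the share of the total sum of squares attributable to the treatment factor (SS_between / SS_total), so the two sets of benchmarks quoted above are on different scales. A minimal Python sketch computing both for the same two-group data, assuming the pooled standard deviation for d; the data and function names are arbitrary and hypothetical.

```python
from statistics import mean

def eta_squared(groups):
    """eta^2 = SS_between / SS_total."""
    values = [y for g in groups for y in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((y - grand) ** 2 for y in values)
    return ss_between / ss_total

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    ss = sum((y - mean(a)) ** 2 for y in a) + sum((y - mean(b)) ** 2 for y in b)
    pooled_sd = (ss / (na + nb - 2)) ** 0.5
    return (mean(a) - mean(b)) / pooled_sd

# Arbitrary illustrative data
a = [5, 6, 7, 8]
b = [4, 5, 6, 7]
d = cohens_d(a, b)
eta2 = eta_squared([a, b])

# With two groups the measures are linked through t = d * sqrt(na * nb / N)
# and eta^2 = t^2 / (t^2 + N - 2), so the same data give quite different numbers.
n = len(a) + len(b)
t_squared = d ** 2 * len(a) * len(b) / n
print(round(d, 3), round(eta2, 3), round(t_squared / (t_squared + n - 2), 3))  # 0.775 0.167 0.167
```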

One vs Two -way vs Multivariate ANOVA
Generally there seems to be a bit of disorder in the ANOVA information. There is a One-way ANOVA page on Wikipedia, but the two-way ANOVA page redirects here without giving any reasonable comparison or differentiation between the two. Since this is one of the most widely used tests in the social sciences, the distinctions should be stated in clear and simple terms. JakubHampl (talk) 17:31, 17 April 2010 (UTC)
 * I agree. Talgalili (talk) 17:53, 17 April 2010 (UTC)

"Due to"
Perhaps something should be said about ANOVAs not always having explanatory power wrt causality (in observational studies). This is perhaps most controversial in heritability estimates, particularly in human subjects. From

However, the language that surrounds the partitioning of variance is prone to misunderstanding in its own right (Lewontin, 1974; Kempthorne, 1978), therefore I avoid using terms such as ‘due to’ or ‘caused by’ when referring to the statistical relations between an independent variable and a dependent variable (e.g., in an analysis of variance [ANOVA]), but instead use terms such as ‘associated with’ to avoid deterministic implications.

The papers cited which go into more detail on this are: Lewontin and Kempthorne. Note that "due to" is used here right in the lead. Tijfo098 (talk) 05:06, 26 October 2010 (UTC)
 * I've changed 'due' to 'attributable' in the lead as a start. Qwfp (talk) 08:00, 26 October 2010 (UTC)


 * This seems like a sound improvement. Thanks for alerting us to pay more attention to our use of "due to". Best regards, Kiefer.Wolfowitz (talk) 17:29, 26 October 2010 (UTC)


 * I'm not sure it's an improvement, since a reasonable person might think "attributable to" means "caused by". But certainly "due to" is a commonplace usage, and potentially misleading for the reason Kempthorne mentions. Michael Hardy (talk) 20:47, 26 October 2010 (UTC)

Sawilowsky & students on univariate rank transformation
Following the anonymous editor's concerns, I removed this section, but include it here for archival purposes and to facilitate its use in a stand-alone article:

Quotation: ANOVA on ranks
When the data do not meet the assumptions of normality, the suggestion has arisen to replace each original data value by its rank (from 1 for the smallest to N for the largest), then run a standard ANOVA calculation on the rank-transformed data. Conover and Iman (1981) provided a review of the four main types of rank transformations. Commercial statistical software packages (e.g., SAS, 1985, 1987, 2008) followed with recommendations to data analysts to run their data sets through a ranking procedure (e.g., PROC RANK) prior to conducting standard analyses using parametric procedures.

This rank-based procedure has been recommended as being robust to non-normal errors, resistant to outliers, and highly efficient for many distributions. It may result in a known statistic (e.g., Wilcoxon Rank-Sum / Mann-Whitney U), and indeed provide the desired robustness and increased statistical power that is sought. For example, Monte Carlo studies have shown that the rank transformation in the two independent samples t test layout can be successfully extended to the one-way independent samples ANOVA, as well as the two independent samples multivariate Hotelling's T² layouts (Nanna, 2002).

Conducting factorial ANOVA on the ranks of original scores has also been suggested (Conover & Iman, 1976, Iman, 1974, and Iman & Conover, 1976). However, Monte Carlo studies by Sawilowsky (1985a; 1989 et al.; 1990) and Blair, Sawilowsky, and Higgins (1987), and subsequent asymptotic studies (e.g. Thompson & Ammann, 1989; "there exist values for the main effects such that, under the null hypothesis of no interaction, the expected value of the rank transform test statistic goes to infinity as the sample size increases," Thompson, 1991, p. 697), found that the rank transformation is inappropriate for testing interaction effects in a 4x3 and a 2x2x2 factorial design. As the number of non-null effects (i.e., main, interaction) increases, and as the magnitude of the non-null effects increases, the Type I error rate increases, resulting in a complete failure of the statistic with as high as a 100% probability of making a false positive decision. Similarly, Blair and Higgins (1985) found that the rank transformation increasingly fails in the two dependent samples layout as the correlation between pretest and posttest scores increases. Headrick (1997) discovered that the Type I error rate problem was exacerbated in the context of Analysis of Covariance, particularly as the correlation between the covariate and the dependent variable increased. For a review of the properties of the rank transformation in designed experiments see Sawilowsky (2000).

A variant of rank-transformation is 'quantile normalization' in which a further transformation is applied to the ranks such that the resulting values have some defined distribution (often a normal distribution with a specified mean and variance). Further analyses of quantile-normalized data may then assume that distribution to compute significance values. However, two specific types of secondary transformations, the random normal scores and expected normal scores transformation, have been shown to greatly inflate Type I errors and severely reduce statistical power (Sawilowsky, 1985a, 1985b).

According to Hettmansperger and McKean "Sawilowsky (1990) provides an excellent review of nonparametric approaches to testing for interaction" in ANOVA.
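The rank-transform procedure described in the quotation (replace each value by its rank, 1 for the smallest to N for the largest, then run a standard ANOVA calculation on the ranks) can be sketched in Python as follows. Assumptions: tied values receive the average of their ranks (a common convention that the quotation does not state), only a one-way layout is shown, and all names are hypothetical. Per the quotation, this is not a safe way to test interactions in factorial designs.

```python
from statistics import mean

def average_ranks(values):
    """Ranks from 1 (smallest) to N (largest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mid_rank
        i = j + 1
    return ranks

def rank_transform(groups):
    """Replace each observation by its rank in the pooled sample, keeping groups."""
    ranks = average_ranks([y for g in groups for y in g])
    ranked, start = [], 0
    for g in groups:
        ranked.append(ranks[start:start + len(g)])
        start += len(g)
    return ranked

def one_way_f(groups):
    """Standard one-way ANOVA F statistic, applied here to the ranks."""
    values = [y for g in groups for y in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_a = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_e = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_a / (k - 1)) / (ss_e / (n - k))

groups = [[1.2, 3.4, 2.2], [5.0, 4.4, 9.9], [2.2, 6.1, 7.3]]
ranked = rank_transform(groups)
print(ranked)  # [[1.0, 4.0, 2.5], [6.0, 5.0, 9.0], [2.5, 7.0, 8.0]]
print(one_way_f(ranked))
```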

Supporting references
I believe that most of these books and articles are related to Sawilowsky's publications or unpublished writings, and were added in excellent faith by Edstat. I say this in good faith (having just removed many references that I myself had zealously added when I was evangelizing for generalized randomized block designs!). I'll come back and look for references to them in other sections. Again, they would be very useful in an article about academics closely associated with Sawilowsky (not necessarily on Wikipedia) or in a stand-alone article on rank transforms, if that is a notable topic (e.g., is it covered in statistical encyclopedias or recent surveys in notable reliable journals?). Thanks Kiefer.Wolfowitz (talk) 19:43, 3 November 2010 (UTC)


 * Ferguson, George A., Takane, Yoshio. (2005). "Statistical Analysis in Psychology and Education", Sixth Edition. Montréal, Quebec: McGraw–Hill Ryerson Limited.
 * Headrick, T. C. (1997). Type I error and power of the rank transform analysis of covariance (ANCOVA) in a 3 x 4 factorial layout. Unpublished doctoral dissertation, University of South Florida.
 * Helsel, D. R., & Hirsch, R. M. (2002). Statistical Methods in Water Resources: Techniques of Water Resources Investigations, Book 4, chapter A3. U.S. Geological Survey. 522 pages.
 * Iman, R. L., & Conover, W. J. (1976). A comparison of several rank tests for the two-way layout (SAND76-0631). Albuquerque, NM: Sandia Laboratories.
 * King, Bruce M., Minium, Edward W. (2003). Statistical Reasoning in Psychology and Education, Fourth Edition. Hoboken, New Jersey: John Wiley & Sons, Inc. ISBN 0-471-21187-7
 * Keppel, G. & Wickens, T.D. (2004). Design and analysis: A researcher's handbook (4th ed.). Upper Saddle River, NJ: Pearson Prentice–Hall.
 * SAS Institute. (1985). SAS/stat guide for personal computers (5th ed.). Cary, NC: Author.
 * SAS Institute. (1987). SAS/stat guide for personal computers (6th ed.). Cary, NC: Author.
 * SAS Institute. (2008). SAS/STAT 9.2 User's guide: Introduction to Nonparametric Analysis. Cary, NC. Author.
 * Sawilowsky, S. (1985a). Robust and power analysis of the 2x2x2 ANOVA, rank transformation, random normal scores, and expected normal scores transformation tests. Unpublished doctoral dissertation, University of South Florida.
 * Strang, K.D. (2009). Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Practical Assessment, Research & Evaluation, 14(3), 1–13. Retrieved 1 June 2009 from:
 * Keppel, G. & Wickens, T.D. (2004). Design and analysis: A researcher's handbook (4th ed.). Upper Saddle River, NJ: Pearson Prentice–Hall.
 * SAS Institute. (1985). SAS/stat guide for personal computers (5th ed.). Cary, NC: Author.
 * SAS Institute. (1987). SAS/stat guide for personal computers (6th ed.). Cary, NC: Author.
 * SAS Institute. (2008). SAS/STAT 9.2 User's guide: Introduction to Nonparametric Analysis. Cary, NC. Author.
 * Sawilowsky, S. (1985a). Robust and power analysis of the 2x2x2 ANOVA, rank transformation, random normal scores, and expected normal scores transformation tests. Unpublished doctoral dissertation, University of South Florida.
 * Strang, K.D. (2009). Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Practical Assessment, Research & Evaluation, 14(3), 1–13. Retrieved 1 June 2009 from:
 * SAS Institute. (2008). SAS/STAT 9.2 User's guide: Introduction to Nonparametric Analysis. Cary, NC. Author.
 * Sawilowsky, S. (1985a). Robust and power analysis of the 2x2x2 ANOVA, rank transformation, random normal scores, and expected normal scores transformation tests. Unpublished doctoral dissertation, University of South Florida.
 * Strang, K.D. (2009). Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Practical Assessment, Research & Evaluation, 14(3), 1–13. Retrieved 1 June 2009 from:
 * Strang, K.D. (2009). Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Practical Assessment, Research & Evaluation, 14(3), 1–13. Retrieved 1 June 2009 from:
 * Strang, K.D. (2009). Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Practical Assessment, Research & Evaluation, 14(3), 1–13. Retrieved 1 June 2009 from:
 * Strang, K.D. (2009). Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Practical Assessment, Research & Evaluation, 14(3), 1–13. Retrieved 1 June 2009 from:
 * Strang, K.D. (2009). Using recursive regression to explore nonlinear relationships and interactions: A tutorial applied to a multicultural education study. Practical Assessment, Research & Evaluation, 14(3), 1–13. Retrieved 1 June 2009 from:

I note that Hettmansperger and McKean is notable and reliable, given that the authors were asked to be head editors of, e.g., the Statistical Science special issue on nonparametrics, and to write the JASA 2000 article reviewing nonparametrics and robust statistics. (I am happy that, as first noted in the article on Sawilowsky, H & McK have nice comments in a few pages about Professor Sawilowsky.) I don't see why the other articles should stay in the article on ANOVA, unless they are cited by reliable books on ANOVA. Thanks, Kiefer.Wolfowitz (talk) 19:47, 3 November 2010 (UTC)

Discussion (of Univariate Rank transformation)
Let the discussion begin! Kiefer.Wolfowitz (talk) 19:32, 3 November 2010 (UTC)

Examples
This example has no randomized assignment of treatment to subjects. It seems that group-status is perfectly confounded with treatment, so this is a worthless "experiment". Kiefer.Wolfowitz (talk) 20:48, 3 November 2010 (UTC)

Example removed
In a first experiment, Group A is given vodka, Group B is given gin, and Group C is given a placebo. All groups are then tested with a memory task. A one-way ANOVA can be used to assess the effect of the various treatments (that is, the vodka, gin, and placebo).

In a second experiment, Group A is given vodka and tested on a memory task. The same group is allowed a rest period of five days and then the experiment is repeated with gin. The procedure is repeated using a placebo. A one-way ANOVA with repeated measures can be used to assess the effect of the vodka versus the impact of the placebo.

In a third experiment testing the effects of expectations, subjects are randomly assigned to four groups:


 * 1) expect vodka—receive vodka
 * 2) expect vodka—receive placebo
 * 3) expect placebo—receive vodka
 * 4) expect placebo—receive placebo (the last group is used as the control group)

Each group is then tested on a memory task. The advantage of this design is that multiple variables can be tested at the same time instead of running two different experiments. Also, the experiment can determine whether the effect of one variable depends on the level of the other (known as an interaction effect). A factorial ANOVA (2×2) can be used to assess the effects of expecting vodka or the placebo and of actually receiving either.
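For readers who asked for computation, the first (one-way) experiment can be sketched as follows. This is a minimal illustration under stated assumptions: the memory scores are invented, and `one_way_anova` is a hypothetical helper that partitions the total sum of squares into between-group and within-group parts and forms the F ratio.

```python
from statistics import fmean

def one_way_anova(groups):
    """Partition the total sum of squares for a one-way layout.

    Returns (ss_between, ss_within, df_between, df_within, f_ratio).
    """
    values = [y for g in groups for y in g]
    grand_mean = fmean(values)
    # Between-group SS: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared deviations of observations from their own group mean.
    ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_ratio

# Hypothetical memory-task scores (invented for illustration).
vodka = [12, 14, 11, 13]
gin = [13, 12, 14, 15]
placebo = [17, 16, 18, 17]

ss_b, ss_w, df_b, df_w, f = one_way_anova([vodka, gin, placebo])
print(f"SS_between={ss_b:.3f} (df={df_b}), SS_within={ss_w:.3f} (df={df_w}), F={f:.2f}")
```

For these invented numbers it prints SS_between=44.667 (df=2), SS_within=12.000 (df=9), F=16.75. A p-value would come from the F distribution with (2, 9) degrees of freedom, for example via `scipy.stats.f.sf(f, df_b, df_w)`, which this stdlib-only sketch does not use.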

Euclidean geometry
In a balanced design, the factors induce an orthogonal decomposition of a Euclidean space, and the converse holds (see Bailey). First project the data onto the mean-value subspace; then consider that subspace's orthogonal complement, which is intersected with the treatment and block subspaces (which may have further decompositions). The squared Euclidean norm of the data's projection onto each subspace is that subspace's sum of squares, and the residual sum of squares is the squared norm of the projection onto what remains. The degrees of freedom are the dimensions of the subspaces.

With this orthogonality (orthomodularity), the sums of squares add by the Pythagorean theorem, whether or not the residuals are normally distributed.

This geometric account of Anova is given in friendlier fashion in Bailey, in Christensen, and in the very friendly Saville & Woods (in 2 volumes) for example. It should be given here. Kiefer.Wolfowitz (talk) 21:11, 3 November 2010 (UTC)
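To make the geometric account concrete, here is a sketch for a balanced 2×2 layout with three replicates per cell, using invented numbers. The hypothetical helper `decompose_2x2` gives each observation its component in the A, B, A×B, and residual subspaces (computed from marginal and cell means). In a balanced design these component vectors are mutually orthogonal, so their squared norms, the sums of squares, add up to the squared norm of the centered data; with unequal cell sizes the marginal-mean components are generally not orthogonal, which is why balance matters.

```python
from itertools import combinations
from statistics import fmean

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decompose_2x2(cells):
    """Split centered responses of a balanced two-factor layout into A, B, AB, residual parts.

    cells maps (level_a, level_b) -> list of replicate responses.
    """
    if len({len(v) for v in cells.values()}) != 1:
        raise ValueError("this sketch assumes a balanced design (equal replicates per cell)")
    grand = fmean(y for v in cells.values() for y in v)
    levels_a = {a for a, _ in cells}
    levels_b = {b for _, b in cells}
    mean_a = {a: fmean(y for (ai, _), v in cells.items() if ai == a for y in v) for a in levels_a}
    mean_b = {b: fmean(y for (_, bi), v in cells.items() if bi == b for y in v) for b in levels_b}
    centered = []
    parts = {"A": [], "B": [], "AB": [], "residual": []}
    for (a, b), values in cells.items():
        cell_mean = fmean(values)
        for y in values:
            centered.append(y - grand)
            parts["A"].append(mean_a[a] - grand)
            parts["B"].append(mean_b[b] - grand)
            parts["AB"].append(cell_mean - mean_a[a] - mean_b[b] + grand)
            parts["residual"].append(y - cell_mean)
    return centered, parts

# Invented responses: 3 replicates in each of the 4 cells.
cells = {
    ("a1", "b1"): [3, 5, 4],
    ("a1", "b2"): [6, 8, 7],
    ("a2", "b1"): [5, 4, 6],
    ("a2", "b2"): [10, 12, 11],
}
centered, parts = decompose_2x2(cells)
ss = {name: dot(v, v) for name, v in parts.items()}
cross = {(n1, n2): dot(v1, v2) for (n1, v1), (n2, v2) in combinations(parts.items(), 2)}
print("sums of squares:", ss)
print("total (centered):", dot(centered, centered))
```

The degrees of freedom match the dimensions of these subspaces: 1 each for A, B, and A×B, and 4 × (3 − 1) = 8 for the residual.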

Careful review suggested
This article's problems range from obtuse pedagogy (it's essentially useless) to downright inaccurate information about ANOVA, its assumptions, and its small-sample robustness and power properties. (The ANOVA F test of difference in means is robust to departures from independence, homoscedasticity, and/or normality? Tell that to the hundreds of Monte Carlo studies published since 1980!) A thorough reading of the Monte Carlo literature after 1980 would benefit this article greatly. My suggestion is that the current editors step back and ask for some help, preferably not from the asymptotic maths lobby, but from qualified applied statisticians who have read the post-1980 literature (but for starters, read Glass, Peckham, & Sanders, 1972; Bradley, 1969, 1972, etc.; Blair, 1980, 1981, 1985, etc.; Sawilowsky, 1990, 1992, etc.). It's just a suggestion - don't reach for the aspirin or saltines. Edstat (talk) 03:49, 15 November 2010 (UTC)
 * WP:Be bold: If you see something that can be improved, improve it! Qwfp (talk) 07:20, 15 November 2010 (UTC)
 * Thanks, but "been there, done that, got wikified" by folks who haven't read the literature or choose to ignore it.Edstat (talk) 23:02, 15 November 2010 (UTC)
 * I rewrote the relevant sentences for greater specificity. I removed the strong claim that referenced Lindmann, because the leading researchers I cite are more guarded in their endorsements of the F-test for Anova's null hypothesis. Kiefer.Wolfowitz (talk) 18:45, 15 November 2010 (UTC)
 * The robustness that is referenced is that associated with comparing the p-values from F-tests with the p-values from randomization test of the null-hypothesis (when there has been randomized assignment, or with the permutation test when there need not be random assignment but power is desired against all alternative distributions, following Lehmann & Rosenbaum). This is the benchmark discussed in Hinkelmann & Kempthorne, the reference cited.
 * The article doesn't deny that associates of Sawilowsky have found alternatives for which their simulation studies show problems. If such studies were considered important enough to be highlighted in the leading textbooks or the most reliable surveys on ANOVA, then please write a paragraph of appropriate length on them. But please consider this question: Aren't the books cited a reasonable selection of the best books on ANOVA, by many standards? Sincerely, Kiefer.Wolfowitz (talk) 15:31, 15 November 2010 (UTC)
 * Thanks for starting to relook at some of the issues. The "standard" you are referring to is inappropriate. Under non-normality, the Anova test's robustness is poorer than the permutation test, and in fact one way to fix its Type I error problems is to turn it into a permutation test! As for power, the comparison is not reasonable, because the power spectrum of the ANOVA follows the power spectrum of the permutation test, whereas nonparametric alternatives are greatly higher! Moreover, under heteroscedasticity, what good is it to compare the Anova to the permutation test, when the latter is non-robust to that violation? (See, e.g., Boik). Lehmann retired before Monte Carlo studies could be conducted on a PC, Hinkelmann writes well but is not a world class researcher, and Kempthorne was a pioneer who lived prior to most of the work on the ANOVA conducted by Monte Carlo studies. There are plenty of good textbooks that discuss this, you can start with Wilcox.Edstat (talk) 23:02, 15 November 2010 (UTC)
 * EdStat, I have tried to introduce the randomization-perspective in many articles on WP, following ASA & RSS guidelines for a first course in statistics, that the distinction between (randomized) experiments and observational studies is important: Neither the ASA nor the RSS specify why this distinction matters. The answer of Peirce, early Fisher, Kempthorne, and Basu is that the randomization design allows a test of the null-hypothesis using an objective known probability distribution: These arguments leave open the choice of a statistic. Kempthorne noted that the randomization test using the F-statistic gave similar results as the F-test. All else is commentary. ;) Kiefer.Wolfowitz (talk) 14:16, 16 November 2010 (UTC)
 * Please re-read what I wrote. Permutation tests have maximum (against all alternatives) power (see Lehmann for conditions). Permutation tests need not have maximum power against some alternative(s); apparently, you refer to some simulation studies of some alternative. Kiefer.Wolfowitz (talk) 14:18, 16 November 2010 (UTC)
 * I continue to point out how poor this entry is: just ask ANYONE who wants to learn about the method what they get from reading this entry. Statements such as "Some popular designs have the following anovas:" are just downright silly, as are a dozen other statements. Edstat (talk) 20:48, 18 November 2010 (UTC)
 * Editor Qwfp invited you to contribute improvements. At the very least, please list the mistakes. Thanks, Kiefer.Wolfowitz (talk) 21:55, 18 November 2010 (UTC)
 * Qwfp, here is the latest example of what I meant. After Kiefer.Wolfowitz's stalking my other edits to delete them on other pages he has now twice deleted an important condition in ANOVA (namely, factorial ANOVA is less robust to testing ordinal interactions than disordinal interactions in the absence of population normality) - a very real, practical concern to anyone actually wanting to use ANOVA. First, I assumed good faith that he just didn't like the section it was in, so I moved it to what may be a better place. But, he then deleted it a 2nd time, this time with the caustic remark in the Edit summary: "off-topic promotion of Sawilowski again." On his talk page he explains his reason - why, after all he is a statistician! Thus, he has no problem with deleting a reference to this: "Underlying assumptions of factorial ANOVA...I add a fifth consideration that is nearly universally overlooked. It is most important to stress that testing for ordinal interactions (Figure 14.8) in factorial ANOVA can be more severely debilitating than test for disordinal interactions (Figure 14.7) when underlying assumptions are violated" Sawilowsky, (2007, Real Data Analysis: A volume in quantitative methods in education and the behavioral sciences: Issues, Research, and Teaching, American Educational Research Association, Educational Statisticians, IAP:Charlotte, NC, ISBN 978-1059311-564-7.) Kiefer.Wolfowitz' opinion obviously trumps citations from that book! That is what I meant by "been there, done that, got wikified." So no, I won't be making any more edits to this page as long as bullying on this page persists, and stalking my other edits on other pages persists, along with the litany of false personal attacks Kiefer.Wolfowitz makes whenever ANY editor crosses him! Edstat (talk) 13:44, 19 November 2010 (UTC)
 * Edstat, your personal attacks do not improve your argumentation and the sympathy with which outside editors typically view complaints. (Please observe that editor Melcombe is a useful counterexample to your claim that I mistreat editors: He just deleted the Mathematical Reviews link for Pfanzagl's book, with the "useless" characterization. Please see whether he and I have ever had an edit war, even though we usually come from different perspectives, and he can be frank at times. You know that I can be frank and sometimes irritable, I assume!)
 * Edstat, please consider whether, in an article on ANOVA (not on factorial experiments), it is prudent to promote another finding by Sawilowsky that does not appear in the most reliable references, even before you provide a link to the article on factorial experiments or explain what they are (apart from saying that they can be arbitrarily complex, which holds only if the number of experimental units is infinite). I repeat that I removed text and references that I had earlier added, all for the sake of making the article more readable (following a reader's complaint, above). Sincerely, Kiefer.Wolfowitz (talk) 14:14, 19 November 2010 (UTC)
 * Yes, I admit when I am wrong. You have not made personal attacks against ANY editor you disagree with; that was an exaggeration, and I apologize. I must amend it to say that you make personal attacks on, and stalk, several editors, of which I am one. Furthermore, it is obnoxious for you to decide you are the arbiter of what constitutes a "reliable reference". To call a publication by the American Educational Research Association, an academic organization of perhaps 90,000 Ph.D.s, not a "reliable reference" is obnoxious at the least, and a violation of Wikipedia rules for certain, and you know it. However, my past experience with you is that further discussion is fruitless. Let this page continue to be a laughingstock of those who actually use ANOVA in their professional careers. Goodbye. Edstat (talk) 14:22, 19 November 2010 (UTC)
 * I have made no statements about the AERA. Kiefer.Wolfowitz (talk) 14:32, 19 November 2010 (UTC)
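The randomization benchmark discussed in this thread (comparing the F-test with a randomization test that uses the F statistic) can be sketched as a Monte Carlo permutation test for a completely randomized one-way design. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the data are invented, treatment labels are re-randomized by shuffling the pooled responses into groups of the original sizes, and the p-value uses the common (hits + 1)/(resamples + 1) convention. Enumerating every relabeling would give the exact randomization distribution for small designs; random sampling is used here only to keep the sketch short.

```python
import math
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F ratio; returns inf when every group is constant."""
    values = [y for g in groups for y in g]
    grand = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    if ss_within == 0:
        return math.inf
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_f_test(groups, n_resamples=5000, seed=0):
    """Monte Carlo randomization test: reshuffle units among groups of the original sizes."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    observed = f_statistic(groups)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        regrouped, start = [], 0
        for size in sizes:
            regrouped.append(pooled[start:start + size])
            start += size
        # Small tolerance so float round-off does not exclude relabelings tied with the observed F.
        if f_statistic(regrouped) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)

# Invented data: well-separated groups, then groups with identical means.
f_sep, p_sep = permutation_f_test([[12, 14, 11, 13], [13, 12, 14, 15], [17, 16, 18, 17]])
f_null, p_null = permutation_f_test([[1, 2, 3], [2, 3, 1], [3, 1, 2]])
print(f"separated: F={f_sep:.2f}, p~{p_sep:.4f}")
print(f"equal means: F={f_null:.2f}, p~{p_null:.4f}")
```

Because the total sum of squares is unchanged by relabeling, F is a monotone function of the between-group sum of squares here, so either statistic gives the same permutation p-value.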

Heteroscedasticity: Variable variance
Editor Edstat raised concerns about non-normality and about heteroscedasticity (alternatively, differing variances, or a failure of homoscedasticity!), etc. In the section on the randomization analysis, references to Cox and to Kempthorne are given to support the statement that a proper randomization procedure and unit-treatment additivity imply constant variance. This result explains why both Cox & Kempthorne (and Rosenbaum, Rubin, Imbens, Abadie, Angrist, etc.) emphasize proper randomization and the unit-treatment additivity assumption. When unit-treatment additivity is implausible, the analysis is more difficult (although local average unit-treatment additivity saves much of the standard analysis). While the article's few paragraphs are not a substitute for a textbook, they at least sketch the central issues and reference the most reliable sources. Edstat's claim that normality is so important is not supported by the analysis of these authors, who are usually regarded as the most reliable sources. Kiefer.Wolfowitz (talk) 17:25, 19 November 2010 (UTC)
 * A reaction to 'Consequence of failure to meet assumption the fixed effects analysis of variance and covariance', Blair, R. C. (1981), Review of Educational Research, 51(4), 499-507. doi: 10.3102/00346543051004499. I hesitate mentioning this reference, which is one among many, given that after all, you are a statistician. Edstat (talk) 17:38, 19 November 2010 (UTC)
 * Again, the best authors base their analysis on the randomization distribution, or at least like their degrees of freedom determined by the randomization distribution, which is determined by the assignment mechanism.
 * If the study was not a randomized experiment but only an observational study, and you want to focus on specific alternatives (rather than the general class specified by Lehmann & Rosenbaum), then of course violations of "normality" matter, as you say. But such studies are so bad in general that they receive little emphasis in the anova literature in statistics.
 * Please see references in the article on statistics education for work by statisticians to help education, which is notorious for the lack of controlled experiments for evaluating teaching (Thomas Cook): Thomas D. Cook, Randomized Experiments in Educational Policy Research: A Critical Examination of the Reasons the Educational Evaluation Community has Offered for Not Doing Them, Educational Evaluation and Policy Analysis 24 (2002), no. 3, 175-199. Kiefer.Wolfowitz (talk) 23:45, 19 November 2010 (UTC)
 * (1) I take it you never bothered to look up the reference I gave. You assumed from the journal title that it was an observational study. Neither the motivating study by Glass, Peckham, and Sanders (1972) in RER, nor Blair's RER study, is an observational study. They are both Monte Carlo studies based on mathematical distributions. I would suggest you also examine Sawilowsky's Monte Carlo work on real data sets (or, if you prefer, Harrel's, Serlin's, Zumbo's, Zimmerman's, Kromrey's, Ferron's, H. Keselman's, Keselman's, R. Ramsey's, Ramsey's, Wilcox's, Huberty's, G. Thompson's, Higgins', J. Bradley's, Beretvas', Dayton's, de Leeuw's, Feng's, Hambleton's, Huck's, Kirk's, J. Levin's, Lix's, Lomax's, Micceri's, Onghena's, F. Schmidt's, Singer's, S. Weinberg's, S. Wise's, Maxwell's, Toothaker's, Grissom's, Peng's, Becker's, Appelbaum's, Beasley's, and 100's of others' Monte Carlo work, if you think citing peer-reviewed literature is "promoting" Sawilowsky), except it's obvious you are oblivious to the entire genre of literature, and don't bother to look up citations even when you ask for them. (2) "best authors"? - your bias is showing again! Tell me, "statistician", which wikipage has so-designated the "best authors"? (3) From your comment on the ANOVA on ranks page, it is obvious that you really have no idea what randomization of subjects is all about. (4) "randomization distribution", when the topic is layout? Are you reading what you are writing? (5) You quote "Cook", which was outdated before he even tried to update Campbell and Stanley (1963)? My suggestion, "statistician" - why not actually go to a library and read up a bit on the discipline? Wikipedia deserves you, and you deserve Wikipedia! I've already stated I won't edit any page you are working on. I'll also no longer respond to any of your "discussion" page diatribes either. So you, and your editing cabal, will get the last word! Edstat (talk) 03:43, 28 November 2010 (UTC)


 * As I wrote, the permutation test has optimality properties against all alternatives. You are discussing specific alternatives, against which other tests can have better power. Kiefer.Wolfowitz (talk) 11:51, 28 November 2010 (UTC)

Extremely unclear
After reading this article, I am still left with absolutely no idea how this technique is actually employed. There are many references to "treatments" -- is it used exclusively in medical research? A fully-worked example (including computation) would be a great boon. 121a0012 (talk) 05:59, 5 January 2011 (UTC)

Effect size section self-contradictory
Please see the following section (copied below):

"Though, considering that η² are comparable to r² when df of the numerator equals 1 (both measures proportion of variance accounted for), these guidelines may overestimate the size of the effect. If going by the r guidelines (0.1 is a small effect, 0.3 a medium effect and 0.5 a large effect) then the equivalent guidelines for eta-squared would be the squareroot of these, i.e. 0.01 is a small effect, 0.09 a medium effect and 0.25 a large effect, and these should also be applicable to eta-squared. When the df of the numerator exceeds 1, eta-squared is comparable to R-squared (Levine & Hullett, 2002)."

Note that it is self-contradictory. First it says "η² are comparable to r² when df of the numerator equals 1" and later says "When the df of the numerator exceeds 1, eta-squared is comparable to R-squared". Any suggestions on which is correct?

I also suggest that this section is removed until consensus is reached.

Trevorzink (talk) 02:28, 6 April 2011 (UTC)
 * I removed it. That section seems to be derived from an ANOVA-for-psychologists approach. Kiefer .Wolfowitz 01:59, 14 March 2012 (UTC)
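A quick numerical check suggests the two quoted statements are consistent rather than contradictory. η² is SS_between/SS_total; with two groups (numerator df = 1) it equals the squared correlation r² between the response and a 0/1 group indicator, and with k groups a least-squares fit on k − 1 indicators reproduces the group means, so η² equals that regression's R². The sketch below checks only the two-group case, with invented scores and hypothetical helper names. It also shows that the r benchmarks 0.1, 0.3, 0.5 correspond to r² values of 0.01, 0.09, 0.25, i.e. the squares (not square roots) of the r values.

```python
from statistics import fmean

def eta_squared(groups):
    """Between-group sum of squares divided by total sum of squares."""
    values = [y for g in groups for y in g]
    grand = fmean(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Invented scores for two groups (numerator df = 1).
group_0 = [4.0, 5.0, 6.0, 5.5]
group_1 = [6.5, 7.0, 8.0, 6.0]
scores = group_0 + group_1
indicator = [0] * len(group_0) + [1] * len(group_1)

eta2 = eta_squared([group_0, group_1])
r2 = pearson_r(indicator, scores) ** 2
# The r benchmarks 0.1, 0.3, 0.5 correspond to these r-squared values.
r2_benchmarks = [round(r ** 2, 2) for r in (0.1, 0.3, 0.5)]
print(f"eta^2 = {eta2:.4f}, r^2 = {r2:.4f}, r^2 benchmarks = {r2_benchmarks}")
```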

Cross purposes.
On this day Kiefer.Wolfowitz and I worked in mild opposition. He was removing references that I was strengthening. I will wait a few days to let the dust settle. 159.83.196.1 (talk) 01:51, 14 March 2012 (UTC)
 * I am done now. I have removed the German sources before, which may or may not be fine for German Wikipedia but which seemed useless and redundant for this article.  Kiefer .Wolfowitz 01:56, 14 March 2012 (UTC)

Speculation about References

 * Kempthorne and Cox? I cannot find any book/paper that they coauthored.
 * Bortz, J. (1999). Not sure of specific edition/year/ISBN.  Used latest.
 * Bühner, Markus & Ziegler, Matthias (2009). Not sure of specific edition/year/ISBN.  Used latest.
 * Gosset (AKA "Student") wrote 2 papers of note in 1908:
 * "The probable error of a mean". Biometrika 6 (1): 1–25. March 1908. doi:10.1093/biomet/6.1.1.
 * "Probable error of a correlation coefficient". Biometrika 6 (2/3): 302–310. September 1908. doi:10.1093/biomet/6.2-3.302.


 * Cox, David R. (1958). Updated by Cox, David R. & Reid, Nancy M. (2000)?
 * Kempthorne, Oscar (1979). Updated by Hinkelmann, Klaus & Kempthorne, Oscar (2008)?

Under-cited: Cox - mentioned often in text, but no specific references cited. Two references in list. Freedman - no specific reference cited. Two references in list. Kempthorne - often mentioned without citation. Two references in list. 159.83.196.1 (talk) 01:54, 14 March 2012 (UTC)


 * Cox's book and Kempthorne's book(s) are mentioned in the references. These books are conceptually rich. Kiefer .Wolfowitz 01:58, 14 March 2012 (UTC)


 * So classic that they are available as reprints only. I suggest citing them rarely.159.83.196.1 (talk) 01:16, 16 March 2012 (UTC)
 * H&K recently acquired a third volume. I suspect that it will be the classic reference for a few decades.  It is amazing that Cox and Kempthorne managed to be classics for two generations.  I assume that the field learned some important information from 50 years of experience and computerization
 * 159.83.196.1 (talk) 00:24, 17 March 2012 (UTC)
 * Kempthorne & Cox are cited precisely for their discussions of randomization-based analysis of randomized experiments, because their discussions continue to lead conceptual discussions of randomization (e.g. in Rubin or Rosenbaum). There are further details in the "Further reading" (Kageyama and Calinski, Anscombe, etc.).
 * Nobody denies that algebra-powered model-based analysis has advanced: Modern discussions begin with the orthomodular lattice of the subspaces of an inner-product space (in finite dimensions and with some new wrinkles in Hilbert spaces). See Bailey for an elementary introduction, for example.  Kiefer .Wolfowitz 10:38, 17 March 2012 (UTC)
 * Thanks for the information about the "3rd volume", which seems to have neither Hinkelmann nor Kempthorne. I looked at the chapter on "defense applications", which was not of HK quality, though. I suppose that Stufken, Street, Chen, etc. do far better. Kiefer .Wolfowitz 21:53, 17 March 2012 (UTC)

Cox (1958) is an odd and interesting reference for ANOVA. Analysis of variance does not appear in the index. Page 12: "...methods of statistical analysis will not be described in this book." He talks some about models, specifically additivity (on page 15) which is applicable to this article. Is Cox (cited repeatedly) one of the best references on a subject admittedly not discussed half a century ago? I suspect a flawed citation. Cox makes frequent mention (30 references) to Cochran & Cox (1957, 2e) Experimental Designs. Page 292: "Cochran and Cox (1957) have given numerous detailed plans as well as worked numerical examples of the analysis of biological experiments." Gertrude Mary Cox might be a better reference than Sir David Roxbee Cox for ANOVA. Both books are statistical classics, available as reprints. Both authors are mentioned favorably in Design of experiments.159.83.196.1 (talk) 23:22, 1 May 2012 (UTC)


 * Oops, factual error. D. R. Cox does specifically reference ANOVA in the index.  I paid inadequate attention to the indentation style.159.83.196.1 (talk) 19:07, 25 May 2012 (UTC)


 * Where Cox(1958) is properly cited (in only 2 places), the article specifically points to Chapter 2. There are multiple other mentions of "Cox" in the article for which no proper citations are given ... so these might refer to multiple different choices of authors whose names are Cox with multiple possible books or papers for each. Similarly for "Kempthorne". This confusion illustrates the importance of providing proper citations for each important point being made. Melcombe (talk) 00:17, 2 May 2012 (UTC)

Status - April 2012
On April 4, 2012 readers of ANOVA assigned the following scores: Trustworthy 3.3 of 5;   Objective    3.3 of 5;   Complete     3.0 of 5; Well-written 2.3 of 5

Mathematicians assigned a "start" grade (less than a C). Statisticians assigned a grade of C. Both groups claimed ANOVA to be important. The currency of these grades is unknown.

The article was recently locked (no edits) for a few days over vandalism concerns.159.83.196.1 (talk) 21:04, 5 April 2012 (UTC)


 * You might note that until relatively recently it was not possible to assign a C grade in the maths template without creating problems for automatic processes based on such templates. But more seriously, given the lack of properly formulated citations for what is being said and the generally confused state of the article, it is scarcely worth more than a "stub", despite its length. Melcombe (talk) 00:32, 2 May 2012 (UTC)

Regression
(The following, or similar, needs to be added. It's only hinted at, in the article, and all the links substantiate it. Cites, of course, can be added. 72.37.249.60 (talk) 19:06, 18 April 2012 (UTC) )

Consistent with ANOVA's purpose of analyzing a variable's components attributable to different sources of variation, it is possible to view almost any general linear model as a regression:


 * $$ y = b_0 + b_1 x_{1} + b_2 x_{2} + \ldots + b_p x_{p} + \epsilon, \epsilon \thicksim N(0, \sigma^2_{error})$$

where the $x_i$, $i = 1, 2, \ldots, p$, are quantitative variables, in some cases merely 0 or 1 representing the absence or presence, respectively, of different levels of qualitative variables; a qualitative variable with several degrees of freedom is entered as several such indicators, each a separate variable with df 1. From this model, using the standard formulas for expectation and variance and assuming the $x_i$ are mutually uncorrelated and uncorrelated with $\epsilon$,


 * $$ \mu_y = b_0 + b_1 \mu_1 + b_2 \mu_2 + \ldots + b_p \mu_p + 0,$$


 * $$ \sigma^2_y = 0 + b^2_1 \sigma^2_1 + b^2_2 \sigma^2_2 + \ldots + b^2_p \sigma^2_p + \sigma^2_{error},$$

and $b_i^2 \sigma_i^2$ is the variance component of $y$ attributable to $x_i$. The size of the estimated variance component, usually relative to the MSE (the estimate of $\sigma^2_{error}$), determines $x_i$'s significance in the model of $y$. At least asymptotic normality of the estimators is a fundamental assumption, allowing F-tests or t-tests of "$H_0$: $x_i$'s contribution is insignificant", but there are instances when the assumption of normality is unjustified and nonparametric alternatives to ANOVA are needed.
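The variance formula above holds exactly only when the $x_i$ are mutually uncorrelated and uncorrelated with the error; with correlated predictors each pair adds a term $2 b_i b_j \operatorname{Cov}(x_i, x_j)$. The simulation sketch below checks this for two predictors. The coefficients, the Gaussian predictors, the correlation of 0.6, and all helper names are assumptions chosen for illustration, not part of the proposal.

```python
import random
from statistics import fmean

def simulate_y(b, sigmas, sigma_error, n, rho=0.0, seed=1):
    """Draw n values of y = b0 + b1*x1 + b2*x2 + eps with corr(x1, x2) = rho."""
    rng = random.Random(seed)
    b0, b1, b2 = b
    s1, s2 = sigmas
    ys = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = s1 * z1
        x2 = s2 * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)  # Var(x2) = s2**2, Corr(x1, x2) = rho
        eps = rng.gauss(0.0, sigma_error)
        ys.append(b0 + b1 * x1 + b2 * x2 + eps)
    return ys

def sample_variance(values):
    """Population-style variance (divide by n); fine for large simulated samples."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def component_formula(b, sigmas, sigma_error):
    """Sum of b_i^2 sigma_i^2 plus the error variance (valid only for uncorrelated x_i)."""
    _, b1, b2 = b
    s1, s2 = sigmas
    return b1 ** 2 * s1 ** 2 + b2 ** 2 * s2 ** 2 + sigma_error ** 2

def covariance_term(b, sigmas, rho):
    """Extra 2*b1*b2*Cov(x1, x2) that appears when the predictors are correlated."""
    _, b1, b2 = b
    s1, s2 = sigmas
    return 2 * b1 * b2 * rho * s1 * s2

b, sigmas, sigma_error = (1.0, 2.0, -1.5), (1.0, 2.0), 1.0
independent = sample_variance(simulate_y(b, sigmas, sigma_error, n=100_000))
correlated = sample_variance(simulate_y(b, sigmas, sigma_error, n=100_000, rho=0.6))
print("formula:", component_formula(b, sigmas, sigma_error))
print("simulated, rho=0:", round(independent, 2))
print("formula + covariance term:", component_formula(b, sigmas, sigma_error) + covariance_term(b, sigmas, 0.6))
print("simulated, rho=0.6:", round(correlated, 2))
```

With these assumed values the component formula gives 4 + 9 + 1 = 14, which the independent simulation should reproduce closely, while the correlated case lands near 14 − 7.2 = 6.8 only once the covariance term is included.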


 * A statement that asymptotic normality is necessary is false. Perhaps your source is carrying on the Fisher tradition of confusing necessity and sufficiency?
 * Please read the discussion of Kempthorne and Cox on the randomization distribution (for finite-populations). Where does asymptotic normality arise?
 * Kiefer .Wolfowitz 21:06, 18 April 2012 (UTC)


 * Without referring to the book (for which you provide no link) one can concede, from the imbedded links here that: At least asymptotic normality of the estimators is a (commonly used, practically speaking, otherwise how could indicator functions be modeled?) necessary (really it's normality, together with i.i.d., which is the sufficient) condition to allow the F-tests or t-tests, which themselves are so key to ANOVA. 72.37.249.60 (talk) 22:14, 19 April 2012 (UTC)
 * Hi IP editor!
 * So you agree that you (and perhaps the source) had continued the Fisherian tradition of confusing necessary and sufficient conditions?
 * If your college doesn't have Kempthorne or Cox, then you should move to another one. If you are not in an academic environment, go and ask your local librarian to provide inter-library loan(s).
 * It would be much easier for this page to consist of the statement: "In a Hilbert space, the closed linear subspaces form an orthomodular lattice, and so the norm of a vector has an additive decomposition in terms of subspace components," were such a statement to comply with WP:Due weight.... ;)
 * Below, Melcombe is correct that discussions of tests and distributions should be the focus of other articles, and such discussions should have a summary here (per WP:Summary style).
 * Best regards,  Kiefer .Wolfowitz 08:53, 20 April 2012 (UTC)

But "tests" are not basic to ANOVA, and neither are models involving distributions for model errors: read the label "analysis of variance". The basis is "explained variance" and a comparison of the explained variance of a sequence of models. Then for a sequence of nested models this becomes the question of how much more variance is explained by the extra complexity of one model expanded from another. Of course in general ANOVA there is not necessarily a unique way of sequencing a nested set of models. But the initial steps are getting the increase in explained variance in terms of increase in the sum of squares of the predictions and then converting these to a mean square. Basic information at an intuitive level (the relative importance of components in a model) can be gained simply by comparing the numerical values of these mean squares. All of this is at the level of "least squares" and does not involve modelling using either type of model for random components. Of course, once the "explained SS due to a model component" have been defined by least squares these can be interpreted in the context of either or both types of sources of random variation. At this stage the models provide, firstly, an extra guide as to how to interpret the mean squares (in terms of their expected values) and only then provide the possibility of formal hypothesis testing. ANOVA procedures are a tool of practical statistics, not something derived ab initio to be optimal under one pre-specified model in theoretical statistics and it would be misleading to present it as such. Melcombe (talk) 08:25, 20 April 2012 (UTC)
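The least-squares procedure described above (fit a sequence of nested models, take the increase in explained sum of squares at each step, and divide by the increase in the number of parameters to get a mean square) can be sketched without any distributional assumptions. To keep the sketch short it only handles models whose least-squares fits are group means: the grand mean, then factor A, then the full A×B cell means, so each grouping must refine the previous one (the last step lumps B and the A×B interaction together). Because these fits are nested projections, each increase in explained SS equals the sum of squared differences between successive fitted values. The helper names and the deliberately unbalanced data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def fitted_group_means(y, keys):
    """Least-squares fit with one mean per distinct key; returns (fitted values, number of parameters)."""
    groups = defaultdict(list)
    for value, key in zip(y, keys):
        groups[key].append(value)
    means = {key: fmean(values) for key, values in groups.items()}
    return [means[key] for key in keys], len(means)

def sequential_anova(y, nested_groupings):
    """Rows of (extra SS, extra df, mean square) for each refinement, plus a residual row."""
    fitted_prev, params_prev = fitted_group_means(y, [()] * len(y))  # grand-mean model
    rows = []
    for keys in nested_groupings:
        fitted, params = fitted_group_means(y, keys)
        extra_ss = sum((new - old) ** 2 for new, old in zip(fitted, fitted_prev))
        extra_df = params - params_prev
        rows.append((extra_ss, extra_df, extra_ss / extra_df))
        fitted_prev, params_prev = fitted, params
    resid_ss = sum((value - fit) ** 2 for value, fit in zip(y, fitted_prev))
    resid_df = len(y) - params_prev
    rows.append((resid_ss, resid_df, resid_ss / resid_df))
    return rows

# Invented, unbalanced data: cell sizes 3, 2, 3, 2.
y = [3, 5, 4, 6, 8, 5, 4, 6, 10, 12]
factor_a = ["a1"] * 5 + ["a2"] * 5
factor_b = ["b1", "b1", "b1", "b2", "b2", "b1", "b1", "b1", "b2", "b2"]
cells = list(zip(factor_a, factor_b))

rows = sequential_anova(y, [factor_a, cells])
for label, (ss, df, ms) in zip(["A", "cells | A", "residual"], rows):
    print(f"{label:10s} SS={ss:7.3f} df={df} MS={ms:7.3f}")
```

In an unbalanced layout like this one, the order in which terms are added can change the individual sums of squares, which is the non-uniqueness of sequencing mentioned above; the rows still add up to the total sum of squares about the grand mean.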


 * No doubt, what you say is true but more descriptive of the origins of ANOVA than of its current use. It is analogous to insisting on describing regression subset model selection by focusing on comparative R-squares (or, more analogous to MSE, adjusted R-squares). The results are not conservative enough. Hypothesis tests give better results, especially if used together with an overall strategy like stepwise regression, Mallows' Cp, or the false discovery rate. In scientific literature today, one never sees ANOVA results without F-tests. -72.37.249.60 (talk) 21:08, 11 May 2012 (UTC)
 * Fisher seems a widely accepted original authority; see http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_introanova_a0000000081.htm — Preceding unsigned comment added by 72.37.249.60 (talk) 19:13, 18 May 2012 (UTC)
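One way to attach a test to the mean-square ratio without assuming normal errors, in the spirit of Fisher's randomization argument, is a permutation test on the F ratio. This sketch swaps in a Monte Carlo permutation p-value for the usual F-distribution tail probability; the names `f_statistic` and `permutation_p_value`, the 2000 permutations and the fixed seed are illustrative choices, not a standard API.

```python
import math
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA F ratio: between-group MS over within-group MS."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    if ss_within == 0:
        return math.inf  # no within-group scatter: treat the ratio as infinitely large
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p_value(groups, n_perm=2000, seed=0):
    """Randomization p-value for the F ratio: reshuffle group labels at random."""
    rng = random.Random(seed)
    sizes = [len(g) for g in groups]
    pooled = [y for g in groups for y in g]
    observed = f_statistic(groups)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        regrouped, start = [], 0
        for n in sizes:
            regrouped.append(pooled[start:start + n])
            start += n
        if f_statistic(regrouped) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

The add-one correction is a common convention for Monte Carlo p-values; with normal errors, the classical F-test would instead read the tail probability off the F distribution with (k − 1, N − k) degrees of freedom.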

Added introductory terminology.
This article is opaque to those innocent of the terminology of the design of experiments. Defining terminology offers some help. Many technical terms remain undefined. Editors should feel particularly free to remove terms unused or already defined in the text. 159.83.196.1 (talk) 23:31, 20 June 2012 (UTC)
 * I admit that I am conflicted about this addition. On the one hand, I agree that there is value in giving people context to an article they read.  On the other hand, many of the definitions should simply be wiki-linked from the article in order to give proper reference.  The only justification I find for keeping the context is the centrality of the ANOVA article for many non-statisticians.  I would be happy to read the opinions of other editors... Tal Galili (talk) 15:43, 23 June 2012 (UTC)
 * Yeah, I feel the same way. Dbrodbeck (talk) 18:52, 23 June 2012 (UTC)
 * Summary - Functional, but ugly. I will try to smooth the hypothesis testing terms into the text in the next few weeks. The DoE terms will take more thought. 159.83.196.1 (talk) 21:06, 28 June 2012 (UTC)
 * Ready for another round of comments. More wiki-links in my introductory paragraphs would clearly be beneficial. My recent edits moved the more advanced material out of the Assumptions section. I added a gentle introductory sentence leading to the post-graduate hypothesis testing material. 159.83.196.1 (talk) 23:52, 3 July 2012 (UTC)

Sloppy typesetting.
While I improved the consistency of the math symbols used, the result was a rich mixture of fonts. A typesetter could substantially improve on my effort. 159.83.196.1 (talk) 20:54, 7 July 2012 (UTC)


 * Improved. 159.83.196.28 (talk) 17:37, 13 July 2012 (UTC)

Derived model citations.
Derived linear model section: "However, there are differences. For example, the randomization-based analysis results in a small but (strictly) negative correlation between the observations." The statement is supported by two citations, both broken. Bailey's book does not have a section 1.14 (sorry). I found support in the equations (not the prose) of H&K. The structure of the covariance matrix equations implies that total errors are independent but that observational errors are not. This requires more explanation than the section justifies. I removed the weakly supported sentences. 159.83.196.1 (talk) 23:30, 13 September 2012 (UTC)
 * See Chapter 2.14 "A More General Model" in Bailey, pp. 38-40. Kiefer.Wolfowitz 14:43, 1 October 2012 (UTC)
 * Do you have any suggestions for the H&K citation? 159.83.196.1 (talk) 22:25, 2 October 2012 (UTC)
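A minimal sketch of where a strictly negative correlation can come from under randomization, assuming the simplest randomization model: each experimental unit carries a fixed unit effect, and random assignment amounts to drawing units without replacement. For two distinct units drawn this way, the deviations of their unit effects have covariance −σ²/(N − 1), i.e. correlation −1/(N − 1), where σ² is the finite-population variance of the N unit effects. This illustrates the mechanism only and is not a reconstruction of the derivations in Bailey or H&K; `randomization_covariance` is a hypothetical name.

```python
from fractions import Fraction
from itertools import permutations

def randomization_covariance(unit_effects):
    """Exact covariance and correlation between the effects of two distinct
    units drawn at random without replacement (integer or rational inputs)."""
    n = len(unit_effects)
    mean = Fraction(sum(unit_effects), n)
    dev = [Fraction(u) - mean for u in unit_effects]
    pairs = list(permutations(range(n), 2))   # every ordered pair of distinct units
    cov = sum(dev[a] * dev[b] for a, b in pairs) / len(pairs)
    var = sum(d * d for d in dev) / n          # finite-population variance
    return cov, cov / var
```

For five units with effects `[3, 1, 4, 1, 5]`, the exact correlation is −1/4, matching −1/(N − 1) with N = 5. The correlation is small for realistic N, which fits the "small but (strictly) negative" wording.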

Subscript conventions.
Texts use different conventions regarding subscripts. Howell (2002) uses i as an index into experimental units and j as an index into treatment groups. Howell's convention is used here to define additivity. Montgomery (2001) follows a long tradition of reversing the roles of i & j. Montgomery's convention is hinted at here by defining I as the number of treatments. I propose to adopt Montgomery's convention regarding subscripts throughout. Objections? 159.83.196.1 (talk) 21:34, 12 October 2012 (UTC)
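For concreteness, here is the one-way additive model written in each convention. The symbols μ (overall mean), τ (treatment effect), ε (error) and n (units per treatment) are assumed here for illustration and may not match the article's notation exactly.

```latex
% Montgomery (2001): i indexes treatments, j indexes units within a treatment
y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, I, \quad j = 1, \dots, n

% Howell (2002): i indexes experimental units, j indexes treatment groups
y_{ij} = \mu + \tau_j + \varepsilon_{ij}
```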

Power of the noncentral F-test: Planning studies
The F-test is used in planning the experiment and the ANOVA, because the non-centrality parameter shifts the F-distribution to the right. Using t-tests to plan experiments, as Bailey does in an otherwise fine book, results in larger numbers of subjects than needed, in many cases. This is not discussed, despite it being the main motivation. (Non-central t-distributions are less readily accessible, and don't appear in textbooks on ANOVA.)

03:22, 9 April 2013 (UTC)
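A power calculation for the one-way F-test would normally evaluate the noncentral F distribution directly. As a standard-library-only stand-in, this sketch estimates power by simulation. It draws normal data for hypothetical group means, estimates the level-α critical value from simulations under the null, and counts how often the F ratio exceeds it under the alternative. The function names, the normal-error assumption, and the defaults (σ = 1, 2000 replicates, fixed seed) are all illustrative assumptions.

```python
import random
from statistics import fmean

def f_stat(groups):
    """One-way ANOVA F ratio: between-group MS over within-group MS."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def simulated_power(means, n, sigma=1.0, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo power of the one-way F-test with n units per group.

    Assumes independent normal errors with common standard deviation sigma.
    """
    rng = random.Random(seed)

    def draw(mus):
        # One simulated experiment: n normal observations per group.
        return [[rng.gauss(mu, sigma) for _ in range(n)] for mu in mus]

    # Estimate the critical value under the null (all group means equal).
    null_f = sorted(f_stat(draw([0.0] * len(means))) for _ in range(reps))
    crit = null_f[int((1 - alpha) * reps)]
    # Power: how often the F ratio exceeds that critical value under the alternative.
    hits = sum(f_stat(draw(means)) > crit for _ in range(reps))
    return hits / reps
```

Calling `simulated_power([0.0, 0.5, 1.0], n=5)` and then `n=30` shows power rising with the number of units per group, which is the planning question. An exact noncentral-F routine from a statistics library would remove the simulation noise.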

Attention watchers.
A major enhancement, deserving expert review, was made to the sister article: One-way analysis of variance. 159.83.196.1 (talk) 21:35, 30 October 2012 (UTC)

CV?
The weird nature of this article's formatting seems to point to some copy-and-paste issues. I've addressed some of them, but this diff shows major formatting discrepancies. I couldn't find any evidence of a CV myself, but the book source might be used.


 * CV? Curriculum vitae - no. Cardiovascular - no. Constant velocity (joints) - no. I suggest an explicit reference to Spotting possible copyright violations. May your searches be few, fast, efficient and fruitless. The lengthy Notes and References sections and numerous quotations imply that 1) little originated with the editors and 2) sources were credited. The Design-of-experiments terms section makes extensive use of a publication of the U.S. government (www.nist.gov). This is 1) a special case in copyright law and 2) plainly acknowledged in the text. You are drilling a bone-dry well. 159.83.196.1 (talk) 19:26, 12 July 2013 (UTC)

Two-way example gone.
The section "ANOVA for multiple factors" points to the main article Two-way analysis of variance, which is now a stub. Editors do not wish Wikipedia to contain lengthy examples. 159.83.196.1 (talk) 20:03, 12 July 2013 (UTC)

Explain "no fit at all"
The example in the introduction is excellent, but some elaboration is needed of the statement: "An attempt to explain the weight distribution by dividing the dog population into groups (young vs old)(short-haired vs long-haired) would probably be a failure (no fit at all)."

Someone thinking of the weight distribution as an empirical histogram can object that any histogram can be written as a sum of histograms corresponding to subgroups of the population. The phrase "no fit at all" might be interpreted as a claim that the blue histograms do not actually add up to the yellow one.

What the sentence is trying to convey is that "success" at dividing up the population into categories means that if you are given the category of a dog, you can use the corresponding histogram to estimate the dog's weight well. Hence "no fit at all" refers to the fact that if you are given that a dog is (for example) young, then you can't make a good guess of the dog's weight by using the weight histogram of young dogs.

Tashiro (talk) 06:18, 22 March 2014 (UTC)
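The distinction above can be made numerical with the share of total variation that a grouping explains (between-group SS over total SS, often called eta squared). The weights and labels below are invented for illustration and are not the article's data; `fraction_explained` is a hypothetical helper.

```python
from statistics import fmean

def fraction_explained(values, labels):
    """Share of the total sum of squares explained by a grouping (eta squared)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = fmean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = 0.0
    for g in groups.values():
        m = fmean(g)
        ss_within += sum((v - m) ** 2 for v in g)
    return 1 - ss_within / ss_total

# Hypothetical dog weights (kg), not data from the article.
weights = [4, 5, 5, 6, 29, 30, 30, 31]
size = ["small"] * 4 + ["large"] * 4
age = ["young", "old", "old", "young", "young", "old", "old", "young"]
```

Here `fraction_explained(weights, size)` is about 0.997, while `fraction_explained(weights, age)` is 0. The young and old histograms still add up to the whole, but knowing a dog's age does not improve the guess of its weight at all.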

Improve or remove use of "treatment," "factor," "factor level"
The term "treatment" is apparently central in this exposition of ANOVA; it appears early in the "Background and terminology" section, but ... before it is defined!

The later definition of "treatment" in the "Design-of-experiments terms" section says it is "a combination of factor levels." What kind of combination of levels? A sum of the level numbers? Look up "factor" to find out about factor levels: a factor is an investigator-manipulated process that causes a change in output. What kind of process might this mean? Adding and removing data? Why would you do that? Output of what? Might this "output" refer to how variance changes when the investigator manipulates the data like this? How can a process have "levels?"

Please clarify. — Preceding unsigned comment added by Randallbsmith (talk • contribs) 19:13, 1 May 2014 (UTC)
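In a designed experiment, a factor is a condition the investigator sets on the experimental units, not an operation on the data, and its levels are the particular settings used. A treatment is one choice of level for every factor at once. In a full factorial design the treatments are therefore all such combinations (the Cartesian product), not sums of level numbers. Here is a minimal sketch using a hypothetical plant-growth experiment; the factor names and levels are invented for illustration.

```python
from itertools import product

# Hypothetical two-factor experiment: each factor is something the
# investigator controls, and each level is one setting of it.
factors = {
    "fertilizer": ["none", "low", "high"],
    "watering": ["daily", "weekly"],
}

# A treatment assigns one level to every factor; a full factorial
# design uses every combination of levels.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
```

Each of the 3 × 2 = 6 treatments would be applied to some experimental units. The measured response (say, plant height) is the "output" that ANOVA then analyses.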

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on Analysis of variance. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20141107211953/http://biomedicalstatistics.info/en/multiplegroups/one-way-anova.html to http://www.biomedicalstatistics.info/en/multiplegroups/one-way-anova.html
 * Added archive https://web.archive.org/web/20150405053021/http://biostat.katerynakon.in.ua/en/multiplegroups/anova.html to http://www.biostat.katerynakon.in.ua/en/multiplegroups/anova.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 08:52, 12 October 2016 (UTC)

"It [ANOVA] is conceptually similar to multiple two-sample t-tests, but is less conservative (results in less type I error)"
I think this is either wrong, or needs some clarification. Surely if the multiple t-tests are taken at face value and no correction is applied, then it is they that are less conservative than ANOVA, and not the other way around, as they will result in more type I errors. If 'conservative' here simply means 'producing fewer type I errors', then ANOVA is more conservative than uncorrected multiple t-tests, not less. Right? L T T H U (talk) 11:30, 5 February 2016 (UTC)

I also noticed this and I agree. By definition, any statistical test should give type I errors with a frequency equal to the significance level -- so it does not depend on the test. I have marked this as requiring a citation. --Denziloe (talk) 16:37, 16 August 2017 (UTC)


 * Agreed. The "less conservative" had already been changed to "more conservative," and I added a citation. GinaZzo (talk) 19:47, 11 November 2017 (UTC)
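A rough illustration of why uncorrected pairwise t-tests produce more Type I errors than a single overall test: with k groups there are k(k − 1)/2 pairwise comparisons. If each is run at level α, the chance of at least one false positive when every null is true grows quickly. This sketch treats the tests as independent, which pairwise comparisons sharing groups are not, so the figure is only indicative. The Bonferroni bound m·α holds regardless. `pairwise_familywise_error` is a hypothetical name.

```python
from math import comb

def pairwise_familywise_error(k_groups, alpha=0.05):
    """Chance of at least one false positive among all pairwise t-tests when
    every null hypothesis is true, treating the tests as independent."""
    m = comb(k_groups, 2)                 # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m
```

For 4 groups this gives 6 tests and a familywise rate of about 0.265, against a nominal 0.05 for one overall F-test at the same α.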

Incorrect Error DF for Two Way Table.
Under "Design of Experiments Terms" for the Two Way ANOVA Table, the error DF reads (h-1)*(k-1). I believe this is incorrect. For a two way with no interaction term it should be N-h-k+1. Is that correct? — Preceding unsigned comment added by 2601:8C:C000:F9B3:C1CF:1AF3:44D:CA67 (talk) 01:20, 11 December 2018 (UTC)
 * Yes. The ANOVA tables were added in this edit of 26 June by 103.213.201.117. They don't fit in the "Design-of-experiments terms" subsection of the "Background and terminology" section, however, so I've removed them. --Qwfp (talk) 20:39, 11 December 2018 (UTC)
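A small check of the degrees-of-freedom partition for a two-way layout without an interaction term (h levels of one factor, k of the other, N observations in total). It also shows that the two formulas coincide when there is exactly one observation per cell. With N = hk, N − h − k + 1 = hk − h − k + 1 = (h − 1)(k − 1), which may be where the (h-1)*(k-1) entry came from. `two_way_df` is a hypothetical helper.

```python
def two_way_df(h, k, n_total):
    """Degrees of freedom for a two-way ANOVA without an interaction term."""
    df_a = h - 1                       # first factor
    df_b = k - 1                       # second factor
    df_error = n_total - h - k + 1     # residual
    # The pieces must add up to the total degrees of freedom, N - 1.
    assert df_a + df_b + df_error == n_total - 1
    return df_a, df_b, df_error
```

With h = 3, k = 4 and one observation per cell (N = 12), the error df is 6 = (3 − 1)(4 − 1). With two observations per cell (N = 24), it is 18, and (h − 1)(k − 1) no longer applies.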

Formatting within quotations
Article has bold phrases within quotations: As a result: ANOVA "has long enjoyed the status of being the most used (some would say abused) statistical technique in psychological research."[14] ANOVA "is probably the most useful technique in the field of statistical inference."[15] I suspect that this formatting doesn't appear in the original resources, and so should be either removed, or else noted as an editorial modification introduced here. —DIV (120.17.160.228 (talk) 05:55, 21 January 2019 (UTC))
 * Removed in line with MOS:BOLDFACE. Thanks for spotting that. Qwfp (talk) 20:13, 22 January 2019 (UTC)

Article rambling and vague
I tried to learn the basics of ANOVA from this article, but was forced to find better sources. Half the article goes by before the calculations involved are even defined, which may be fine for a textbook, but as an encyclopedia article this should get to the point much faster. In my opinion, most of the terminology, types of models, and characteristics should be moved later. The assumptions should also be moved later, but maybe add in a couple sentences of summary into the lede or where relevant. The background section should be shortened significantly: it's not necessary to explain every detail about statistical testing in this section. For example, let's take the following paragraph: "By construction, hypothesis testing limits the rate of Type I errors (false positives) to a significance level. Experimenters also wish to limit Type II errors (false negatives). The rate of Type II errors depends largely on sample size (the rate is larger for smaller samples), significance level (when the standard of proof is high, the chances of overlooking a discovery are also high) and effect size (a smaller effect size is more prone to Type II error)." This is basically the third time that hypothesis testing has been defined in this section (each time slightly differently) and so the whole thing can probably just be omitted. Readers can follow the link to statistical hypothesis testing if they want a detailed explanation.
 * Agree. Article needs extensive cleanup to make it intelligible. --Sahir 01:45, 26 April 2021 (UTC)

In particular, the "background and terminology" section is a rambling mess that doesn't actually provide any "background" and doesn't actually explain any "terminology" (except for sloppy definitions of a few general concepts in null hypothesis testing, such as statistical significance). Most of the section is unsourced, and the rest is unclearly sourced (e.g., what is Gelman, 2005?). I'm removing the section, since it likely adds more confusion than useful information. 23.242.195.76 (talk) 08:22, 24 July 2021 (UTC)