Talk:Effect size

Beginning of article
I've started the ball rolling here guys with an update of a stub and put just one effect size formula down as an example. Do people think this is enough? Grant

Evolution of effect size
Grant:

I've been reading some more about meta-analysis and effect sizes in relation to the Wikipedia article.

The article now contains two ways of computing an effect size, but it seems that neither is the one that is most commonly used, which is the difference in means divided by the pooled standard deviation. You refer to the denominator provided in Cohen's d as the pooled standard deviation, but I see this also referred to as the mean SD, reserving the term "pooled" for when the two sample sizes are taken into account.

Although I haven't seen Cohen's original formulation for effect size, from what I have read it seems he specified only that it is the difference in means divided by a standard deviation and that the standard deviation of the control group was originally most commonly used. Then other standard deviations were used (e.g., mean and pooled) and other adjustments made (as in Hedges' g).

So what appears to me to be the most commonly used index of effect size, that is, the difference in means divided by the pooled SD (taking the sample size of each group into consideration), is not given in the Wikipedia article. This is also the effect size obtained if it is generated from a t value ($$ES = t\sqrt{(n_1 + n_2)/(n_1 n_2)}$$), F ratio, or exact probability for a t-value or F-ratio (see Lipsey and Wilson, 2001, pp. 173-175) --reference added to Wikipedia article.
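The equivalence between the two routes to this effect size can be checked numerically. The following is a minimal sketch, assuming two independent groups and the equal-variance (pooled) t statistic; the function names and the data are made up for illustration and do not come from Lipsey and Wilson.

```python
import math
from statistics import mean, variance


def pooled_sd(x1, x2):
    """Pooled SD: each sample variance is weighted by its degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    return math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))


def d_from_means(x1, x2):
    """Difference in means divided by the pooled SD."""
    return (mean(x1) - mean(x2)) / pooled_sd(x1, x2)


def t_statistic(x1, x2):
    """Equal-variance two-sample t statistic."""
    n1, n2 = len(x1), len(x2)
    return (mean(x1) - mean(x2)) / (pooled_sd(x1, x2) * math.sqrt(1 / n1 + 1 / n2))


def d_from_t(t, n1, n2):
    """ES = t * sqrt((n1 + n2) / (n1 * n2)), the conversion quoted above."""
    return t * math.sqrt((n1 + n2) / (n1 * n2))


# Made-up illustration data (not from any study).
treatment = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5]
control = [4.2, 5.0, 4.8, 5.9, 4.4]

d_direct = d_from_means(treatment, control)
d_via_t = d_from_t(t_statistic(treatment, control), len(treatment), len(control))
print(d_direct, d_via_t)  # the two values agree up to floating-point rounding
```

The agreement holds because the standard error in the t statistic is the pooled SD times sqrt(1/n1 + 1/n2), which equals sqrt((n1 + n2)/(n1 n2)).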

So I suggest that the article be organized as such:


 * 1) correlation coefficient and r-square
 * 2) Cohen's d as difference between two means divided by a standard deviation, originally SD of control group
 * 3) . . . . divided by mean of the two SDs (as it currently appears).
 * 4) . . . . divided by the pooled SD, and how this value can also be computed from a t, F, or exact probability of a t or F
 * 5) Hedges' g with an explanation of what the second part of the formula does.
 * 6) Odds ratio

This would show the "evolution" of the effect size and the basic different ways to calculate it.

I could take a whack at this myself, but still have to figure out how to write formulas on the Wikipedia and I probably won't have time to do this for a while.

Let me know what you think. I also added this to the article's discussion so this doesn't become just a private discussion between you and me.

--Gary Gary 11:52, 4 April 2006 (UTC)

Hi Gary,

I went through Cohen's 1988 bible again and he suggests the formula I entered. This formula is definitely more conservative towards the larger variance, but does take both into account. Taking N into account (or, rather, n1 and n2) is really where Hedges' formula comes in. Maybe I need to add more explaining the second part of Hedges' as you suggest?

Yes, I agree that one of the benefits of ES is how it can be converted between all the main statistics. I added many of those functions into my ClinTools software - problem is, there are just so many permutations.

Best for now,

Grant 14:00, 18 November 2006 (UTC)

ToDo list
1. Add discussion of the f effect size measure for ANOVA F-tests.

2. Add discussion of the w effect size measure for Chi-Square tests.

--DanSoper 07:56, 11 June 2006 (UTC)

3. Add discussion of Bonett's (2008, Psychological Methods) standardized linear contrast of means and confidence interval results for between-subject and within-subject designs  — Preceding unsigned comment added by Tukey1952 (talk • contribs) 00:46, 24 September 2011 (UTC)

continuous or binary
"Pearson's r correlation is one of the most widely used effect sizes. It can be used when the data are continuous or binary".

This provoked a discussion as to whether this should say "continuous or discrete". I wonder if this could be clarified. I would assume that discrete numeric data were also fine (though not discrete qualitative "levels"). --Richard Clegg 16:36, 10 August 2006 (UTC)

There is also a need for an explanation of partial eta squared and other types of effect sizes such as omega. I would be grateful if you could help. In addition, it would be nice to have what social scientists consider an adequate effect size (e.g. Cohen's distinctions between large and small effect sizes). Dimitrios Zacharatos 24/03/2007

SOME LINKS ARE DEAD —Preceding unsigned comment added by 137.56.137.204 (talk) 11:35, 20 May 2008 (UTC)

Alien Example in the article
I'm currently reading "Explaining Psychological Statistics" 2 ed. by Barry Cohen. The alien example in this article repeats almost verbatim the one found at the beginning of Chapter 8 of this book. I'm curious to know whether this is merely a coincidence, or someone forgot to include proper citations. —Preceding unsigned comment added by 147.4.214.44 (talk • contribs)


 * It was probably ripped-off. Many Wiki editors don't know better. Would you mind editing the article to either remove it or cite the book? Thanks! Chris53516 13:33, 5 October 2006 (UTC)


 * It was actually me who wrote the section using the Aliens as an example - and "no" I did not "rip it off". This is a common example used since the scientific method has been taught. I use it when teaching my students and I wrote the words as they came to me sitting in front of the screen. I used that particular example because I had that day been emailing with a colleague (Dr Susan Clancy) who had just finished a book expounding an hypothesis on why people believe they have been abducted by aliens. I don't know the book by Barry Cohen so I can't see the 'repeated verbatim' passages. I can assure you that I do know better than plagiarism, so please don't make that assumption. Maybe Barry Cohen - no relation to Jacob Cohen I take it - sat in on MY lectures, or my old lecturers lectures, or ..... I've just been through the history of this page (and specifically the alien example I used) and one can see that the current example has slightly evolved with minor edits along the way from various people. Not plagiarism. Thanks Grant


 * As an academic, you should know the importance of citing sources. Please see Citing sources for more information. It does not matter whether you are a professor or a hula-hoop dancer, you should still cite your sources. Additionally, your personal experience should not be used as a source for an encyclopedia article. See No original research and Verifiability for more information. The content has been identified as possible plagiarism, and it should be removed. There are many other examples that can be used. – Chris53516 (Talk) 14:51, 9 November 2006 (UTC)


 * Chris, I have seen your own pages and you obviously have a major problem with plagiarism - possibly justified from a past experience - and also seem quite keen to flame. Your continued accusations here are insulting. I will put the original text back. Please give evidence if you wish to accuse of plagiarism. As you will also be aware - experts in the area write original work and use examples. I have identified myself to place my comments on Wikipedia in context. I have sourced all my comments. The actual guidelines suggest talking to the authors before removing text - something you feel is unnecessary? Grant


 * Don't start with the personal attacks--that's petty. Personal attacks. I have never plagiarized anything, and I have no idea what you're talking about. The text you will put back is completely unnecessary. Did you even bother to read the policies I mentioned above? Wikipedia does not allow original research. The text isn't necessary to the article at all, therefore I do not need your permission to remove it. It's not important. Keep your petty personal attacks to yourself next time. – Chris53516 (Talk) 03:43, 10 November 2006 (UTC)


 * Oh dear this is degenerating - you have completely missed my point and this is becoming very tiresome (I'm sure that's one thing we agree on). When I said "you obviously have a major problem with plagiarism - possibly justified from a past experience" I meant that someone may have plagiarised from YOU in the past. I can't see how it could have been interpreted any other way - but I would like to set the record straight now that I never intended the implication that you have plagiarised from anyone else (how could I - I don't even know who you are?). Regarding the substance of the issue: the reason why it is necessary to include the alien example is because later on I had provided a working example of effect size analysis, that without the earlier section just sits there in isolation making very little sense. Please read the whole entry - I'm a bit confused how you could have missed it (see the example after Cohen's d). This is why the text IS necessary. Also, if you visit the old, old pages when we were putting this page together from a stub (e.g., http://en.wikipedia.org/w/index.php?title=Effect_size&oldid=11322482) you will see how the page has evolved over time. Now, I'm not sure quite what to do regarding this because it looks like every time I put the entry back up you're just going to remove it again and the example I wrote later on makes no sense at all. How about this: I put back the entry - if you don't like it, fine. But rather than just removing it, how about changing it (and the later example using real data - like the current example uses)? How about using the preference of coffee over tea in the USA? Or maybe the effect size of treatment over placebo for some type of medical intervention? I think this way there is a productive outcome that is informative to the viewer and you don't have to see the word 'aliens' on the page. However, it would require you putting in effort rather than just deleting text. Is this acceptable to you? Grant


 * No, it is not acceptable. You have the burden of proof, which is to say that you have the burden of re-writing the page since it is your content that is under scrutiny. The example is NOT necessary because the article can conceivably exist without it. – Chris53516 (Talk) 14:18, 13 November 2006 (UTC)


 * You didn't even wait for my opinion, so why do you even bother to ask? Do NOT add the text back again, or I will have you cited for vandalism. – Chris53516 (Talk) 14:20, 13 November 2006 (UTC)

(un-indenting) Sorry Grant, but as a neutral third party, I have to agree with Chris here. You MUST cite sources. With regard to: "How about this: I put back the entry - if you don't like it, fine. But rather than just removing it, how about changing it (and the later example using real data - like the current example uses)" - unless you cite your sources, it will be removed. If you cite your source for that passage, it will be kept in there. Citing sources is not at all hard to do; see WP:CITE. There are a plethora of neat templates that have been created to help users cite sources. Since you were the one who created the content, the burden is on you to provide a source for your material (much easier than having other editors try to track down what book you got this from). I do think, though, that citing Grant for vandalism is a bit of an overreaction, Chris. He seems like a good faith editor to me. Perhaps he's just a bit misguided on some of the Wikipedia policies. Gzkn 01:58, 16 November 2006 (UTC)


 * Perhaps I did, but I deal with vandals all the time. And every now and then I deal with a vandal who likes to continue to come back and restore what was undone, which is what I felt like was going on here. Furthermore, I was attacked personally here, which was completely unwarranted and did nothing but put me in a bad mood. – Chris53516 (Talk) 05:14, 16 November 2006 (UTC)

Hi Gzkn, thank you for a reasoned voice. However, I can't cite anyone because I didn't copy it. I can understand that people not au fait with teaching stats might find this hard to believe but we use the aliens example all the time. It allows one to create an artificial scenario whereby one can imagine a completely naive being trying to make sense of the world and affords us the opportunity to demonstrate probability. One of the more amusing examples is the Nature article where Beck-Bornholdt and Dubben (1996) demonstrated (wrongly, I might add) that using probability theory one could assume the Pope was an alien! A quick web search turned up the following examples of people using the alien method of explanation to demonstrate probability: here, here, and here. For some reason Chris doesn't like Alien examples, so how about I talk about a monkey becoming sentient and trying to make sense of the world? Then I'll just change the example later on (still on the page and making no sense at all because of the removal of the earlier bit) to a monkey? I could just cite one of my own academic articles if everyone prefers (where I use effect sizes to demonstrate the difference in success rates between two treatments), but that seems rather self-congratulatory. What do you think Gzkn? Cheers Grant 15:37, 16 November 2006 (UTC)
 * Depending on how your article is used, citing yourself could be in violation of No original research (see especially Citing oneself) and Neutral point of view. If you're going to do it, do it carefully. As far as the example goes, talking about monkeys and aliens just doesn't seem right for an encyclopedia. Wikipedia is not a teaching tool. – Chris53516 (Talk) 15:52, 16 November 2006 (UTC)

Alien Example: Mediation
For those of you who don't know, Grant has requested mediation on this issue, and I've taken up the case. Apologies for getting to it a little later than intended.

Now, looking at the Summary section of the article, I see that there are two other examples given. I'm willing to bet a week's mediator wages that something very similar to each can be found in a textbook somewhere. And yet, I don't think that they are in need of citations. Why not? I'll pull out the part of WP:CITE I think is most relevant:


 * When writing a new article or adding references to an existing article that has none, follow the established practice for the appropriate profession or discipline.

The important question is, then, under what circumstances an example would be cite-worthy in a scientific paper or text. I think that would be the case only if the example is generally associated with at least one person or work. That the same idea is expressed elsewhere is not in itself sufficient.

Grant has provided some sources which support his claim that the aliens example is somewhat ubiquitous; on the other hand, even the original objection to the example didn't actually give any detailed reason to suppose otherwise. Given that, I'm not seeing the case for citation here.

On the basis that I need food, I'm going to stop here and invite comment from both parties, along with anyone else who's interested. Tsumetai 18:50, 18 November 2006 (UTC)


 * Thank you Tsumetai for taking up the case. I suspect that the 'Aliens' example is now just going to upset some people whenever they see it. Therefore, last night, I changed the example to 'imagine visiting england without any previous knowledge of humans...' (in the summary). I then referenced the data which I used in the example (Health Survey for England, 2004) and continued with that example to demonstrate Hedges' g as well. I hope that this satisfies everyone - I can't see an angle, at the moment, that would make this example untenable. Thanks again for taking the time to look at this Tsumetai. I think Chris might be OK with the changes I made. Best, Grant 02:02, 19 November 2006 (UTC)


 * Chris obviously doesn't have any objections to this version so I'll keep it as it currently stands. Thank you Tsumetai for taking the time and looking at this issue. Best, Grant 07:48, 30 November 2006 (UTC)

Cohen & r effect size interpretation
Hi Chris & Mycatharsis, Before I revert anything I thought I should just put a discussion comment first. I see what Chris is saying, but must agree with Mycatharsis. Cohen proposed the interpretations suggested in both the 1988 and 1992 papers. The argument he uses is that r is the most fundamental effect size measure available - allowing for direct interpretation of the degree of association. Any problems from anyone if this is reverted and expanded upon? Cheers, Grant 10:56, 29 December 2006 (UTC)


 * What are you referring to? Please place this conversation wherever the rest of it is because it doesn't make sense. – Chris53516 (Talk) 15:10, 29 December 2006 (UTC)


 * Hi Chris, sorry this didn't make sense to you - I just ran with the last edit made and wrongly assumed everyone was in my head space. I was referring to the edit changes you made here, which were discussed on Mycatharsis' talk page here. Is this the info you needed? Cheers, Grant 13:10, 30 December 2006 (UTC)

I think your interpretation that r is an effect size is incorrect; r-squared can be interpreted as an effect size, but I've never heard of r being used as one. Furthermore, it seems logically inappropriate to use r because it is not a directional relationship; that is, the relationship between the two variables can go either way. On top of that, third, unknown variables could be causing the two that are correlated. For example, the number of murders rises (or is correlated) with the amount of ice cream consumption, but the temperature actually influences both. As it gets warmer, people interact more outside, and more violence occurs. Also as it gets warmer, people eat more ice cream. (This is just an example, please do not over-interpret it.) Anyway, to use r as an effect size would imply a causal relationship that is not established with r. This is my thinking. – Chris53516 (Talk) 06:14, 31 December 2006 (UTC)


 * I'm afraid I must disagree here. Indeed, r-squared is commonly used because it gives a direct amount of variance that is attributable to the relationship (even if there is a third factor influencing the relationship). r is frequently used as the 'base' effect size and many meta-analyses convert to r rather than d or g (r-squared is exactly that - r multiplied by r). It is frequently referred to as the effect size correlation. A good resource (given on the article page too) is Becker's pages, which give the various formulae to convert to r. We must also keep in mind that effect sizes are directional only by virtue of their + or - sign: the same as r (all depends upon how you set this up in the analyses and whether + is a good or bad thing). I take your point with the ice-cream example, but all effect sizes are subject to this (the notorious Type III problem) and so effect sizes never establish causation, but interpretation may imply it. Cohen gave the rules of thumb (small = 0.1; medium = 0.3; large = 0.5) in his 1988 treatise and 1992 Psych Bull paper. A better explanation is provided by: Rosnow, R. L., & Rosenthal, R. (1996). Computing contrasts, effect sizes, and counternulls on other people's published data: General procedures for research consumers. Psychological Methods, 1, 331-340. This has always been my interpretation. ps. Happy New Year from down-under everyone. Grant 13:45, 2 January 2007 (UTC)


 * I'm not really satisfied with that explanation. r and r-squared are not measures of effect size; merely of the goodness of fit of the model. R squared likewise tells you nothing about the effect SIZE, only the percentage of the dependent variable which is being affected. The Beta coefficient indicates effect size; it indicates the direction and magnitude of the relationship between the variables. You need to actually think about what the measures are telling us. r and r squared tell us what portion of the variance in the dependent variable is varying in common with the independent variable - correlation - but tell us nothing about effect size, which is the coefficient of the equation of the line. That is, how much change in Y is caused by a one unit change in X. THAT is effect size as it shows you what one variable is doing to the other. r and its derived cousins are measures of the goodness of fit of the model, but tell you nothing about the effect size, only how accurate your model is. This should be immediately obvious; it is perfectly reasonable to find a very high correlation with a very low effect size. When you're talking about r you are only talking about the effect itself, not the magnitude of the effect. 59.167.60.92 (talk) 11:54, 17 January 2009 (UTC)

Hi all and especially Grant, Have you noticed that the current version of the article - the section on Cohen & r effect size interpretation - says that "Cohen gives the following guidelines for the social sciences: small effect size, r = 0.1 − 0.23; medium, r = 0.24 − 0.36; large, r = 0.37 or larger" (references: Cohen's 1988 book and 1992 Psych Bull paper) whereas, as Grant says, those two works by Cohen say that small = 0.1; medium = 0.3; large = 0.5. I wonder where the .10/.24/.37 figures came from? If anyone is familiar with these cutoffs perhaps they could add a reference? Thanks, Jane —Preceding unsigned comment added by 86.175.216.72 (talk) 00:32, 7 February 2010 (UTC)


 * Yes, where are these ranges coming from for r. I don't have the Cohen (1988) in front of me, but I don't see these ranges in Cohen (1992). — Preceding unsigned comment added by Schmitta1573 (talk • contribs) 14:50, 25 September 2011 (UTC)


 * It's just a term, but I disagree that r-squared is not an effect size. Following the argument of 59.167.60.92, only a regression coefficient or mean difference (something on the scale of the data) would be an effect size.  Even Cohen's d would be excluded.  But the important distinction is not relative (r-squared, Cohen's d) versus absolute (regression coefficient, mean difference) measures, it is whether the statistic is an inferential quantity like a Z-score or p-value that scales with sample size (definitely not an effect size), or something like the correlation coefficient, regression coefficient, odds ratio or Cohen's d that estimates a relative or absolute population parameter that is independent of the sample size.  In the case of the correlation coefficient, it is unfortunately common to read that a sample correlation coefficient is significantly different from zero, perhaps with some p-value, but then not be told what the actual estimated correlation coefficient is.  This is a situation where the effect size (i.e. the estimated correlation coefficient) should be provided along with the inferential stuff.  I do agree that something that is called "size" should be magnitude-like and therefore not have a sign.Skbkekas (talk) 02:15, 7 February 2010 (UTC)
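Two of the points in this thread can be illustrated numerically: a near-perfect correlation can coexist with a tiny regression slope, and for a fixed sample correlation the test statistic grows with n while r itself does not. This is a minimal sketch with made-up data; the helper names are hypothetical, and the test statistic used is the standard one for the null hypothesis of zero correlation, t = r * sqrt((n - 2) / (1 - r^2)).

```python
import math


def _sums(x, y):
    """Centered sums of squares and cross-products for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Sample correlation coefficient (unit-free)."""
    sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y on x (in units of y per unit of x)."""
    sxy, sxx, _ = _sums(x, y)
    return sxy / sxx


def t_for_r(r, n):
    """Test statistic for H0: rho = 0; it depends on n, unlike r."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Made-up data: y rises by only about 0.001 per unit of x, almost without noise.
x = [1, 2, 3, 4, 5, 6]
y = [0.0010, 0.0021, 0.0029, 0.0041, 0.0049, 0.0060]

print(pearson_r(x, y), ols_slope(x, y))    # r close to 1, slope close to 0.001
print(t_for_r(0.3, 20), t_for_r(0.3, 200))  # same r, larger t with larger n
```

Which of these quantities deserves the name "effect size" is the terminological question under discussion; the sketch only shows how they behave.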

According to Cohen (1992, Table 1, p. 157), .2, .5, and .8 are small, medium, and large effect sizes for d, respectively, but .1, .3, and .5 are small, medium, and large effect sizes for r, respectively.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. http://web.vu.lt/fsf/d.noreika/files/2011/10/Cohen-J-1992-A-power-primer-kokio-reikia-imties-dydžio.pdf — Preceding unsigned comment added by 67.180.50.244 (talk) 17:13, 23 December 2013 (UTC)
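The benchmarks quoted from Cohen (1992, Table 1) can be written down as a small lookup. This is a sketch only: treating each benchmark as the lower bound of a bin is an assumption made here for illustration (Cohen gives single reference values, not ranges), and the function name and the "below small" label are hypothetical.

```python
# Benchmark values as quoted above from Cohen (1992, Table 1).
COHEN_BENCHMARKS = {
    "d": (0.2, 0.5, 0.8),
    "r": (0.1, 0.3, 0.5),
}


def cohen_label(value, measure):
    """Label an effect size by the largest benchmark its magnitude reaches.

    Assumption: benchmarks are used as lower bounds of bins, which is a
    convenience for illustration rather than something Cohen prescribes.
    """
    small, medium, large = COHEN_BENCHMARKS[measure]
    size = abs(value)  # the sign only encodes direction
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


print(cohen_label(0.3, "r"))  # medium
print(cohen_label(0.3, "d"))  # small
```

The same numeric value lands in different categories for d and r, which is why the two sets of benchmarks should not be mixed.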

Spelling
Please note that it is unnecessary to change British spelling to American spelling. See Manual of Style for more. – Chris53516 (Talk) 19:00, 8 June 2007 (UTC)

Hedges G
Should the text "As an estimator for the population effect size θ it is biased" actually read "g is biased"? 85.218.70.138 (talk) 12:33, 12 April 2010 (UTC)

There is a serious misunderstanding of Hedges' g in the text. Although the calculation in the example is quite correct, the "more conservative" estimate of the effect size has nothing to do with the magnitude of the sample size; in fact, it has to do with the smaller variance of the women's group (which gets more weight because the group of women is larger). This becomes clear when the sample size of the men is increased to 3311: the resulting effect size will be 1.72 although the absolute sample size is larger. I suggest using an example with equal standard deviations instead. 212.118.219.90 (talk) 07:40, 4 September 2008 (UTC) Markus


 * To me it seems that a large part of the Hedges' g section of the present version is wrong. Cohen's d is not heavily influenced by different sample sizes between the two groups compared to Hedges' g. Their forms for the pooled standard deviation are the same except for a "-2" in the denominator (at least in the presentation of Cohen's d that I got access to). – fnielsen (talk) 13:18, 8 October 2008 (UTC)
 * I have now erased a considerable part of the text. – fnielsen (talk) 14:18, 8 October 2008 (UTC)

What is the difference between Cohen's d and Hedges' g in the current definitions?
Currently, both Cohen's d and Hedges' g are presented in the same way: in both cases, as the difference between the two means divided by the pooled standard deviation, with weights for S1 and S2 proportional to n-1 (n being the sample size). Then, what is the difference between the two? Borisba (talk) 15:43, 26 June 2022 (UTC)

Hat?
Why does g have a hat? There is no hat in either Hedges/Olkin or Hartung/Knapp/Sinha. – fnielsen (talk) 19:23, 7 October 2008 (UTC)
 * I have now used the symbols g and g-star, instead of g-hat, so as not to suggest that g-hat is an estimator of g. – fnielsen (talk) 14:18, 8 October 2008 (UTC)

Correction Factor
The equation for the correction factor had a -9 in the denominator instead of a -1. http://files.eric.ed.gov/fulltext/ED309952.pdf shows a -1 instead. If this equation was used to calculate any examples then they may need to be updated. Tdilorenzo (talk) 14:31, 1 June 2016 (UTC)


 * I am a bit daunted by some of the venom here as I'm not a statistician. However, it seems to me that this has not been corrected. I don't have the Hedges & Olkin book but I do have the Hedges (1981) paper and the formula given there has the "-1" not "-9" and definitely approximates more closely to the correct value for g than the formula given here.  See my Rblog post: .  I am happy to change the formula to the correct one but worry that I'm missing something. Cpsyctc (talk) 12:43, 21 January 2024 (UTC)
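One possible source of the "-1" versus "-9" discrepancy is the choice of argument: written in terms of the degrees of freedom m = n1 + n2 - 2, the approximation is 1 - 3/(4m - 1), and substituting m gives 1 - 3/(4(n1 + n2) - 9). The sketch below compares both parameterizations with the gamma-function form of the correction. It assumes that the exact factor is the gamma-function expression usually attributed to Hedges (1981); the function names are hypothetical, and which argument the article's formula actually used is not established here.

```python
import math


def j_exact(m):
    """Gamma-function form of the correction, m = n1 + n2 - 2 degrees of freedom.

    Assumed form: Gamma(m/2) / (sqrt(m/2) * Gamma((m-1)/2)), computed via
    log-gamma to avoid overflow for large m. Requires m > 1.
    """
    return math.exp(math.lgamma(m / 2) - math.lgamma((m - 1) / 2)) / math.sqrt(m / 2)


def j_approx_df(m):
    """Approximation written in terms of the degrees of freedom m."""
    return 1 - 3 / (4 * m - 1)


def j_approx_total_n(n_total):
    """The same approximation written in terms of n_total = n1 + n2."""
    return 1 - 3 / (4 * n_total - 9)


n1, n2 = 6, 6  # made-up group sizes
m = n1 + n2 - 2
print(j_exact(m), j_approx_df(m), j_approx_total_n(n1 + n2))
```

If a formula with "-1" is evaluated at n1 + n2 instead of at the degrees of freedom (or one with "-9" at the degrees of freedom), the result will differ from the intended value, so it is worth checking which argument each source uses before changing the article.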

Why hasn't anyone cited Jacob Cohen?
Jacob Cohen basically invented the concept of effect size. Or at least he brought its importance to the forefront. That's why it's called Cohen's d. Why hasn't anyone referred to Jacob 'Jack' Cohen? —Preceding unsigned comment added by 71.172.128.74 (talk) 01:47, 22 September 2008 (UTC)
 * As of 7 October 2008 there are two references to his work. You can add more if you feel. – fnielsen (talk) 19:41, 7 October 2008 (UTC)

Cohen's d
I am unsure about the form for Cohen's d. In a book by Hartung, Knapp and Sinha they write (page 14) that Cohen's d is
 * $$d = \frac{\bar{X}_1 -\bar{X}_2}{S}$$

where
 * $$S^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S^2_2}{n_1+n_2}$$

with
 * $$ (n_1-1)S_1^2 = \sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2$$

This does not seem to be the same form as presented in the Wikipedia article. – fnielsen (talk) 21:36, 7 October 2008 (UTC)


 * $$S^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S^2_2}{n_1+n_2}$$ is not an unbiased estimate of $$\sigma^2$$. I guess it is an ML one. Either has its appropriate usage. – Lixiaoxu (talk) 17:27, 7 November 2008 (UTC)


 * After I wrote the comment I found that what Jacob Cohen writes on page 20 in his book (it is presently available from Google Books) is not completely explicit. He writes: "sigma = the standard deviation of either population (since they are assumed equal)". Hedges and Hartung, on the other hand, write explicit formulas for how to compute the effect size based on a sample. I think the Wikipedia article should be careful when using the term "Cohen's d". – fnielsen (talk) 09:59, 10 November 2008 (UTC)


 * Cohen's d is a very popular term in the social sciences literature. Many variants are named after Cohen because authors first, or most often, encountered the idea of a standardized effect size under Cohen's name. It is better interpreted as Cohen and others' idea of d than as Cohen's formula for d. – Lixiaoxu (talk) 16:35, 14 November 2008 (UTC)


 * Cohen indicates three definitions for the denominator. The first fnielsen pointed out, the second (p.44) for when the standard deviations are not equal and defined as:
 * $$ \sigma' = \sqrt{\frac{\sigma_A^2 + \sigma_B^2}{2} }$$
 * and the third (p.67), "s = the usual pooled within sample estimate of the population standard deviation" which is congruent with the equation for pooled variance or Hedges' $$s^*$$ but in different notation. It can be debated whether all of these are needed or desired on the page. Regardless, the current definition for pooled standard deviation is wrong (though not grossly so, there should be a -2 in the denominator) and the line below it looks like the remains of a careless edit when the formula was changed to its current state. I will be correcting both.  —Preceding unsigned comment added by 216.170.110.94 (talk) 15:49, 17 November 2010 (UTC)
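The denominators discussed in this thread can be compared side by side. The following is a minimal sketch working from summary statistics; the function name, the dictionary keys, and the numbers are made up for illustration, and no claim is made about which variant a given source calls "Cohen's d".

```python
import math


def d_variants(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference under three choices of denominator.

    s1 and s2 are sample standard deviations computed with n - 1.
    """
    diff = m1 - m2
    ss = (n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2  # summed squared deviations
    return {
        # root mean square of the two SDs (ignores the group sizes)
        "rms_sd": diff / math.sqrt((s1 ** 2 + s2 ** 2) / 2),
        # pooled SD with n1 + n2 - 2 in the denominator
        "pooled_n_minus_2": diff / math.sqrt(ss / (n1 + n2 - 2)),
        # pooled SD with n1 + n2 in the denominator (the form quoted from Hartung et al.)
        "pooled_n": diff / math.sqrt(ss / (n1 + n2)),
    }


# Made-up summary statistics with equal SDs and equal group sizes.
result = d_variants(m1=10.0, s1=2.0, n1=10, m2=8.0, s2=2.0, n2=10)
for name, value in result.items():
    print(name, value)
```

With equal group sizes and equal SDs the first two variants coincide, and the third differs from the second only by the factor sqrt((n1 + n2) / (n1 + n2 - 2)), which approaches 1 as the samples grow.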

216.170.110.94: What does the Hartung book say on page 14? As the Hartung book is referenced for the formula, you shouldn't change the formula. It will also make it inconsistent with the writing around the Hedges definition. – fnielsen (talk) 09:38, 31 January 2011 (UTC)

There is an interesting discussion about the lack of consistency in notations, definitions and estimates for standardized mean differences in "McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11(4), 386-401." http://www.bobmcgrath.org/Pubs/When_effect_sizes_disagree.pdf. They actually suggest the 'd' notation for the version currently presented in the Wikipedia article. Some of the confusion in terminology/definition here is due to confusion between population parameters and their estimates. Cohen clearly defines his 'd' in terms of population parameters, not sample statistics, for example in his book "Statistical Power Analysis for the Behavioral Sciences" (p. 20 of 2nd edition). The 2006 paper I mentioned also shows a simple approximation for the g bias correction factor (J?) due to Hunter & Schmidt, 2004: (N-3)/(N-2.25). —Preceding unsigned comment added by 143.107.252.87 (talk) 15:52, 21 February 2011 (UTC)
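The Hunter & Schmidt approximation quoted above can be rearranged for comparison with other forms of the correction factor. A short derivation, assuming that N denotes the total sample size $$n_1 + n_2$$ (the comment above does not say so explicitly):
 * $$\frac{N-3}{N-2.25} = 1 - \frac{0.75}{N-2.25} = 1 - \frac{3}{4N-9}$$

Under that assumption, and writing $$m = N - 2$$ for the degrees of freedom, the same quantity is $$1 - \frac{3}{4m-1}$$, so a "-9" and a "-1" in the denominator can describe one and the same approximation in two parameterizations.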


 * I have now been trying to clean up a bit. I am not entirely sure that it is satisfactory. – fnielsen (talk) 08:58, 30 July 2014 (UTC)

STATA meta-analysis equations are different from this article
STATA which often seems the "industry standard" in carrying out meta-analyses uses slightly different equations from those used in this article. The program code for the METAN function may be found here and uses the following equations:

Pooled standard deviation:
 * $$s = \sqrt{\frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1+n_2-2} }$$

i.e. difference is the -2.

The article says this equation is used for Hedges' g only; however, it appears that the METAN function uses this version of the pooled standard deviation for Cohen's d, Hedges' g and Glass's Delta. This equation is shown on the pooled standard deviation page too.

Variance of Cohen's d (not yet explicitly in article):
 * $$\hat{\sigma}^2(d) = \frac{n_1+n_2}{n_1 n_2} + \frac{d^2}{2(n_1 + n_2 - 2)}$$

Variance of Hedges g:
 * $$\hat{\sigma}^2(g) = \frac{n_1+n_2}{n_1 n_2} + \frac{g^2}{2(n_1 + n_2 - 3.94)}$$

i.e. difference is the extra -3.94

Variance of Glass's Δ (not yet explicitly in article):
 * $$\hat{\sigma}^2(\Delta) = \frac{n_1+n_2}{n_1 n_2} + \frac{\Delta^2}{2(n_2 - 1)}$$
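For anyone wanting to compare these with the article's versions numerically, the three variance expressions above can be transcribed directly. This is a sketch that follows the equations as quoted in this comment; it has not been checked against the METAN source code, the function names are hypothetical, and taking group 2 as the control group for Glass's Δ is an assumption based on the $$n_2 - 1$$ term.

```python
def var_cohens_d(d, n1, n2):
    """Variance of Cohen's d as quoted above."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2 - 2))


def var_hedges_g(g, n1, n2):
    """Variance of Hedges' g as quoted above (note the 3.94)."""
    return (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2 - 3.94))


def var_glass_delta(delta, n1, n2):
    """Variance of Glass's Delta as quoted above; n2 is assumed to be the control group."""
    return (n1 + n2) / (n1 * n2) + delta ** 2 / (2 * (n2 - 1))


# Made-up inputs: the same effect value in each formula, 20 subjects per group.
for fn in (var_cohens_d, var_hedges_g, var_glass_delta):
    print(fn.__name__, fn(0.5, 20, 20))
```

All three share the first term and differ only in the denominator of the second, so the differences shrink as the sample sizes grow or as the effect approaches zero.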

Does anyone know why these differences exist?

Should we change the article to these equations? 194.83.139.137 (talk) 14:36, 30 January 2009 (UTC)

I would now like to substitute the above equations into the article shortly. Please comment here if you agree/disagree - thanks 194.83.139.177 (talk) 18:25, 31 July 2009 (UTC)


 * Have we resolved this issue? I wrote the variance equation, but I am puzzled. The Hedges and Hartung books are not clear to me. I tried to derive the variance for Hedges g and g* myself and/but ended up with 2(n1+n2-4) in the denominator (asymptotically) for g*. &mdash; fnielsen (talk) 15:54, 8 November 2010 (UTC)

Cohen's $$f^2$$
Where does the following expression for Cohen's $$f^2$$ come from?
 * $$\hat{f}_{Effect} = {\sqrt{(df_{Effect}/N) (F_{Effect}-1)}}.$$

I think it's wrong (it also doesn't define $$N$$, but if I'm right, there is no constant value of $$N$$ that would be correct anyway).

In a regression context, where the explained/hypothesis sum of squares $$SS_h$$ is the difference of the reduced-model or total sum of squares $$SS_t$$ and the full model or error sum of squares $$SS_e$$, we have
 * $$R^2 = 1 - \frac{SS_e}{SS_t} = \frac{SS_h}{SS_t}$$,

and using these expressions gives
 * $$f^2 = \frac{R^2}{1-R^2} = \frac{SS_h}{SS_e}$$.

The F-statistic itself is
 * $$F = \frac{SS_h/df_h}{SS_e/df_e}$$,

showing that
 * $$f^2 = F\frac{df_h}{df_e}$$.
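
The chain of identities above is easy to check numerically. The following Python sketch uses made-up sums of squares and degrees of freedom (all values and the function name are hypothetical) and confirms that the three routes to $$f^2$$ agree:

```python
def f_squared_three_ways(ss_t, ss_e, df_h, df_e):
    """Return f^2 computed from R^2, from SS_h/SS_e, and from F * df_h / df_e."""
    ss_h = ss_t - ss_e                      # hypothesis (explained) sum of squares
    r2 = ss_h / ss_t                        # R^2 = SS_h / SS_t
    f2_from_r2 = r2 / (1 - r2)              # f^2 = R^2 / (1 - R^2)
    f2_from_ss = ss_h / ss_e                # f^2 = SS_h / SS_e
    f_stat = (ss_h / df_h) / (ss_e / df_e)  # the F-statistic
    f2_from_f = f_stat * df_h / df_e        # f^2 = F * df_h / df_e
    return f2_from_r2, f2_from_ss, f2_from_f

if __name__ == "__main__":
    # Illustrative values only: SS_t = 100, SS_e = 60, df_h = 2, df_e = 27.
    print(f_squared_three_ways(100.0, 60.0, 2, 27))
```

All three values coincide (up to floating-point rounding) for any valid input, which is the point of the derivation: the degrees of freedom cancel when going from F back to $$f^2$$.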

It also makes sense to me that, informally, an effect size measure relates to its corresponding statistic by 'dropping' the dependence on sample-size. More precisely, Cohen's d relates to the t-test by using some estimate of standard deviation in place of the standard error, which involves sample size; comparison of $$f^2=SS_h/SS_e$$ and the expression for $$F$$ above shows a roughly similar relationship, in that the degrees of freedom, which relate to the sample size (and complexity of the hypothesis) are removed.

I'm very open to the possibility that I am wrong, but in this case I think the equation is sufficiently far from being obvious as to necessitate a reference (I found it ironic reading all of the argument above about citations for the aliens example, when it is the definitions and the equations giving relations that are sorely missing references, not the examples, which I think were perfectly reasonably given as-is!)

Ged.R (talk) 17:04, 7 April 2009 (UTC)

I've just realised that my comment about the equation requiring a reference might leave me open to a charge of hypocrisy for not citing any sources myself. I should have said 'necessitate either a reference or a clear derivation', since I would argue that my derivation is at least clear (even if it turns out to have a flaw somewhere, at least someone could point to the specific flaw, whereas the article as it stands simply plucks a peculiar equation from thin air). My equations for $$R^2$$ match two expressions given in Squared multiple correlation except notationally, I have:
 * $$SS_h = SS_{reg}$$
 * $$SS_e = SS_{err}$$
 * $$SS_t = SS_{tot}$$

My equation for the F-statistic matches one in F-test under the following translation of notation:
 * $$SS_h = RSS_1-RSS_2$$
 * $$SS_e = RSS_2$$
 * $$SS_t = RSS_1$$

(Notation is clearly a mess here. I know of one proposal for a standard notation in the related field of Econometrics, Abadir and Magnus (2005), doi:10.1111/1368-423X.t01-1-00074 but it only goes as far as to say RSS denotes 'residual sum of squares'. I think the 1 and 2 subscripts used in the F-test are a poor choice, since 1 typically relates to the alternative hypothesis, in which case the total or restricted RSS could be $$RSS_0$$ to denote that it is restricted according to the null hypothesis, while the error RSS could be $$RSS_1$$ to denote that it is under the alternative hypothesis of the full model. It would then seem reasonable to have $$RSS_H = RSS_0 - RSS_1$$ to denote the regression or hypothesis RSS. But anyway, this is beside the current point.)

Ged.R (talk) 18:07, 7 April 2009 (UTC)

T-test for mean difference of single group or two related groups
This section could use some tidying up by someone with a suitable grasp of the details and authority from the scary editors. The sentence "Usually, μbaseline is zero, while not necessary. " in particular needs some work. 87.114.240.70 (talk) 20:50, 24 March 2010 (UTC)

Expert attention
I added the "expert" template, because from the section "Confidence interval and relation to noncentral parameters" onwards the English is so poor that it is difficult to see what is going on. There seems to be not even an attempt to explain what is going on. But there may be something important being said. This may relate to the immediately above comment. JA(000)Davidson (talk) 16:44, 27 May 2011 (UTC)

New paper on effect size in APA journal
Kelley has another paper out and defines effect size as a quantitative reflection of the magnitude of some phenomenon that is used for the purpose of addressing a question of interest. Might be worth incorporating into the article. Agree with the sentiment above that the article needs work, by the way. Tayste (edits) 22:05, 12 September 2012 (UTC)

Alternate names
Is moderate the same as a medium effect size? If these can be used interchangeably, then I think the article should state this because both are common in the psychology literature. If there are other synonymous terms, I think they should be included as well. --1000Faces (talk) 22:05, 14 July 2013 (UTC)

Calculating cohen's d from example
Hello everyone,

Perhaps Cohen's d is calculated incorrectly. The provided example might need clarification.

The text is: "So, in the example above of visiting England and observing men's and women's heights, the data (Aaron,Kromrey,& Ferron, 1998, November; from a 2004 UK representative sample of 2436 men and 3311 women) are: Men: mean height = 1750 mm; standard deviation = 89.93 mm Women: mean height = 1612 mm; standard deviation = 69.05 mm The effect size (using Cohen's d) would equal 1.72 (95% confidence intervals: 1.66 – 1.78). This is very large and you should have no problem in detecting that there is a consistent height difference, on average, between men and women."

I calculated Cohen's d from the provided means, standard deviations, and group sizes. The value was 1.756.

When I tried to find the original paper presenting Cohen's d, I could not find the correct article. The link to Aaron, Kromrey, & Ferron (1998) does not seem to work in text, but the link in the reference list does seem to work. I could find the paper, but the data is not presented in that paper. Moreover, it is said that the data is from a 2004 UK sample. I cannot find which paper presented that data from the 2004 UK sample. In addition, I did not understand what is meant by "the example above".

All in all, it seems confusing to me. It leaves me with some questions: 1. What should be the correct value in the calculation example of Cohen's d? 2. Should a paper be cited with the data from the 2004 UK sample? Or if it already is in it somewhere, where can I find it? 3. What are the reasons to cite Aaron, Kromrey, & Ferron (1998) in this example and at that specific location? 4. What is meant by "the example above"?

Perhaps I misunderstood something. However, could anyone clarify this for me? Thanks!

Regards, Joep — Preceding unsigned comment added by 86.89.158.241 (talk) 15:14, 26 July 2013 (UTC)

Yes, I too am finding that the correct value of Cohen's d from the above equation should be 1.756 rather than 1.72. — Preceding unsigned comment added by Dlb012 (talk • contribs) 16:06, 29 August 2013 (UTC)

And, yes, me too. I also stumbled across this issue. It is rather disturbing to have a detailed discussion about various different versions of s but then to continue with an example which does not mention which s was used. I actually figured out how the s in the example was calculated: it is the square root of the average of the squared s1 and s2, so s = sqrt((s1^2+s2^2)/2) ≈ 80.17. If you use that s, you obtain d = 1.72. Otherwise, as noted above, one obtains d = 1.756. Lionelkarman (talk) 14:14, 5 June 2014 (UTC)
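
For anyone wanting to reproduce the two values, here is a minimal Python sketch using only the figures quoted from the article (means, standard deviations and group sizes). The variable names are mine, and which s the article's author actually intended remains a guess:

```python
import math

# Figures quoted in the article's height example (mm).
mean_m, sd_m, n_m = 1750.0, 89.93, 2436   # men
mean_w, sd_w, n_w = 1612.0, 69.05, 3311   # women

diff = mean_m - mean_w

# Pooled SD, weighting each variance by its degrees of freedom.
s_pooled = math.sqrt(((n_m - 1) * sd_m**2 + (n_w - 1) * sd_w**2)
                     / (n_m + n_w - 2))

# Root mean square of the two SDs, ignoring the sample sizes.
s_rms = math.sqrt((sd_m**2 + sd_w**2) / 2)

d_pooled = diff / s_pooled
d_rms = diff / s_rms

print(round(d_pooled, 3), round(d_rms, 2))  # prints 1.756 1.72
```

So the 1.72 in the article is consistent with the unweighted version of s, while the sample-size-weighted pooled SD gives 1.756.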

Joep is also right about the Aaron, Kromrey, & Ferron (1998) reference: a 1998 paper analyzes 2004 data, just fancy that! The paper exists but does not contain the example. It would be easy to remove the reference, but then there is no source for the example. This is all very unfortunate. Lionelkarman (talk) 14:27, 5 June 2014 (UTC)

Coefficient of determination erroneously referred to as correlation coefficient?
Under this article's section regarding "correlation coefficient", it says "... r², the correlation coefficient (also referred to as "r-squared")". However, the wiki page for correlation coefficient explains the term as referring simply to r. Moreover, further down in that article it is explained that r² is called the coefficient of determination. I believe the latter is the correct term for r². I might be mistaken, however, so I hope someone more statistics-savvy can confirm and correct this misnaming in the article. Thank you.


 * Titles were corrected. Pearson r is the same as correlation coefficient. It can be used as an effect size although the coefficient of determination is also used. Dger (talk) 04:36, 29 June 2014 (UTC)

Crappy beginning
I've revised the first para., which currently starts with two definitions, gives an example of a phenomenon (not an effect size), then says that effect sizes are descriptive and not inferential but also inferential (WTF?), and fails to directly connect effect sizes to NHST. I added a recent, and I think compelling, example of how effect sizes should be used. Amead (talk) 05:25, 6 July 2014 (UTC)

Expanded Descriptors
An editor removed a table with expanded descriptors, stating (a) they aren't widely accepted, (b) there are only 5 citations, and (c) the justification given in the references isn't clear. Before I undo this, please comment on (a) there is no such Wikipedia standard called "widely accepted", (b) there is no Wikipedia standard indicating 5 citations is insufficient, and (c) the justification argument being advanced by the editor is a Wikipedia violation of original research (i.e., the editor is debating with the article). I'll wait a bit to see if others wish to comment, or if the editor can document wiki sources for (a) and (b). As for (c), the editor is apparently calling for a recitation of the literature reviewed in the citations, which could be done but would just make the article longer. As the tags note, this article is clearly a long-standing mess, and I'm not sure why hurdles are being set up to prevent its improvement. — Preceding unsigned comment added by 2601:40F:401:5A14:3C18:A5A1:588E:9B3A (talk) 14:23, 7 October 2016 (UTC)
 * To be more precise, I didn't see 5 citations using expanded descriptors, I only found 4:

1. The Journal of Pain 2. Journal of Postsecondary Education and Disability 3. American Journal of Applied Mathematics and Statistics 4. British Journal of Applied Science & Technology


 * although I did find 2 doctoral dissertations based on the expanded descriptors. 2601:40F:401:5A14:3C18:A5A1:588E:9B3A (talk) 14:32, 7 October 2016 (UTC)
 * It has been about a month. I see there is an additional reference to the expanded descriptors in Ary et al. 2012, p. 150, so I am reverting the deletion of the previous editor.2601:40F:401:5A14:A999:96B1:92A2:7081 (talk) 15:19, 1 November 2016 (UTC)

Incomprehensible article
This article will not teach a layperson ANYTHING. It's incomprehensible. E.g. when someone reads "we got an effect size of 1.24" and comes here to understand what that means, they will end up NOT understanding what that means.

I first posted the above comment (as an edit summary) three years ago; the page has not improved in this regard. My guess is that 95% of this page needs to be discarded as too technical (or moved to a wikibook on statistics), and the other 5% needs a complete rewrite for simplicity and clarity. Gnuish (talk) 06:56, 12 January 2017 (UTC)


 * Although Wikipedia is not a textbook, I'm sympathetic to this issue. I agree that articles that are incomprehensible to the average layperson are of limited value. But I doubt anything will come of these comments without listing specific things that are unclear. Also (and I'm not criticizing) your example isn't comprehensible to me either. It's like saying "the animal weighs 3 kg"... Without knowing what animal, I don't know if 3 kg is big (that would be a HUGE mouse) or tiny (like a micro elephant). An effect size of 1.24 is not interpretable without more information. If you tell me that's Cohen's d or Hedges' g, then I know it's a fairly large difference (about 1.25 standard deviations, which is a big difference for two variables distributed approximately normally). Finally, I'd like to point out that the reason this is all so complex is that we're describing things that generally have no natural metric, like temperature. So, in order to even begin to understand an effect size, you have to understand its metric. For temperature, you have to be told whether it's Fahrenheit, Celsius, Kelvin, etc.... 35F, 35C, and 35K are really different temperatures. Amead (talk) 23:03, 25 August 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Effect size. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20090323151854/http://www.mdp.edu.ar/psicologia/vista/vista.htm to http://www.mdp.edu.ar/psicologia/vista/vista.htm

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 06:02, 18 September 2017 (UTC)

Cliff's Delta
If you read the paper, you see it's not really for ordinal data. The analysis is ordinal, but it was very much intended to be used as a robust general metric, including for continuous data, especially when the assumptions about normality and variance needed for Cohen's d aren't met.

Wilcoxon's r
Shouldn't Wilcoxon's $$r$$ get a mention in the article? It is defined as


 * $$r := \frac{z}{\sqrt{N}} $$

where $$z$$ is the z-score and $$N$$ is the sample size or number of trials.
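
As a minimal illustration of the definition above, here is a Python sketch; the function name and the example numbers are hypothetical, and obtaining z from an actual Wilcoxon test is left out:

```python
import math

def wilcoxon_r(z, n):
    """Effect size r = z / sqrt(N) for a z-score z and sample size N."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z / math.sqrt(n)

if __name__ == "__main__":
    # Illustrative values only: z = 2.5 from a test on N = 25 observations.
    print(wilcoxon_r(2.5, 25))
```

Because z grows roughly with the square root of N for a fixed underlying effect, dividing by sqrt(N) removes most of the dependence on sample size.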

PedantNumber1 (talk) 18:28, 17 April 2022 (UTC)

Cohen's w or omega?
Many sources call it "Cohen's omega" ($$ \omega$$), not Cohen's w. The two letters look very similar, hence it's easy to mix them up, but they are different. In Cohen's book (Statistical Power Analysis for the Behavioral Sciences, 1988), the letter differs from the regular Latin w (in the book, it is set in bold, in a slightly different font), so I'd be inclined to think that it is in fact omega, not w. If I'm correct, this is something that should be corrected in the article. 85.169.195.108 (talk) 08:15, 12 March 2023 (UTC)