Talk:Fleiss' kappa

Original example from Fleiss (1971)
Subject      1      2      3      4      5      p_i
1            0      0      0      6      0    1.000
2            0      3      0      0      3    0.400
3            0      1      4      0      1    0.400
4            0      0      0      0      6    1.000
5            0      3      0      3      0    0.400
6            2      0      4      0      0    0.467
7            0      0      4      0      2    0.467
8            2      0      3      1      0    0.267
9            2      0      0      4      0    0.467
10           0      0      0      0      6    1.000
11           1      0      0      5      0    0.667
12           1      1      0      4      0    0.400
13           0      3      3      0      0    0.400
14           1      0      0      5      0    0.667
15           0      2      0      3      1    0.267
16           0      0      5      0      1    0.667
17           3      0      0      1      2    0.267
18           5      1      0      0      0    0.667
19           0      2      0      4      0    0.467
20           1      0      2      0      3    0.267
21           0      0      0      0      6    1.000
22           0      1      0      5      0    0.667
23           0      2      0      1      3    0.267
24           2      0      0      4      0    0.467
25           1      0      0      4      1    0.400
26           0      5      0      1      0    0.667
27           4      0      0      0      2    0.467
28           0      2      0      4      0    0.467
29           1      0      5      0      0    0.667
30           0      0      0      0      6    1.000
total       26     26     30     55     43
p_j      0.144  0.144  0.167  0.306  0.239

- FrancisTyers · 20:18, 17 July 2006 (UTC)


 * Sum p_i	16.6730
 * Sum all	180

Spreadsheet

 * B34 = 0.144, C34 = 0.144, D34 = 0.167, E34 = 0.306, F34 = 0.239
 * K3 = 6, K4 = 30, G37 = 16.6730
 * G40 = 0.558, G41 = 0.2201


 * Pbar =sum(1/(K4*K3*(K3-1))*(G37*K3*(K3-1)))
 * Pbar_e =sum(B34^2+C34^2+D34^2+E34^2+F34^2)
 * k =sum((G40-G41)/(1-G41))


 * k ~ .430

- FrancisTyers · 21:15, 25 July 2006 (UTC)
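For anyone who wants to re-check the spreadsheet figures above without a spreadsheet, here is a minimal Python sketch. It assumes the counts are exactly those in the Fleiss (1971) table quoted at the top of this page, with blank cells read as 0. The names `RATINGS` and `fleiss_kappa` are made up for illustration and do not come from the article or from any library. Exact fractions are used so that no rounding enters before the final step.

```python
from fractions import Fraction

# Counts n_ij from the Fleiss (1971) table quoted on this page:
# 30 subjects (rows), 5 categories (columns), 6 raters per subject.
# Blank cells in the table are entered as 0 (an assumption of this sketch).
RATINGS = [
    [0, 0, 0, 6, 0], [0, 3, 0, 0, 3], [0, 1, 4, 0, 1], [0, 0, 0, 0, 6], [0, 3, 0, 3, 0],
    [2, 0, 4, 0, 0], [0, 0, 4, 0, 2], [2, 0, 3, 1, 0], [2, 0, 0, 4, 0], [0, 0, 0, 0, 6],
    [1, 0, 0, 5, 0], [1, 1, 0, 4, 0], [0, 3, 3, 0, 0], [1, 0, 0, 5, 0], [0, 2, 0, 3, 1],
    [0, 0, 5, 0, 1], [3, 0, 0, 1, 2], [5, 1, 0, 0, 0], [0, 2, 0, 4, 0], [1, 0, 2, 0, 3],
    [0, 0, 0, 0, 6], [0, 1, 0, 5, 0], [0, 2, 0, 1, 3], [2, 0, 0, 4, 0], [1, 0, 0, 4, 1],
    [0, 5, 0, 1, 0], [4, 0, 0, 0, 2], [0, 2, 0, 4, 0], [1, 0, 5, 0, 0], [0, 0, 0, 0, 6],
]


def fleiss_kappa(table):
    """Return (P_bar, P_bar_e, kappa) as exact fractions for a subjects-by-categories table."""
    N = len(table)        # number of subjects
    n = sum(table[0])     # ratings per subject
    k = len(table[0])     # number of categories
    if any(sum(row) != n for row in table):
        raise ValueError("every subject must have the same number of ratings")

    # p_j: share of all N*n ratings that fell into category j
    p_j = [Fraction(sum(row[j] for row in table), N * n) for j in range(k)]
    # P_i: agreement for subject i, ((sum_j n_ij^2) - n) / (n (n - 1))
    P_i = [Fraction(sum(c * c for c in row) - n, n * (n - 1)) for row in table]

    P_bar = sum(P_i) / N                 # mean observed agreement
    P_bar_e = sum(p * p for p in p_j)    # agreement expected by chance
    kappa = (P_bar - P_bar_e) / (1 - P_bar_e)
    return P_bar, P_bar_e, kappa


P_bar, P_bar_e, kappa = fleiss_kappa(RATINGS)
print(f"{float(P_bar):.3f} {float(P_bar_e):.3f} {float(kappa):.3f}")  # prints 0.556 0.220 0.430
```

With exact fractions this gives Pbar = 5/9 and kappa of about 0.430, in line with the "k ~ .430" above. Rounding only at the end is a deliberate choice here, following the earlier remark on this page about rounding errors.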

Potential Corrections
Correction: There is an error in the semifinal equation. I get the sum for the Pbar_e to come out to .213. kappa then comes out to .2099. Without rounding (using php) the value is 0.20993070442196 — Preceding unsigned comment added by 71.162.5.63 (talk) 18:01, 22 October 2013 (UTC)

First of all, I have to give the disclaimer that I have no idea what Fleiss's Kappa is. My comments come from purely mathematical and organizational considerations.


 * It's confusing when one sees a formula without any N's and then sees, "where N = total number of subjects," etc. One might write instead, "Where $$\bar{P}$$ and $$\bar{P_{e}}$$ are calculated as follows.  Let N = ..."


 * In the section where you describe $$P_i$$, you have two formulas for $$P_i$$. They turn out to be the same with one simply expanded, but it makes it appear as if there are two different ways of defining $$P_i$$.  If you want to include both, simply remove $$P_i$$ from the beginning of the second line, leaving an equal sign between the two (obviously) equivalent formulae.


 * In the formula for $$\bar{P}$$, I would recommend not using the asterisk. It's rarely used in typed mathematics.  Removing the asterisks will not change the formula at all.  Just use parentheses where necessary to separate one number from another.


 * You (or someone) will want to include something about what this measure "measures". In other words, what does it say about the given data?

I did not have time to verify the correctness of the worked example, so someone else will have to do that.

Thanks for contributing. VectorPosse 22:41, 17 July 2006 (UTC)


 * Thanks! I'll take care of these. - FrancisTyers · 23:05, 17 July 2006 (UTC)


 * Following up on the last point, I found the first sentence perplexing, as I did not instantly know what inter-rater reliability meant. I had to go further down to the example section to get a feel for what it's about. A line such as "if two or more people assign numerical ratings or scores to a number of items, then Fleiss' kappa will give a measure of how consistent the ratings are" might help explain the subject better. --Salix alba (talk) 00:14, 18 July 2006 (UTC)


 * Ok, I'll add something to that effect, thanks :) - FrancisTyers · 10:24, 18 July 2006 (UTC)


 * Thanks for the changes. It is somewhat more clear now.  I substantially rewrote the Equations sections to make it flow.  There are still a few things I haven't had time to do in the worked example.


 * I still haven't checked it for accuracy.


 * The numbers for $$\bar{P}$$ seem wrong. Use the simpler formula from the equations section above and simply plug in the numbers.  Also, write out the sum of the $$P_i$$ since (1) it is easy to miss this number below the table and (2) the worked example should follow the formula given in the section above.


 * Make sure that rounding errors aren't introduced. It looks as if there is some potential for this.  Also, it is technically more appropriate to use $$\approx$$ instead of $$=$$ if you are using rounded figures.

That's all for now. Thanks. VectorPosse 06:06, 18 July 2006 (UTC)

Hmm, regarding $$\bar{P}$$ and indeed the rest of the numbers, I'm a little confused about what number of significant digits I should quote the numbers as. Fleiss' paper has them as 3 (e.g. .144) etc. I'll see if I can take care of the other concerns. - FrancisTyers · 10:24, 18 July 2006 (UTC)


 * You're right, the numbers for $$\bar{P}$$ do seem wrong. I'm going to check my calculations again. Thanks for spotting that :) - FrancisTyers · 10:35, 18 July 2006 (UTC)


 * Ok, I've got to go out, I'm going to add the accuracy-section tag until I get back and work this out. - FrancisTyers · 13:11, 18 July 2006 (UTC)

I know the worked-out example has already been flagged, but this has been sitting for a while. I agree with previous posts: I'm pretty sure the end result is wrong (doesn't the final kappa value look like it should be higher?). Kappa is zero when agreement is equal to the amount of agreement expected by random chance, so based on a quick look at the data, I get the feeling that kappa should be a bit above zero. I'm familiar with kappa, but do not remember (or possibly never knew) how to calculate the statistic by hand. Any help with this is appreciated. --65.197.19.242 20:49, 25 July 2006 (UTC)Mark


 * Ok, looks like I fixed it. - FrancisTyers · 21:29, 25 July 2006 (UTC)

I found something else that looks screwy in the worked example. Currently, the column totals are 20 - 28 - 40 - 28 - 24. Shouldn't they be 20 - 28 - 39 - 21 - 32? --65.197.19.240 20:06, 28 July 2006 (UTC)


 * Thanks, you're right. I must have changed the values after making the totals without changing the totals. Thanks for spotting that! - FrancisTyers · 21:36, 28 July 2006 (UTC)

GA notes
I have failed this for two glaring reasons:


 * 1) One reference and no citations.
 * 2) The formatting is very, very messy.

I admit it - I can barely add most days. While this is a much more advanced thing than junior year algebra II, the formatting still could be better, and some better explanation of what's going on could be useful.

Good luck with it in any case. --badlydrawnjeff talk 16:15, 7 November 2006 (UTC)


 * A few extra thoughts. More is needed to put the article in context: What kind of problems is it used for? What are the alternative approaches? What are the pros and cons of the methods? Also, I think the Java code is not really necessary; the summation formula should be enough for a programmer to produce an algorithm. --Salix alba (talk) 16:48, 7 November 2006 (UTC)
 * It might also be worth comparing it with other Category:GA-Class mathematics articles to see the sort of standard we are expecting. --Salix alba (talk) 16:53, 7 November 2006 (UTC)

Thanks for the feedback. Regarding the reference and citations: There is only really one paper I had access to which dealt with the subject. That is the original paper by Fleiss. I could put in citations, but they would all be to the same article. I note that Nash equilibrium (a GA) does not have any inline citations.

Could you give advice on what to do about the messy formatting? I thought it was quite neat actually. I will see if I can add some more explanation of what is going on. Is one of the problems the fact that "inter-rater reliability" is not explained, maybe? I didn't originally include the Java code; I think the article would work with or without it. I will remove it if you think that it is an impediment. - Francis Tyers · 19:12, 7 November 2006 (UTC)
 * Inline cites in mathematics are a contentious issue at the moment, see Scientific citation guidelines. More sources would help establish the notability of the topic. I'd like to see more on what inter-rater reliability is all about; currently there's no article on the topic. There also needs to be a discussion on the significance of the results: what conclusion can we draw from $$\kappa=0.211$$?
 * I've had a go at redoing the formatting; inline maths is tricky as the font styles do not match. --Salix alba (talk) 08:45, 8 November 2006 (UTC)


 * Aye, inline maths is tricky. I will see if there is a way of forcing LaTeX style. Furthermore I will see if I can include something on significance testing, and start an article on inter-rater reliability. - Francis Tyers · 12:19, 8 November 2006 (UTC)

Fixed the math part with HTML rendering by including '~' in the formula (this is normally taken to be a 'non-breaking space', I believe, and so forces it to be rendered with LaTeX). I've started an article on inter-rater reliability, which isn't wonderful but gives some overview of methods. I'm looking into significance testing now. - Francis Tyers · 12:52, 8 November 2006 (UTC)

I moved the data table to the right (which I think looks better), but I have a high res screen, so would welcome further input. I added a section on significance. There is another way to calculate the significance using a z test, but I'm not entirely sure how it works, so I don't feel confident writing about it. I would appreciate it if someone more well versed in maths/stats would help out :) - Francis Tyers · 15:58, 8 November 2006 (UTC)

Second nomination
I'm nominating this again, as most if not all the previous complaints were dealt with, and the article has been static for a while. - Francis Tyers · 20:51, 7 April 2008 (UTC)

Survey
WP:Good article usage is a survey of the language and style of Wikipedia editors in articles being reviewed for Good article nomination. It will help make the experience of writing Good Articles as non-threatening and satisfying as possible if all the participating editors would take a moment to answer a few questions for us, in this section please. Would you like any additional feedback on the writing style in this article? If you write a lot outside of Wikipedia, what kind of writing do you do? Is your writing style influenced by any particular WikiProject or other group on Wikipedia? At any point during this review, let us know if we recommend any edits, including markup, punctuation and language, that you feel don't fit with your writing style. Thanks for your time. - Dan Dank55 (talk) 15:25, 9 April 2008 (UTC)


 * Would you like any additional feedback on the writing style in this article? 


 * Sure!


 *  If you write a lot outside of Wikipedia, what kind of writing do you do?


 * Essays and papers mostly.


 * Is your writing style influenced by any particular WikiProject or other group on Wikipedia? 


 * Nope, mostly from articles I've read in conference proceedings, journals etc.


 * Thanks for taking the time to ask :) - Francis Tyers · 20:16, 9 April 2008 (UTC)


 * Thanks for replying! We'll throw it all in a pile on May 1 and see if we're picking up any trends that were missed before.  - Dan Dank55 (talk) 19:19, 12 April 2008 (UTC)

Good article nomination on hold
This article's Good Article promotion has been put on hold. During review, some issues were discovered that can be resolved without a major re-write. This is how the article, as of April 16, 2008, compares against the six good article criteria:


 * 1. Well written?: Generally yes, but see below.
 * 2. Factually accurate?: Yes, but all sources are from psychological or medical journals. Do any mathematical articles exist which discuss Fleiss' kappa?
 * 3. Broad in coverage?: Yes, as far as I can judge.
 * 4. Neutral point of view?: No problem
 * 5. Article stability? Stable
 * 6. Images?: No images

Please address these matters soon and then leave a note here showing how they have been resolved. After 48 hours the article should be reviewed again. If these issues are not addressed within 7 days, the article may be failed without further notice. Thank you for your work so far. Ruslik (talk) 13:25, 16 April 2008 (UTC)

1) Unfortunately the lead does not satisfy the lead-section guideline. The lead should be a summary of the article, which the current lead is not. It contains a formula, which is not mentioned anywhere in the main text, and other information.

2) In addition, the main text of the article lacks an introduction. I suggest creating such an introduction using the current lead of the article. After that the new lead should be written.

3) In the lead there is a sentence 'The scoring range is between 0 and 1.' Does it mean that negative values cannot occur? The article does not explicitly state that $$\kappa$$ can assume negative values but should, in my opinion.

4) From the definition of $$n_{ij}$$ one can obtain that $$1 = \frac{1}{n} \sum_{j=1}^k n_{i j} $$ This equality should be put into the text for the sake of clarity.

Ruslik (talk) 13:25, 16 April 2008 (UTC)
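
As a small worked consequence of the identity in point 4 (a sketch only, assuming the usual definition $$p_j = \frac{1}{N n} \sum_{i=1}^N n_{ij}$$ with N subjects and n ratings per subject, which is not spelled out on this page), the category proportions must sum to one:

$$\sum_{j=1}^k p_j = \frac{1}{N n} \sum_{i=1}^N \sum_{j=1}^k n_{ij} = \frac{1}{N n} \sum_{i=1}^N n = 1$$

This gives a quick sanity check on a worked example. For the Fleiss (1971) figures quoted at the top of this page, 0.144 + 0.144 + 0.167 + 0.306 + 0.239 = 1.000.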


 * 2) I'm not sure if any mathematical articles exist that discuss this kappa, the use is as far as I'm aware fairly restricted to the psychological and medical field.


 * 1 and 2) Ok, good idea, I'll work on this.


 * 3) I'll look into that too.


 * 4) Where should this go?
 * The best place is right after the definition of $$p_j$$, maybe on the same line. It would also be good to number all formulas. Ruslik (talk) 09:39, 17 April 2008 (UTC)


 * I've added the formula. Is there a nice way of numbering formulas in Wiki markup or a recommended practice? I like how it is done in LaTeX, but I'm not sure if it's possible here... - Francis Tyers · 14:40, 17 April 2008 (UTC)
 * I really don't know. Ruslik (talk) 05:48, 18 April 2008 (UTC)


 * Ok, I've added them by using span float right, let me know what you think :) - Francis Tyers · 06:49, 18 April 2008 (UTC)
 * They are Ok. However the last formula (for $$\bar{P_e}$$) should also be numbered. Ruslik (talk) 10:33, 18 April 2008 (UTC)


 * Oops yes, I missed that one. Done. - Francis Tyers · 12:42, 18 April 2008 (UTC)


 * - Francis Tyers · 09:14, 17 April 2008 (UTC)

Anything still missing? - Francis Tyers · 21:23, 20 April 2008 (UTC)
 * The lead is too short. Please, expand it (two or three more sentences). Ruslik (talk) 05:29, 21 April 2008 (UTC)


 * I've added another couple of sentences, but it is difficult to add stuff without basically repeating what is in the introduction. Let me know if I should add more or if it needs to be changed. - Francis Tyers · 08:27, 21 April 2008 (UTC)
 * The only thing that remains is negative values of $$\kappa$$. Ruslik (talk) 10:22, 21 April 2008 (UTC)


 * I changed:


 * The scoring range is between 0 and 1, where a $$\kappa\,$$ value of 1 means complete agreement.


 * to:


 * If the raters are in complete agreement then $$\kappa = 1~$$. If there is no agreement among the raters (other than what would be expected by chance) then $$\kappa \le 0$$.


 * Is this ok ? - Francis Tyers · 11:49, 21 April 2008 (UTC)
 * Yes, it's OK. I will pass this article. Ruslik (talk) 08:05, 22 April 2008 (UTC)


 * Thanks :) - Francis Tyers · 08:14, 22 April 2008 (UTC)

The third equation for $$P_i$$ was wrong. It implied $$\sum_{j=1}^k (n_{ij}^2 - n)$$, with n subtracted inside the sum for j from 1 to k, but it should be $$\left(\sum_{j=1}^k n_{ij}^2\right) - n$$, with n subtracted once after summing.  —Preceding unsigned comment added by 67.241.11.46 (talk) 04:35, 16 January 2009 (UTC)
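
To make the difference between the two readings concrete, here is a short derivation. It is a sketch that assumes the article's expanded form $$P_i = \frac{1}{n(n-1)} \sum_{j=1}^k n_{ij}(n_{ij}-1)$$ and uses $$\sum_{j=1}^k n_{ij} = n$$:

$$P_i = \frac{1}{n(n-1)} \sum_{j=1}^k \left(n_{ij}^2 - n_{ij}\right) = \frac{1}{n(n-1)} \left[\left(\sum_{j=1}^k n_{ij}^2\right) - n\right]$$

Taking subject 2 of the Fleiss (1971) table at the top of this page (n = 6, k = 5, counts 3 and 3 in two categories), the corrected form gives $$\frac{(3^2 + 3^2) - 6}{6 \cdot 5} = \frac{12}{30} = 0.400$$, matching the table. Subtracting n inside the sum would instead give $$\frac{18 - 5 \cdot 6}{30} = -0.400$$, which cannot be a proportion of agreeing pairs.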

ehm... The Landis and Koch guidelines for interpreting the k levels are dated 1977, while the criticism of those guidelines is dated 1955. Both citations are correct, which means that the phrasing should be changed. 132.65.16.64 (talk) 13:23, 4 January 2010 (UTC)

Cohen's kappa
What about unifying this with http://en.wikipedia.org/wiki/Cohen%27s_kappa ? It seems to be the same quantity... —Preceding unsigned comment added by 72.70.76.11 (talk) 23:13, 18 January 2010 (UTC)

What about correcting the English?
What about correcting the English? Fleiss is the last name of the man. Therefore, the possessive is "Fleiss's" not "Fleiss'" (Fleiss is not a plural noun.) — Preceding unsigned comment added by Parhelia (talk • contribs) 18:01, 18 December 2012 (UTC)


 * Per Article titles, we generally use the most common name for a subject, "as determined by its prevalence in reliable English-language sources". Google Books has 389 hits for "Fleiss' kappa" and 33 for "Fleiss's kappa"; on Google Scholar the results are 1,490 and 75. — Malik Shabazz Talk/Stalk 05:20, 19 December 2012 (UTC)

Normalised versions?
The article mentions that Fleiss' kappa is likely to be lower when more categories are introduced. In this case, is there any evidence for normalising the kappa based on number of categories, almost as if calculating an arithmetic mean? Surely this would reduce disparity between values computed for different numbers of categories. — Sasuke Sarutobi (talk) 10:52, 24 March 2014 (UTC)