Talk:Confusion matrix

Merger
I suggest Table of confusion be merged into this article at this address. The issue is the same and there should be only one article in order to avoid confusion //end of lame joke//. --Ben T/C 15:46, 21 May 2007 (UTC)

I do not support this change of name. "Confusion matrix" has been used forever in Speech Recognition, and in some other Pattern Recognition tasks, although I cannot trace the ancestry of the use. For instance, some fairly standard sequence recognition toolkits like HTK have tools specifically designed to obtain this "confusion matrix".

I do grant you that most of the time what we see is a table (especially if reading it from paper), and I guess that the "table of confusion" stuff comes from statistics and people who developed their field before computers even existed.

In communications we call a related diagram a ROC (Receiver_operating_characteristic), each of whose working points is a table of confusion. I suggest "table of confusion" goes in there and "confusion matrix" is improved. --FJValverde 09:24, 14 June 2007 (UTC)

Idea is to have as much information access for as wide an audience as possible here. Since the two are the same thing with different names, what makes sense is merging them while redirecting searches for either term to this one page. -user AOberai, 14 aug2007

Geography
Just to further confuse things, confusion matrices aren't solely used in AI (as this article would suggest). A confusion matrix is also used in Earth Observation when validating thematic classifications.

Yes, I believe AI is too narrow in this discussion. I suggest "Pattern Recognition" is the actual context where confusion matrices make sense. FJValverde 09:01, 14 June 2007 (UTC)

I think they are used more generally in statistics, be it for pattern recognition or earth observation. --Ben T/C 07:41, 20 June 2007 (UTC)

Er... In my very limited historical view of both statistics and PR, the latter actually sprang from the former, but has since gained some independence: not all techniques in PR are statistical (or even probabilistic). However, I think that the confusion matrix is properly a PR concept, in the sense that an n-to-m classifier is a very basic PR task. In this sense earth observation and "thematic classification" (meaning classifying the type of soil & such based on the images taken by satellites, right?) is strictly a type of PR task. --FJValverde 08:47, 22 June 2007 (UTC)

Missing Labeling of Matrix Columns/Rows
Please add labels to the matrix indicating which are the actual values and which are the predicted values. Reading the text it becomes clear, but please note that the article about Receiver Operating Characteristic links to here, and over there the confusion matrix is transposed (but labeled). Stevemiller 04:30, 9 October 2007 (UTC)

Accuracy
We badly need clarification for the definition of producer's and user's accuracy, which is closely associated with the confusion matrix. Comment added by Ctzcheng (talk • contribs) 17:26, 10 March 2008 (UTC)

Readability
The line "Each column of the matrix represents the instances in a predicted class" does not correspond with the figures, which seem to have the True Classes in rows and the Predicted Classes in columns. This seems a bit misleading. —Preceding unsigned comment added by 128.40.231.243 (talk) 12:15, 1 July 2009 (UTC)

The many colors in the contingency matrix are very distracting. ("Tables in crazy colours are hard to read.") Formatting of the table should be simplified so that the 4 cells in the intersection of "True Condition" and "Predicted Condition" are prevalent and stand out from the adjunct information in the other cells, specifically those with the formulas. Reduce or eliminate the multiple colors. AEnw (talk) 08:18, 27 December 2015 (UTC)Aenw

remove paragraph
The paragraph that starts: "When a data set is unbalanced..." should probably be removed. I believe this is more a general property of classification algorithms rather than a property of this visualization technique. BAxelrod (talk) 19:23, 16 May 2011 (UTC)

Contingency table?
Isn't this the same as Contingency table? I understand that different fields have different jargon, but I still feel that the similarity should be acknowledged. 82.181.42.45 (talk) 18:58, 1 November 2011 (UTC)
 * I agree, I just asked the same question here: Talk:Contingency table pgr94 (talk) 13:45, 13 June 2013 (UTC)
 * I disagree. A confusion matrix can be considered a special kind of contingency table (between real and observed value), but I don't think they should be... confused. -- RFST (talk) 06:31, 28 March 2016 (UTC)

Mixed conventions
The introduction states that the columns are the predicted class and the rows are the actual class. In the Example section this convention is reversed without acknowledgement. In the Table of confusion section the originally stated convention is used. I propose the introduction note that there are multiple conventions, but then a consistent convention is used within the article. Doug Paul (talk) 04:07, 29 April 2012 (UTC)

Normalization
Normalization of confusion matrix should also be explained. — Preceding unsigned comment added by Scls19fr (talk • contribs) 12:52, 26 April 2015 (UTC)
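To make the request concrete: row normalization divides each row by its row total, turning counts into per-class rates. A minimal sketch, assuming NumPy and using invented counts (rows = actual class, columns = predicted class):

```python
import numpy as np

# Invented 3-class confusion matrix; rows = actual class, columns = predicted class.
cm = np.array([[5, 3, 0],
               [2, 3, 1],
               [0, 2, 11]])

# Row normalization: divide each row by its total, so cell (i, j) becomes
# the fraction of actual class i that was predicted as class j.
cm_normalized = cm / cm.sum(axis=1, keepdims=True)

# Each row of the normalized matrix now sums to 1.
```

Normalizing by columns (predicted totals) or by the grand total are equally valid variants; which one is meant should be stated whenever a "normalized" matrix is shown.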

Confusion Table and True Negatives
I would have expected the outcome of the confusion table to have 14 true negatives, because it is stated that these are correctly predicted non-cats, i.e. 3 dogs + 11 rabbits which are correctly predicted as non-cats. However, I can see the argument for counting all remaining instances instead, i.e. TN = total − (TP + FP + FN), because these are animals that are not FN, FP or TP cats. Do we have some reference to a formula on how TN is defined more precisely?

(Jogoe12 (talk) 18:46, 19 December 2016 (UTC)).
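For what it's worth, the usual one-vs-rest definition counts as TN every instance that is neither in the class's row nor in its column, whether or not that instance was itself classified correctly. A sketch with invented counts (rows = actual, columns = predicted; NumPy assumed), chosen so the distinction raised above is visible:

```python
import numpy as np

# Invented cat/dog/rabbit confusion matrix; rows = actual, columns = predicted.
cm = np.array([[5, 3, 0],    # actual cats
               [2, 3, 1],    # actual dogs
               [0, 2, 11]])  # actual rabbits

def one_vs_rest(cm, i):
    """Per-class TP/FP/FN/TN under the usual one-vs-rest definition."""
    tp = cm[i, i]
    fp = cm[:, i].sum() - tp      # predicted as class i, actually another class
    fn = cm[i, :].sum() - tp      # actually class i, predicted as another class
    tn = cm.sum() - tp - fp - fn  # everything else, even misclassified non-i instances
    return int(tp), int(fp), int(fn), int(tn)

tp, fp, fn, tn = one_vs_rest(cm, 0)  # class 0 = cat
# The correctly predicted non-cats number 3 + 11 = 14, but TN = 17 here,
# because TN also counts non-cats misclassified as other non-cat classes.
```

So under this definition TN is not "correctly predicted non-cats" but "non-cats not predicted as cats".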

Confusion Matrix is Transposed compared to standard practice
The reference https://link.springer.com/content/pdf/10.1023%2FA%3A1017181826899.pdf (https://doi.org/10.1023/A:1017181826899) from 1998 defines the confusion matrix with rows being ground truth and columns being predicted values. It is confusing that Wikipedia uses the opposite convention, both here and in ROC. Matlab also uses the opposite convention. Lenhamey (talk) 02:31, 2 May 2019 (UTC)

I agree that it would be better to use the convention with the true values on the rows. It not only has better agreement with the literature, but also is adopted in the widely used Python library scikit-learn, so it is bound to appear more and more everywhere, from scientific publications to blog posts to business meetings. Below is a sample of machine learning and statistics books that adopt the true-on-rows convention (the third one being a famous statistics book): Danilosilva128 (talk) 14:31, 5 May 2021 (UTC)
 * 1) D. Michie, D. J. Spiegelhalter, and C. C. Taylor, Eds., Machine learning, neural and statistical classification. New York: Ellis Horwood, 1994.
 * 2) C. D. Manning, P. Raghavan, and H. Schütze, Introduction to information retrieval. New York: Cambridge University Press, 2008.
 * 3) T. Hastie, R. Tibshirani, and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd ed. New York, NY: Springer, 2009.
 * 4) N. Japkowicz and M. Shah, Evaluating Learning Algorithms: a classification perspective. Cambridge; New York: Cambridge University Press, 2011.
 * 5) S. Marsland, Machine Learning: An Algorithmic Perspective, 1st ed. Chapman and Hall/CRC, 2011.
 * 6) P. A. Flach, Machine learning: the art and science of algorithms that make sense of data. Cambridge; New York: Cambridge University Press, 2012.
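To make the convention concrete, here is a minimal pure-Python sketch of the true-on-rows layout (the labels are made up; scikit-learn's sklearn.metrics.confusion_matrix produces the same orientation):

```python
# Made-up labels illustrating the true-on-rows convention:
# cm[i][j] counts samples with true label i and predicted label j.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 0, 2]

n_classes = 3
cm = [[0] * n_classes for _ in range(n_classes)]
for t, p in zip(y_true, y_pred):
    cm[t][p] += 1  # row index = ground truth, column index = prediction

# cm == [[1, 1, 0],
#        [1, 2, 0],
#        [0, 0, 1]]
```

Under the opposite (predicted-as-rows) convention the resulting matrix would simply be the transpose.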

What do you think User:cmglee? Danilosilva128 (talk) 14:41, 5 May 2021 (UTC)
 * Thanks for checking with me, though I'm not an authority on this. I actually prefer switching as it makes Template:diagonal split header look neater. That said, I haven't seen a style guide that mandates it one way or the other. I'd be careful about stating that the ground-truth-as-rows convention is standard purely based on the list of statistics books: I haven't done a search, but might it be possible that comparably famous books not on the list have the opposite convention?
 * The Springer link states:

A matrix showing the predicted and actual classifications. A confusion matrix is of size l × l, where l is the number of different label values. The following confusion matrix is for l = 2:
 * followed by the matrix. It does not, however, state that this is the standard convention; the matrix could merely be an example.
 * Also, confusion matrices are in many articles, and unless we can catch every instance, I think inconsistency is worse. My 2p, cmɢʟee⎆τaʟκ 15:05, 5 May 2021 (UTC)
 * P.S. Asked at Reference_desk/Mathematics.


 * I don't think there is a "standard convention": this is one of those concepts for which every author can choose their preferred notation. It is just that most publications and software packages that I know of use the ground-truth-as-rows notation. I believe it would be better for Wikipedia to follow what is most commonly used; it would avoid confusion, especially for the beginner. I agree that inconsistency is undesirable, so it would be important to catch every instance, but I think it is worth the effort.
 * What may have happened is that the first Wikipedia article on that topic used a certain notation (arbitrarily) and since then every new article has used the same notation for consistency rather than to reflect common practice.
 * Of course it is possible that my sampling is biased and the most common is actually the predicted-as-rows notation, though I doubt it. I did a bit more searching and found a few other references (available online):
 * Probabilistic Machine Learning: An Introduction by Kevin P. Murphy, 2021
 * J. Watt, R. Borhani, and A. Katsaggelos, Machine learning refined: foundations, algorithms, and applications, 2nd ed. Cambridge University Press, 2020.
 * Encyclopedia of Machine Learning, 2011 (cited in the same Wikipedia article)
 * Principles of Data Mining, 2007
 * https://www.statisticshowto.com/confusion-matrix
 * https://stats.stackexchange.com/questions/77532/which-notation-for-confusion-matrix-is-more-accurate
 * So far I am not aware of any reputable book on statistics or machine learning that uses the predicted-as-rows notation. Danilosilva128 (talk) 18:12, 5 May 2021 (UTC)


 * Thanks for finding more books. Feedback on Reference_desk/Mathematics concurs with your observation that there is no "standard", but it does seem that ground-truth-as-rows is more common. Your explanation is sound, though another possibility is that an early editor learnt the opposite convention which might be standard in his or her location.


 * If you're still interested to work on this, I think the way forward is to compile a list of articles (including diagrams affected) so that at least half, if not most of the matrices are changed at one go, say, within a day or weekend. Otherwise, editors will just revert changes due to the inconsistency issue discussed. This seems an ideal task for an Edit-a-thon if you can interest a local Wikimedia chapter. I'll be glad to help out.


 * In the meantime, please add links to any affected articles you find to Talk:Confusion_matrix/Operation_Transposition.

cmɢʟee⎆τaʟκ 00:39, 7 May 2021 (UTC)
 * Thanks,


 * Just for reference (in case anyone comes here to understand the reasoning behind), here is an additional reason to adopt the ground-truth-as-rows notation:
 * Left comes before right when reading and ground truth comes before prediction temporally; so labeling ground truth on the left keeps this agreement. In particular, it is easier to read the rows of a matrix rather than the columns (since that's how we normally read) and with the ground-truth-as-rows notation this corresponds to the most natural (causal) conditioning: considering only the negative cases, how many predictions were negative and how many were positive? Etc.
 * Danilosilva128 (talk) 17:51, 8 May 2021 (UTC)


 * I am making this comment after getting confused by this new standard. I think it would be nice to mention the convention itself (there's nothing written about the change, for example, and I lost 40 minutes trying to understand why my handmade matrix and scores, copied from Wikipedia, differed from the ones I coded using the previous convention). Also, while this way of writing the confusion matrix is the one in the 1998 paper you mentioned, others exist, so at least mentioning them would be nice (as scikit-learn does in a warning, for example). My second comment is about the sources for the matrix itself: the first and the third use the other convention, which is confusing (I'm not sure why the second source is used). JackRed6 (talk) 10:51, 4 June 2021 (UTC)

removed one line of wrong markup at the top
Removed this

{{Confusion matrix terms|recall=}}

from the article today since I have no idea what it is supposed to do, and it results in a mess at the beginning of the article. Right after the title and before 'condition positive (P)' on the page I see this garbage:

"Insert non-formatted text here{| class="wikitable" width=35% style="float:right;font-size:98%; margin-left:0.5em; padding:0.25em; background:#f1f5fc;" |+        Terminology and derivations    from a confusion matrix |- style="vertical-align:top;" |  "  — Preceding unsigned comment added by Thinkadoodle (talk • contribs) 15:06, 8 June 2020 (UTC)

Can you please undo this change? This table was one of the best reference for the confusion matrix and its derived metrics out there. It used to be in a grey box next to the article. --Marvmind (talk) 21:23, 8 June 2020 (UTC)

add some formula:
I would have liked to find the following formulas on this page:

FN = (1-sensitivity) * N * prevalence

TP = sensitivity * prevalence * N

TN = (1-prevalence) * N * specificity

FP = (1-prevalence) * N - TN

As I could not find them anywhere, I had to derive them myself. (I did check them for correctness against the calculator at https://statpages.info/ctab2x2.html.)

These are useful if someone else reports sensitivity, specificity and prevalence, but not the actual confusion matrix.

83.83.238.65 (talk) 11:46, 7 April 2021 (UTC)
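As a quick sanity check (the input numbers are invented), the four formulas follow directly from the definitions sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and prevalence = (TP+FN)/N:

```python
# Invented inputs for a sanity check of the formulas above.
sensitivity, specificity, prevalence, N = 0.9, 0.8, 0.25, 1000

FN = (1 - sensitivity) * N * prevalence   # false negatives
TP = sensitivity * prevalence * N         # true positives
TN = (1 - prevalence) * N * specificity   # true negatives
FP = (1 - prevalence) * N - TN            # false positives

# The reconstructed cells reproduce the defining ratios.
assert abs(TP / (TP + FN) - sensitivity) < 1e-9
assert abs(TN / (TN + FP) - specificity) < 1e-9
assert abs((TP + FN) / N - prevalence) < 1e-9
```

With these inputs the reconstruction gives TP = 225, FN = 25, TN = 600, FP = 150, and the four cells sum to N as expected.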


 * ❌ As all metrics are derived from the observed values FN, TP, TN and FP, there are very many ways of expressing one metric in terms of the others. Priority is given to expressing derived metrics in terms of more basic ones, as showing them all would clutter the page. The reader can use a system of equations to recalculate the observed values from derived metrics if needed. Cheers, cmɢʟee⎆τaʟκ 00:08, 7 May 2021 (UTC)

Is Type I error overestimation or underestimation?

 * I found conflicts on the page. The terminology on the right (including the machine learning figure) says Type I error is underestimation, whereas the confusion table below says Type I error is overestimation. Which one is correct? — Preceding unsigned comment added by 82.47.241.115 (talk) 09:23, 16 October 2021 (UTC)

I noticed this too: the over/underestimation designations in the table do not match up with the designations in the equation sidebar. Those in the sidebar are reversed and need to be edited (I would contribute myself but I don't know how); false positives are an overestimation of values (counting data as 1's when they should be 0's), and in turn, false negatives are an underestimation of values (overlooking data as 0's when they are actually 1's). --Jafonte01 (talk) 15:50, 2 November 2021 (UTC)

Proposed changes to transcluded formula template
Fellow Wikipedians: I've proposed some changes to the formula infobox transcluded into this article, with the goal of trimming down its overpowering (if not excessive) width. My original message with some explanatory notes is at Template talk:Confusion matrix terms, and you can see the revised template layout I've proposed by viewing its sandbox version.

There have been no responses over there in well over two months, and since the changes I'm proposing are significant enough to possibly be contentious, I wanted to invite any interested Wikipedians to discuss them over at the template's talk page. Thanks! FeRDNYC (talk) 00:12, 5 January 2022 (UTC)

Inclusion of the General Performance Score
Dear collaborators,

I've recently read a paper about a General Performance Score for classification problems. It is a generalized performance measure defined as the harmonic mean of the selected and desired classic performance measures (for example, recall, specificity, etc.). It is fully adaptable, and the classic performance measures are particular cases of it. I have created the following draft to include it in Wikipedia, with the objective of later including it in the confusion matrix page and template:

Draft:General Performance Score (GPS): a general metric to evaluate classification problems

RanchoLancho (talk) 17:32, 26 January 2023 (UTC)