Talk:Receiver operating characteristic

[Introduction under-justifying and over-claiming]

I don’t like the final two sentences of the intro:

1. “ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution.”

Guessing at random provides a way of selecting “possibly” optimal models. Why is any ROC-based method preferred? What evidence is there that the ROC-based process leads to superior outcomes?

2. “ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.”

It’s not clear what form this relationship takes, and the subsequent article doesn’t explain it. So it seems like a vague form of over-claiming to me. — Preceding unsigned comment added by Willbown (talk • contribs) 06:38, 16 July 2023 (UTC)

Area under the curve
The minus sign in the derivation of the AUC right before the derivative of the FPR seems to be wrong. Let $$\widetilde{\mbox{TPR}}(\mbox{FPR})$$ be the True Positive Rate as a function of the False Positive Rate. The AUC is then given by:
 * $$ A = \int_{0\%}^{100\%} \widetilde{\mbox{TPR}}(\mbox{FPR}) \, d\mbox{FPR} \,. $$

Let $$\mbox{FPR}(T)$$ be a parametrization of $$\mbox{FPR}$$ with $$ \mbox{FPR}(-\infty)=100\%$$ and $$ \mbox{FPR}(\infty)=0\%$$. According to Integration by substitution:
 * $$ A = \int_{\infty}^{-\infty} \widetilde{\mbox{TPR}}(\mbox{FPR}(T)) \, \mbox{FPR}'(T) \, dT = \int_{-\infty}^{\infty} \mbox{TPR}(T) (-\mbox{FPR}'(T)) \, dT = \int_{-\infty}^{\infty} \mbox{TPR}(T) \frac{d}{dT} (1-\mbox{FPR}(T)) \, dT \,. $$

In terms of the True Negative Rate and its density:
 * $$ A = \int_{-\infty}^{\infty} \mbox{TPR}(T) \, \mbox{TNR}'(T) \, dT = \int_{-\infty}^{\infty} \left[\int_{T}^{\infty} f_1(T') \, dT' \right] f_0(T) \, dT = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(T'>T) f_1(T') f_0(T) \, dT' \, dT = P(X_1 > X_0) \,. $$
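
As a quick numerical sanity check of the last line of the derivation above, here is a minimal sketch. It assumes (purely for illustration) that the negatives follow a standard normal density f_0 and the positives a unit-variance normal f_1 with mean `mu`; under that assumption X_1 - X_0 is normal with mean `mu` and variance 2, so P(X_1 > X_0) = Φ(mu/√2). The function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pdf(x):
    """Standard normal density, used here as the assumed f_0."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def auc_by_integration(mu, lo=-12.0, hi=12.0, n=24000):
    """Trapezoid-rule estimate of the integral of TPR(T) * f_0(T) dT."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        tpr = 1.0 - phi(t - mu)          # TPR(T): integral of f_1 from T to infinity
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * tpr * pdf(t)   # TNR'(T) = f_0(T)
    return total * h

def auc_closed_form(mu):
    """P(X_1 > X_0) when X_1 - X_0 ~ N(mu, 2)."""
    return phi(mu / math.sqrt(2.0))

if __name__ == "__main__":
    for mu in (0.0, 1.0, 2.0):
        print(mu, auc_by_integration(mu), auc_closed_form(mu))
```

If the sign in front of FPR'(T) were flipped, the integral would come out negative, which is one way to see that the sign convention matters.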

Clinical scenarios
Can anyone add information about the use of this method in clinical scenarios (e.g., examination of risk factors for disease outcomes)? —Preceding unsigned comment added by 131.104.10.194 (talk • contribs) 23:08, 28 September 2006 (UTC)


 * The Guyatt et al. paper on iron-deficiency anemia is a classic.
 * Guyatt G, Patterson C, Ali M, Singer J, Levine M, Turpie I, Meyer R (1990). "Diagnosis of iron-deficiency anemia in the elderly". Am J Med 88 (3): 205-9.
 * You're welcome to add an example you've come across (this is the encyclopedia anyone can edit)... or write a section after digesting Guyatt's paper -- which makes use of the concept. Nephron T|C 19:00, 14 December 2006 (UTC)

Merging of articles
I suggest that the articles "Receiver operating characteristic" and "Detection theory" (Signal detection theory) should be merged. The merged article should seek a middle path in terms of technical formality and jargon (as simple as possible, but not simpler). Merged or not, the two articles should be more clearly compatible, if not entirely consistent, since they are basically trying to explain the same underlying thing. Rbfuld (talk) 23:02, 27 January 2008 (UTC)

I agree with the suggestion that the articles ROC and "Reciever operator characteristic" should be merged. Wicked Maven 20:00, 28 January 2007 (UTC)

I disagree with the suggestion that "Receiver operating characteristic" and "Detection theory" (Signal detection theory) should be merged. Signal detection theory appears to be a larger topic containing many concepts and methods. The combined page would have to be huge in order to treat all of them in sufficient detail. It is easier to maintain a collection of small pages that treat each component of the topic. Anecdote: I arrived at this page seeking to clarify my understanding of ROC curves, and in particular, the use of the area under the ROC curve. I already know about most of the stuff on the current Detection Theory page, and was only seeking information relating to ROC curves. This page is the correct level of granularity for what I wanted to learn. Note: Maven is suggesting that "ROC" be merged with this article? But it seems there is no other ROC article on this topic; the ROC disambiguation page points here. Bayle Shanks (talk) 22:01, 6 November 2009 (UTC)


 * I agree with this; detection theory applies to far more than just radar. Consider SONAR. Detection theory definitely applies to this, yet it is certainly not radar. Balluwun-enjoyer (talk) 05:50, 20 May 2023 (UTC)

disappointed
For whom was this written? The author displays considerable erudition, but no desire to make his subject palatable to beginners. The second paragraph was enough to choke and die on. I'm not here to be stymied by a pedant - I need to understand ROC curves. I swear - if I ever learn enough about the subject to do so, I will join Wikipedia and rewrite this #$&* entry.64.59.144.85 02:55, 8 February 2007 (UTC)


 * I am in complete agreement with the preceding comment of someone else 13 years earlier!!! The 4 paragraphs under "Basic Concept" are almost completely unrelated to each other; none of the paragraphs finish what they set out to do, and they are limited to the relatively simple concepts of true and false positives and negatives, essentially making no connection to the nature of an ROC graph.  The figure illustrating the probability of proteasome cleavage isn't even referenced once!  This is still a #$&* entry. Verytas (talk) 00:19, 23 March 2020 (UTC)


 * I second that! I came here to understand what is a better or worse ROC curve, and after having spent 1 hour of reading through the article (and related ones) I still don't have the slightest idea of how to use this for something practical. The lack of human language without tech-jargon is jaw-dropping. Jahibadkaret (talk) 17:01, 25 October 2020 (UTC)
 * This is a common issue with wikipedia math articles. They can be surprisingly jargon-heavy and unfocused. I would recommend a good statistics textbook for learning. Wqwt (talk) 16:17, 19 October 2023 (UTC)

ROC example misleading or wrong
I think that the example plot w/ points A, B, C, C' is misleading or wrong. C' is intended (I think) to be an example of the effects of inverting the output of the worse-than-random classifier C. If this is actually what it's meant to represent, the plot is wrong: inverting the output of the classifier doesn't correspond to a mirror reflection across the diagonal, but to a mirroring through the point (0.5,0.5). (Inverting the test set labels corresponds to a mirror reflection across the diagonal.) I don't have the file used to create the diagram, or I would fix it myself, so I leave this to whoever posted the diagram. 128.62.104.34 22:27, 23 April 2007 (UTC)


 * Hmm, mirroring with the point (0.5, 0.5) is not the same as mirroring with the diagonal line. I believe the example means that the output of worse than random classifier can be simply mirrored with the diagonal line to get point above the diagonal line. In the table you can see that C' is an invert classification of C. Perhaps you can read the source here: . -- Indon ( reply ) -- 08:09, 24 April 2007 (UTC)


 * I believe the critique above is correct. A careful reading of your cited source confirms it. Fawcett has the mirroring wrong when he says it is across the diagonal, though his explanation of reversing the decision is correct: that is, true positives become false negatives and vice versa. The contingency table on the wiki page for C' is not an inverted classification of C, because the inversion must occur on the columns in the example and not the rows due to the conjunctive equations. Reversing all decisions would then swap the values in the first row with the values in the second row. Therefore, the mirroring is through the point (0.5,0.5), and a true C', which is a reversed decision of C, would be at the point (.12,.76), which is still "better than" pt. A. The explanation after the words about mirroring is what is causing confusion, but it would be useful to inform the reader how the mirroring really takes place. Snthor 15:19, 9 August 2007 (UTC)

No matter whether we have to mirror with the point (0.5, 0.5) or at the diagonal line, if we agree that any point under the diagonal line can be mirrored onto the other side then the lower right corner also represents a perfect classification. Therefore IMHO the two arrows labeled "better" and "worse" are misleading too. The closer towards the diagonal line, the worse; the closer towards either the left top corner or the right bottom corner, the better. One solution might be making both arrows two-headed. The heads pointing towards the edges should be labeled "better", the heads pointing towards the diagonal line should be labeled "worse". I am just afraid that people new to the topic will still be confused. Different approach: points under the diagonal line must be mirrored first before they can be compared. Put only one two-headed arrow in the upper triangle. Stevemiller 03:58, 9 October 2007 (UTC)


 * The lower right triangle corresponds to worse-than-random classifiers, so imho the arrow labels are correct. One might add arrows labeled "higher classification power" pointing away from the diagonal. I agree that the presented contingency table for C' is incorrect, instead of interchanging the columns one has to interchange the rows, since the row index represents the suggested classification. When calculating the TPR and FPR for the modified matrix, one finds that TPR changes to 1-TPR and FPR changes to 1-FPR, so also the presented numbers below the matrix are wrong, as well as the position of C' in the plot. The replacement of (x,y) by (1-x,1-y) corresponds to a mirroring at (0.5, 0.5). Kero6581 (talk) 08:12, 6 May 2009 (UTC)


 * ok, I corrected the text so that the squares are now correct. The only thing left is to correct the diagram such that the point C' is at (.12,.76). I hope Indon still has his original file, that would make it much easier to correct the figure. Greetings --hroest 12:08, 10 July 2009 (UTC)


 * Ok, new figure done. Contact me if modifications are necessary. Kai walz (talk) 20:57, 8 November 2009 (UTC)
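
To make the two kinds of mirroring discussed in this thread concrete, here is a minimal sketch. The counts for C (TP=24, FP=88, FN=76, TN=12) are assumed for illustration, chosen so that the reversed classifier lands at the (.12,.76) mentioned above; the function names are hypothetical.

```python
def rates(tp, fp, fn, tn):
    """Return the ROC point (FPR, TPR) for a 2x2 confusion matrix."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return fpr, tpr

def invert_decisions(tp, fp, fn, tn):
    """Flip every prediction: predicted positives become negatives and vice versa.

    Actual positives that were missed (FN) are now hits (TP), and actual
    negatives that were correctly rejected (TN) are now false alarms (FP).
    Returns the new (tp, fp, fn, tn).
    """
    return fn, tn, tp, fp

if __name__ == "__main__":
    c = (24, 88, 76, 12)                      # assumed counts for classifier C
    fpr, tpr = rates(*c)
    fpr_inv, tpr_inv = rates(*invert_decisions(*c))
    print("C                 :", (fpr, tpr))
    print("C with decisions reversed:", (fpr_inv, tpr_inv))  # (1 - FPR, 1 - TPR)
    print("C reflected across diagonal:", (tpr, fpr))        # swaps the coordinates instead
```

Reversing the decisions sends (FPR, TPR) to (1 - FPR, 1 - TPR), which is the point reflection through (0.5, 0.5); swapping the two coordinates is the reflection across the diagonal, and the two generally give different points.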

"The lower right triangle corresponds to worse-than-random classifiers" Isn't a random classifier (the diagonal line) the worst case? Suppose our classifier is 100% of the time wrong, IOW it is at the right bottom point, then we actually have a perfect classifier, which has just the wrong labels. We would always flip a classifier that is below the diagonal line (AUC<0.5). I understand that with "behaving" normal distributions the ROC won't cross the diagonal line, switching sides. But other distributions might cause the ROC to cross the diagonal line. Stevemiller (talk) 17:36, 26 February 2022 (UTC)

Inconsistent Notation
The notation used in the figure ("How a ROC curve can be interpreted") in the fourth section ("Further interpretations") is inconsistent with the notation introduced in the first two sections ("Basic concept" and "ROC Space"). The figure uses TP, FP, TN and FN instead of TPR, FPR, TNR and FNR. Aside from being confusing, it is actually misleading since TP, FP, etc. were already introduced in the earlier sections as having different meanings than TPR, FPR, etc. What is more, the figure's notation isn't even internally consistent. The axis labels on the ROC graph should be "TP" and "FP", not "P(TP)" and "P(FP)". Alternatively, to show explicit dependence of the true positive rate and false positive rate on the threshold value, the axis labels could be "TP(θ)" and "FP(θ)", where the threshold value θ needs then to be introduced in the graph of the probability density curves for the detection statistic. And while I'm picking nits, why aren't the axes of the probability density graph labelled, and for that matter, why don't any of the three subfigures in this image have titles?

Don't get me wrong, I don't want to get rid of this image. For me it is the one illustration that allowed me to "get" what the ROC curve actually quantifies. Which is why I think it is important that it be brought into conformance with the rest of the article. I can see from the article's history that the image itself predates the discussion in the "Basic concept" and "ROC space" sections, so I imagine that the original creator of the image (Kku?) might be resistant to having it replaced with an updated version. However, our collective goal is for the overall article to be as clear as possible, and the best way that I can see to do that is to maintain the notation of the "Basic concept" section and to update the figure accordingly.

Here are the changes that I would propose to the figure:
 * Titles for each of the three subfigures (these could be placed in the figure caption as long as the subfigures are labelled with a), b) and c))
 * Axis labels where appropriate
 * Replace TP, FP, etc. with TPR, FPR, etc.
 * Introduce θ as the threshold value and replace P(TP) and P(FP) with TPR(θ) and FPR(θ)

One other thing worth mentioning on the topic of consistency, is that the confusion matrix in the figure is of a different form than that introduced in the "Basic concept" section, having its columns sum to 1 rather than to the respective probabilities of the underlying event occurring or not. I don't think that this should be changed in the name of consistency, however, because as it is, it provides a direct link between the two other subfigures in the image. If the confusion matrix were altered so that the columns sum respectively to P and N, then this link would be lost and the subfigure would only serve to introduce (dare I say it?) confusion.

Personally, if effort will be taken to update this figure, I think it might be worthwhile to introduce one more subfigure at the top showing the information flow (underlying two-state process --> observable data --> detection statistic --> decision), but this may not be the best choice of language if my stated goal is to enforce consistency with the rest of the article.

JanRu 20:26, 26 April 2007 (UTC)

ROC space and metrics
In the section, "ROC Space", the info-box is referenced as containing evaluation metrics. Perhaps inadvertently, the word metric is hyperlinked to the wiki page on metrics, as in metric space distances. The reader may be inclined to believe from this that the info-box contains metrics. This is not the case. None of the "evaluation" metrics listed are true metrics in the mathematical sense. I would recommend deleting the hyperlink.
 * I agree, I removed that link Bayle Shanks (talk) 22:12, 6 November 2009 (UTC)

In a similar discussion, the notion of a ROC space is incorrect. What is meant by space? It is neither a vector space nor a topological space and so the verbiage is abused, even though it appears in some of the cited literature. A ROC graph is what is presented and its limitations are made clear, but the notion of a space is ill advised. I recommend titling the section as ROC graphs.

Snthor 14:21, 9 August 2007 (UTC)

d' (d-prime)
The article says about d' "... under the assumption that both these distributions are normal with the SAME standard deviation" (my emphasis). But the article about d' uses "the standard deviation of the noise distribution". Stevemiller 04:46, 10 October 2007 (UTC)

Number of observations - irrelevant?
Does the number of observations affect the ROC curve at all? With only one observation isn't it possible to have 100% sensitivity (no false negatives) and 100% specificity (no false positives)? Presumably I'm missing something because if that's the case having a good point on a ROC curve doesn't guarantee a good classifier. pgr94 (talk) 18:55, 24 January 2009 (UTC)

You'd have a good ROC curve, but no statistical reason to believe that this curve is representative of actual behaviour. --60.234.219.72 (talk) 01:27, 25 August 2009 (UTC)

Terminology and derivations from a confusion matrix
This is an excellent addition to this article - very helpful for people wanting to dive deeper. Thanks so much. 128.220.160.6 (talk) 00:48, 9 March 2009 (UTC)


 * Perhaps we should move this table into either the Confusion matrix or Binary classification article. As of right now, various concepts such as sensitivity and specificity, positive and negative predictive value, and accuracy each have their own articles, each repeating the same information about the various relationships between true positives and false positives. Perhaps we should consolidate some of this information into a more comprehensive article on a more general/introductory page. Many of these basic definitions (specificity, selectivity, positive/negative predictive value) are useful basic information and right now the only place to find a handy table is in the Receiver operating characteristic page, which is a little too obscure/advanced to warrant being the main location for this information. The sensitivity and specificity page has already started to become a more general page: unlike positive and negative predictive value which have separate articles, sensitivity and specificity has attempted to introduce much of this terminology together. For now I have added this table to the sensitivity and specificity article. 171.64.15.56 (talk) 17:51, 30 October 2009 (UTC)

What does eqv. stand for?
 * I assume "eqv." means "equivalent" 171.64.15.56 (talk) 17:44, 30 October 2009 (UTC)

Math Parser Error
From Revision #314192656 by 151.148.122.100 " Failed to parse (unknown function\MCC): \MCC = (TPTN - FPFN)/ \sqrt{P N P' N'} " ... reverted to working formula, however, the formula is rendered as a PNG - anyone who knows how to enforce text rendering, please be my guest. - Dlefree-loc-work (talk) 08:55, 21 September 2009 (UTC)

Positiveness
Perhaps something should be added to note that the further you move towards the upper-right in the ROC graph, the more often the classifier gives you a positive answer. So, I think that movement in a diagonal direction towards the upper right corresponds to biasing the classifier to return a positive answer more often, without improving its accuracy. I'd add this myself, but I'd like someone more knowledgeable to double-check it first. Bayle Shanks (talk) 22:07, 6 November 2009 (UTC)


 * The holy grail is the upper left, not the upper right. You are correct about the upper right.  It's always easy to get the upper right: Just label every point as a "yes".  (Or lower left, for "no".)  All ROCs pass through those two points.  If you think the average reader would benefit, and you would like to add a few words to that effect, do it! Jmacwiki (talk) 15:07, 10 July 2011 (UTC)
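
A small sketch of the point made above, on made-up scores and labels (all values and names here are hypothetical): lowering the threshold labels more items "yes", which moves the ROC point towards the upper right, and the two extreme thresholds always give the corners (0,0) and (1,1).

```python
def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) when every item with score >= threshold is labelled positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos

if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # made-up classifier scores
    labels = [1, 1, 0, 1, 0, 0]                # made-up ground truth (1 = positive)
    # Thresholds from strictest to most lenient.
    for thr in (float("inf"), 0.85, 0.65, 0.5, float("-inf")):
        print(thr, roc_point(scores, labels, thr))
```

With these numbers the strictest threshold gives (0.0, 0.0), the most lenient gives (1.0, 1.0), and neither coordinate ever decreases as the threshold is lowered.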

"Lift Curve"
ROC curve is also called a "lift curve" according to the book "Mastering Data Mining" by Berry and Linoff. —Preceding unsigned comment added by AndrewHZ (talk • contribs) 03:48, 6 December 2009 (UTC)


 * Yes, in data mining the same approach is used to indicate the impact of using a predictive model in a real world marketing environment. It is known as a lift curve or a gains curve, and somewhat less often as an ROC curve.  Duncan (talk) 15:16, 10 December 2009 (UTC)

I thought that a lift chart had a different, but similar, X-axis than an ROC curve. The x-axis is the false positive rate in an ROC curve but it is the subset size (% of data tested) in a lift chart (see Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank). Mickeyg13 (talk) 22:27, 9 June 2010 (UTC)
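
To illustrate the difference in x-axes described above, here is a rough sketch that computes both for the same ranking. It assumes distinct scores, made-up data, and that the vertical axis in both charts is taken as the fraction of positives captured (conventions for lift and gains charts vary between books); the function and key names are hypothetical.

```python
def roc_and_gains_rows(scores, labels):
    """For each cutoff k (top-k items by score), return both x-coordinates.

    roc_x   : false positive rate among the negatives (ROC x-axis)
    gains_x : fraction of all items selected (lift/gains x-axis)
    y       : fraction of positives captured (assumed shared y-axis)
    """
    ordered = [y for _, y in sorted(zip(scores, labels), key=lambda p: -p[0])]
    n = len(ordered)
    pos = sum(ordered)
    neg = n - pos
    rows = []
    tp = fp = 0
    for k, y in enumerate(ordered, start=1):
        tp += y
        fp += 1 - y
        rows.append({"roc_x": fp / neg, "gains_x": k / n, "y": tp / pos})
    return rows

if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    for row in roc_and_gains_rows(scores, labels):
        print(row)
```

The two x-coordinates agree only at the end points; in between, the gains x-axis advances with every selected item, while the ROC x-axis advances only when a negative is selected.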

Discrimination summary statistic
Just gone looking for a source for the following summary statistic:


 * 1) the area between the ROC curve and the no-discrimination line

If you've got a reference for this one, it'd be much appreciated. —Preceding unsigned comment added by Noogz (talk • contribs) 06:23, 8 March 2010 (UTC)


 * If I understand the question, this is the same as the Gini coefficient, which is already referenced in the article. (Or maybe 1-Gini, I don't recall.) Jmacwiki (talk) 15:09, 10 July 2011 (UTC)
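
For what it's worth, a tiny sketch of the relationship, assuming the convention G = 2·AUC - 1 for the Gini coefficient in the ROC setting: since the no-discrimination line has area 0.5 beneath it, the area between the ROC curve and that line is AUC - 0.5, i.e. half of G. The function names are hypothetical.

```python
def area_above_diagonal(auc):
    """Area between the ROC curve and the no-discrimination line."""
    return auc - 0.5

def gini_from_auc(auc):
    """Gini coefficient under the assumed convention G = 2*AUC - 1."""
    return 2.0 * auc - 1.0

if __name__ == "__main__":
    auc = 0.75
    print(area_above_diagonal(auc), gini_from_auc(auc))  # 0.25 0.5
```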

Is the ROC Curve really a curve and is the AUC a meaningful measure ?
As I understand it, the ROC curve is created by plotting quotients of integers against each other. This means that, on both dimensions of the ROC space, the irrational numbers do not have a ROC point, so the ROC "curve" is defined for pairs of rational numbers only. In that case the ROC curve is majorised by the Dirichlet function, whose Lebesgue integral is 0. Thus the area under a ROC "curve" is at most 0, if it exists at all. Consequently, the AUC increases with every additional data point, but the convergence of the appropriate "measure" for the area under the curve is by no means guaranteed. It follows that the AUC is a meaningless number, because it will rise with any additional observation and might reach any given number, provided that enough observations for the classifier are available. Please point out my error; I would happily accept that I am wrong. 80.153.50.105 (talk) 14:33, 4 June 2010 (UTC)


 * Good question, a reference would help. It could be a misnomer, unless you accept that all plots are curves. ROCs are frequency based, so the assumption must be that the frequencies are continuous probability functions. The sample size is relevant. The AUC is like a performance index itself, so as long as it correlates, as a practical matter, it has discriminatory meaning. However, that meaning can be over-interpreted, because the AUC does not account for economic utility or the dreaded Type 3 error. Too many focus on increasing AUC performance and neglect increasing the economic efficiency of a diagnostic. Zulu Papa 5 * (talk) 14:48, 4 June 2010 (UTC)


 * In Decision Curve Analysis, the "Net Benefit" is a simple, meaningful alternative performance measure to the AUC. It is NB = (True Positives - (w)(False Positives)) / N, where w is the economic ratio (Good / (1 - Good)). Zulu Papa 5 * (talk) 15:42, 4 June 2010 (UTC)


 * As the original writer: Why does authority help with a mathematical argument? Even if there are a lot of possible quotations, this does not establish a logically true argument. In other words, authority does not replace logic. except for the case of machine learning may be. 92.74.122.0 (talk) 21:04, 4 June 2010 (UTC)
 * Ok .. well we probably should not go off topic; however, I believe math and reality is defined by authoritative convention, and well ... how it progresses from there can be delusional. Besides, the Wikipedia authorities require verification without WP:SYN except for the most nominal and trivial math calculations.  Original Research must go some where else, like a blog, to have a voice. Zulu Papa 5 * (talk) 21:19, 4 June 2010 (UTC)
 * I think ZuloPapa missed IP's point. It doesn't have anything to do with actual economic benefit; it's purely a mathematical point (and even though Wikipedia requires sources, mathematics does not).  If your datapoints are empirically derived, yes of course strictly speaking the Lebesgue measure of the support of an ROC curve is 0 and it's not technically a continuous curve.  That doesn't mean that it is such a terrible thing to calculate the area using some sort of interpolation.  Also if you really feel like getting into the esoterically technical, a finite sample of say false positives is just a sample of the underlying "actual" false positive rate.  Although the samples will always be rational, it is conceivable that the underlying false positive is irrational.  So there could be some process with a nonzero true positive rate for all false positive rates in (0, 1), yielding a continuous, Lebesgue integrable function.  However, our finite sampling will not reflect that, so we interpolate.  That we must interpolate for real-world measurements does not render the concept meaningless.  Mickeyg13 (talk) 18:54, 31 August 2010 (UTC)
 * I am having some trouble understanding the topic -- what is the continuous variable in this space to make the curve a "curve"? One cannot alter the FP rate directly, so there must be some "hidden" parameter. User A1 (talk) 20:58, 10 August 2010 (UTC)
 * A very very brief scan would indicate the OP is correct, and the curve is not a curve at all, and according to this, interpretation of the AUC is tricky. I'll not pretend to understand all this. User A1 (talk) 21:04, 10 August 2010 (UTC)
 * If the data are being modeled as coming from two continuous distributions, then the ROC curve can actually be calculated and is a continuous curve. In other situations with discrete variables or when finitely sampling from distributions, you can interpolate to make a curve and estimate the area under it, so no big deal. It's still useful even if it's not a "curve" in the technical sense.
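
To illustrate the interpolation argument made in this thread, here is a minimal sketch on made-up data (scores, labels and function names are all hypothetical). It builds the empirical ROC points with exact rational arithmetic, joins them with straight segments, and takes the trapezoidal area. The points are indeed rational, but the interpolated area is a well-defined number between 0 and 1; it does not grow without bound as observations are added.

```python
from fractions import Fraction

def empirical_roc(scores, labels):
    """Return the empirical ROC points (FPR, TPR) as exact fractions.

    Items are ranked by decreasing score; tied scores are added together
    so that each distinct threshold contributes one point.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ordered = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(Fraction(0), Fraction(0))]
    tp = fp = 0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j][0] == ordered[i][0]:
            if ordered[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((Fraction(fp, neg), Fraction(tp, pos)))
        i = j
    return points

def trapezoid_area(points):
    """Area under the piecewise-linear interpolation of the ROC points."""
    area = Fraction(0)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    pts = empirical_roc(scores, labels)
    print(pts)
    print(trapezoid_area(pts))  # 8/9
```

For this sample, 8 of the 9 (positive, negative) pairs are ranked correctly, which matches the interpolated area of 8/9.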

Which perpendicular line?
Under the Further Interpretations section, a statistic is defined as: "the intercept of the ROC curve with the line at 90 degrees to the no-discrimination line"

But there are an infinite number of lines perpendicular to any line. So this intercept can arrive at any point. Is this meant to be more specifically the intercept of the ROC curve with a line at 90 degrees to the no-discrimination line intersecting at its midpoint? —Preceding unsigned comment added by 32.166.60.40 (talk) 16:01, 4 November 2010 (UTC)


 * Interesting question ... "the intercept" would define a point along the no-discrimination line. This could be a normalized statistic of the ROC curve, such that you could fit nearly infinite ROC curves by knowing the no-discrimination line point. I doubt many have explored this concept. Would be good to search on "ROC curve no-discrimination lines", to find sources. Zulu Papa 5 * (talk) 19:53, 4 November 2010 (UTC)


 * Most likely whoever wrote that meant the line going from (0,1) to (1,0). But actually any of the lines perpendicular to the diagonal can give a summary statistic with equivalent information, it's just that the (0,1) -> (1,0) line is probably the most useful. The line going through (0.9,0.9), for instance, will have an intercept that changes very little as the curve changes. —Preceding unsigned comment added by 143.48.93.115 (talk) 04:09, 5 December 2010 (UTC)


 * This makes sense. (It's nice to have a name for it -- I just always thought of it as the "100%" value, since it's the unique point whose coordinates sum to 1.)


 * Note that there is another distinguished point: the point of maximum (perpendicular) deviation from the no-discrimination line. Its Euclidean distance from the line is the Kolmogorov statistic D, divided by sqrt(2).  (Equivalently, its distance measured in the sum-of-coordinates norm is D, and in the max-norm is D/2.)


 * This observation addresses another point raised on this page: The Kolmogorov-Smirnov test, which compares two populations of samples -- the two axes here -- yields a probability after combining D with the number of observations (roughly, D*sqrt(N)). As a result, the test properly recognizes that a single-point ROC cannot yield a discriminator in which you can have any confidence. Jmacwiki (talk) 15:27, 10 July 2011 (UTC)

Cut offs
How about discussing the meaning and use of cut offs as illustrated here? (The cut offs are the labels 0.1, 0.2, etc on the curve.) AndrewHZ (talk) 16:55, 6 December 2010 (UTC)

incorrect definition of false positive rate
The false positive rate is the conditional probability that a person truly has a disease given that they test positive. In other words, P(D|+). In signal detection theory there is something called the false alarm rate, which is being incorrectly used in this article as the false positive rate. This is all made clear in Statistical Methods for Rates and Proportions by Joseph L. Fleiss.

This misunderstanding of the false positive rate is unfortunately fairly widespread, and having it misstated in this article is no help. —Preceding unsigned comment added by 12.190.115.98 (talk) 14:06, 24 January 2011 (UTC)
 * No, the conditional probability that a person truly has a disease given that they test positive is the positive predictive value. The false positive rate is the conditional probability that a person tests positive given that they don't have the disease. --Qwfp (talk) 14:36, 24 January 2011 (UTC)
 * Quite so. And the latter is identical to the false alarm rate.  (The names really aren't opaque.  They mean what they say, and they say equivalent things!) Jmacwiki (talk) 07:04, 2 February 2013 (UTC)
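
A short sketch of the two quantities being distinguished in this thread, using made-up counts (TP=90, FN=10, FP=50, TN=850) and hypothetical function names. The false positive rate conditions on the true state (no disease), whereas the positive predictive value conditions on the test result (positive).

```python
def false_positive_rate(fp, tn):
    """P(test positive | no disease) = FP / (FP + TN); same as the false alarm rate."""
    return fp / (fp + tn)

def positive_predictive_value(tp, fp):
    """P(disease | test positive) = TP / (TP + FP)."""
    return tp / (tp + fp)

if __name__ == "__main__":
    tp, fn, fp, tn = 90, 10, 50, 850   # made-up counts for illustration
    print("FPR:", false_positive_rate(fp, tn))          # 50 / 900
    print("PPV:", positive_predictive_value(tp, fp))    # 90 / 140
```

The denominators differ (all actual negatives versus all positive test results), which is why the two numbers answer different questions.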

ref
Is there a ref for this? Thanks. Zulu Papa 5 * (talk) 02:10, 7 March 2011 (UTC)

Confusing matrix
Just noted that the Confusion matrix page gives the columns as predicted values and rows as actual values, whereas the confusing matrix used on this page has swapped the meaning of columns and rows. —Preceding unsigned comment added by 80.216.132.209 (talk) 20:49, 10 April 2011 (UTC)

Z-transformation
"If a z-transformation is applied to the ROC curve, the curve will be transformed into a straight line." - How so? I don't see how that can follow, in the general case. (I assume we're talking about z-score and not the Z transform.) Converting the data to z-scores is a linear transform. --mcld (talk) 09:37, 23 November 2011 (UTC)
 * That section headed Z-transformation was added on 9 May 2011 by 68.180.102.87, who has no other edits. I can't follow it either. If no-one comes forward to offer to clarify it, I suggest we delete the section. Qwfp (talk) 12:27, 23 November 2011 (UTC)
 * The sections Detection error tradeoff graph and Z-transformation were inserted at the wrong place, interrupting the original Further interpretations section. I've moved them so that the Area under curve and Other measures subsections are filed correctly under Further interpretations.
 * The "Z-transformation" might be a misnomer. It doesn't refer to the Z-transform in signal processing. Though somewhat related to the Z-score, it's not the same thing either. The transformation is a warping of the axes by the inverse of the normal cumulative distribution function $$\Phi$$, so that 0.5 becomes 0, etc. But I'm not ready to edit it because I don't know where the name "zROC" came from.
 * Since there is already a page for the Detection error tradeoff curve in Wikipedia, I suggest merging the sections Detection error tradeoff graph and Z-transformation to that page. MaigoAkisame (talk) 20:02, 8 August 2012 (UTC)


 * Re: "z score" vs. "z transform" This is overly pedantic. It is true that "z-transform" is used differently in discrete signal processing.  However "z-transform" is also regularly used this way in statistics.  The two fields use the term differently, and Wikipedia is not going to define that behavior out of existence, nor should it.  The term "Z transform" should be retained here; "Z standardization" is much less widely used.  134.174.140.176 (talk)A statistical signal processing expert  —Preceding undated comment added 16:56, 19 July 2013 (UTC)
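
To illustrate the axis warping described in this thread, here is a minimal sketch under the assumption that both score distributions are normal: noise as N(0, 1) and signal as N(mu, sigma). In that case applying the inverse normal CDF to both FPR and TPR should give a straight line with slope 1/sigma and intercept mu/sigma; for other distributions the transformed curve need not be straight. The parameter values and function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def zroc_points(mu, sigma, thresholds):
    """Return (z(FPR), z(TPR)) for each threshold under the binormal assumption."""
    signal = NormalDist(mu, sigma)
    points = []
    for t in thresholds:
        fpr = 1.0 - STD_NORMAL.cdf(t)   # noise scores above the threshold
        tpr = 1.0 - signal.cdf(t)       # signal scores above the threshold
        points.append((STD_NORMAL.inv_cdf(fpr), STD_NORMAL.inv_cdf(tpr)))
    return points

def segment_slopes(points):
    """Slope of each segment joining consecutive points."""
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])]

if __name__ == "__main__":
    pts = zroc_points(mu=1.0, sigma=2.0, thresholds=[-1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
    print(segment_slopes(pts))  # each close to 1 / sigma
```

Constant slopes across all segments are what "transformed into a straight line" means here; the claim depends on the normality assumption rather than holding for every ROC curve.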

Mistakes
Mistakes in:
 * A completely random guess would give a point along a diagonal line (the so-called line of no-discrimination) from the left bottom to the top right corners (regardless of the positive and negative base rates). An intuitive example of random guessing is a decision by flipping coins (heads or tails).

This should be a biased coin. And more is not good. — Preceding unsigned comment added by 129.125.21.37 (talk) 10:34, 16 April 2012 (UTC)

I came to the discussion page to look for an explanation of this, and I agree that the wording is unclear at best---it seems to suggest that for a small data set you would end up somewhere along the diagonal, and as the number of data points increases, you converge to (0.5,0.5). This is not what happens, you end up on the diagonal for large data sets, and where you end up is determined by the bias of the coin. --passerby — Preceding unsigned comment added by 24.6.143.43 (talk) 22:08, 9 July 2014 (UTC)
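
A quick simulation of the point made above, as a sketch with made-up sample sizes and hypothetical names: a classifier that says "positive" with probability p regardless of the true class ends up near (p, p) on the diagonal for large samples, so the bias of the coin determines where on the diagonal it lands.

```python
import random

def coin_flip_roc_point(p_heads, n_pos=50_000, n_neg=50_000, seed=0):
    """Simulate a classifier that answers 'positive' with probability p_heads.

    Returns the observed (FPR, TPR). The class sizes n_pos and n_neg are
    arbitrary; the guess does not depend on the true class.
    """
    rng = random.Random(seed)
    tp = sum(rng.random() < p_heads for _ in range(n_pos))
    fp = sum(rng.random() < p_heads for _ in range(n_neg))
    return fp / n_neg, tp / n_pos

if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(p, coin_flip_roc_point(p))
    # Unequal base rates: still close to (p, p).
    print(coin_flip_roc_point(0.3, n_pos=10_000, n_neg=90_000))
```

A fair coin is only the special case p = 0.5; small samples scatter around the diagonal, and larger samples tighten around (p, p).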

Misinterpretation of epitope detection result?
In the section "Other measures": The graph shows that if one detects at least 60% of the epitopes in a virus protein, at least 30% of the output is falsely marked as epitopes. I wonder if the second half of the sentence is wrong -- we should say "at least 30% of non-epitopes are falsely detected". MaigoAkisame (talk) 19:41, 8 August 2012 (UTC)

Threshold choice
Can someone add some comments/references on the choice of the threshold? Indeed, after measuring the ROC curve of a system, one may want to optimize the system by choosing a specific point on the curve, i.e. a specific threshold, either by:
 * minimizing the distance with the upper-left corner
 * maximizing the distance with the random line (orthogonal projection)
 * staying below a given FPR or above a given TPR (to comply with project goals)

Lagaffe (talk) 10:42, 12 November 2012 (UTC)
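
For reference, the three criteria listed above can be sketched as follows. This is a rough illustration, not a recommendation of any one rule: the ROC points are hypothetical, and the function names are made up here. The distance from the random line uses the fact that the perpendicular distance from (FPR, TPR) to the diagonal is (TPR − FPR)/√2.

```python
import math

# Hypothetical ROC points as (threshold, FPR, TPR); the values are invented.
roc = [
    (0.9, 0.00, 0.30),
    (0.7, 0.10, 0.60),
    (0.5, 0.20, 0.85),
    (0.3, 0.50, 0.95),
    (0.1, 1.00, 1.00),
]

def closest_to_corner(points):
    """Minimize the Euclidean distance to the upper-left corner (FPR=0, TPR=1)."""
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))

def farthest_from_diagonal(points):
    """Maximize the perpendicular distance to the random line TPR = FPR."""
    return max(points, key=lambda p: (p[2] - p[1]) / math.sqrt(2))

def best_under_fpr(points, max_fpr):
    """Among points with FPR <= max_fpr, pick the one with the highest TPR."""
    allowed = [p for p in points if p[1] <= max_fpr]
    return max(allowed, key=lambda p: p[2])

print("closest to corner:     ", closest_to_corner(roc))
print("farthest from diagonal:", farthest_from_diagonal(roc))
print("best with FPR <= 0.10: ", best_under_fpr(roc, 0.10))
```

The first two rules happen to agree on this toy data, but they need not agree in general, and the constrained rule depends entirely on the chosen limit.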

Undefined Variables in Formula
The AUC section contains a formula with undefined variables, viz., X, Y and k. This is not acceptable in an article. I presume it was just copied from some text somewhere in which the definitions were given. If the author of that section is monitoring this talk page, I implore him/her to please define the variables. This is a bad (and annoying) practice and we should try to force clear definitions in every instance on wikipedia. Formulas are nearly useless without the definitions of their elements. Chafe66 (talk) 22:21, 30 January 2013 (UTC)

Bad link for Z-transformation?
In the paragraph headed "Z-transformation", I think the blue link on the first occurrence of the phrase "Z-transformation" is in error. The linked page seems to have nothing to do with the transformation being referred to here. The reference should be to some transformation associated with the Normal distribution. Stephen Robertson (talk) 09:42, 24 April 2013 (UTC)

Possible copyvio
The Basic concept section (which begins "A classification model (classifier or diagnosis) is...") contains a piece of text identical to text I've found in page 3 of this document. That document has a publishing date of 2003, while the text here seems to have been added in 2007 by this edit. As a side note, while I initially thought the paragraphs following it were also a copyvio, they have in fact apparently been reproduced in this book published in 2010. AdventurousSquirrel (talk) 13:09, 13 January 2014 (UTC)

Etymology of "receiver operating characteristic"?
Just curious how the hell the name came about, as it seems a bit flaunty given that it's just true positives vs false positives. — Preceding unsigned comment added by Mmkstarr (talk • contribs) 00:56, 18 March 2016 (UTC)

http://www.math.utah.edu/~gamez/files/ROC-Curves.pdf See last paragraph. Hous21 (talk) 00:59, 18 March 2016 (UTC)

Criticism of AUC debunked in recent research.
The article here holds a relatively critical view of AUC for estimating classifier performance, based on the paper by Hand. This research seems to be mostly superseded by the more recent paper by Flach cited later. "Small-sample precision of ROC-related estimates" explicitly talks about small-sample estimates, which are not relevant for machine learning applications. Therefore the paragraph seems inconsistent to me. There seems to be no current literature suggesting that AUC actually is not a good measure. Still, the article goes on to say "One recent explanation of the problem with ROC AUC", which implies that there is a problem. Andreas Mueller (talk) 20:47, 29 April 2016 (UTC)

Total operating characteristic (TOC) curve as alternative to ROC
Hi,

I am new to Wikipedia, so please pardon me if this is not the correct space for this.

I would like to add information about the Total Operating Characteristic (TOC) curve. This is an alternative to the ROC curve that still provides all of the information the ROC curve presents, such as the area under the curve. However, it is designed slightly differently, so that you can recreate a confusion matrix showing the hits, misses, correct rejections, and false alarms for each threshold. Dr. Robert Gil Pontius came up with the TOC curve as an improvement to the ROC curve, and he has a paper published in the International Journal of Geographical Information Science called “The total operating characteristic to measure diagnostic ability for multiple thresholds”, which explains how to construct a TOC curve and the advantages of using it over the ROC.

Should the TOC be added to the ROC article, or should it have its own article? Because we have a lot of content on the TOC curve, I think we should try to make a separate page for TOC. However, based on the little information I know about Wiki, creating a page and getting it approved is a more difficult task?

Thank you! Crobbins5 (talk) 06:55, 19 March 2018 (UTC)

Doesn't seem like there are a lot of sources for this TOC. So maybe just a mention here. Without sources, it's difficult to justify a separate article. Zulu Papa 5 * (talk) 21:00, 19 March 2018 (UTC)

This is because the journal article was published fairly recently, and scientists are used to using the ROC curve. However, 26 have already cited Pontius and Si’s article. The TOC curve provides all of the information that the ROC curve does but is more intuitive, and Wikipedia seems like the perfect platform for spreading awareness to the public so that more can benefit from it.

Crobbins5 (talk) 03:47, 20 March 2018 (UTC)

There is a year-old Wikipedia article at Total operating characteristic. --Rumping (talk) 17:13, 17 April 2018 (UTC)

Image
I just created an ROC image which might be interesting for this article:



--MartinThoma (talk) 19:53, 24 June 2018 (UTC)


 * Excellent, exactly what this article is missing. (See my comment under the disappointed section, above) Please add this! Jahibadkaret (talk) 17:04, 25 October 2020 (UTC)

Correct curve when TPR and FPR both change at a cutoff value?
Most ROC examples show a curve in a step-wise form, i.e., at a given cutoff value either TPR or FPR changes, leading to a curve that is composed of horizontal and vertical lines which are then only "smoothed" by the number of points. If, at a given cutoff value, both a TRUE and a FALSE data point are classified as positive, because for some reason they have the same predicted value, the curve should contain a diagonal line segment, as I understand it (and as it is implemented, e.g., in the R package ROCR). I have seen versions, though, where in this case the curve is still built from horizontal and vertical line segments, usually by creating a fictitious cutoff point, as if the FALSE observation were classified as positive an infinitesimally small amount before the TRUE observation. Obviously, this will reduce the AUC in comparison to the straight (diagonal) line.

Is there a "correct" approach to this? Jenskauf (talk) 14:19, 24 May 2020 (UTC)
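
To put numbers on the difference, here is a minimal sketch that computes the AUC as the fraction of (positive, negative) pairs ranked correctly, with a configurable credit for tied pairs. The function name `auc_with_ties`, the `tie_credit` parameter, and the four data points are made up for illustration. A credit of 0.5 matches the area under diagonal segments (trapezoidal rule); a credit of 0 matches the pessimistic staircase in which the FALSE observation is counted first.

```python
def auc_with_ties(scores, labels, tie_credit=0.5):
    """AUC as the share of (positive, negative) pairs in which the positive
    scores higher; tied pairs contribute tie_credit each."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += tie_credit
    return total / (len(pos) * len(neg))

# Toy data: one positive and one negative share the predicted value 0.6.
scores = [0.9, 0.6, 0.6, 0.2]
labels = [1, 1, 0, 0]

print("diagonal segment (ties count 0.5):", auc_with_ties(scores, labels))
print("pessimistic steps (ties count 0): ", auc_with_ties(scores, labels, tie_credit=0.0))
```

On this toy data the diagonal treatment gives 0.875 and the pessimistic staircase gives 0.75, so the gap is exactly half the share of tied pairs.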

xkcd-style graph
Is there a reason to have the ROC graph in a hand-drawn style? I can redraw the graph in a more conventional style if it fits the tone of the article better. Cheers, cmɢʟee⎆τaʟκ 16:04, 13 August 2021 (UTC)
 * that would be great! –CWenger ( ^ •  @ ) 16:15, 13 August 2021 (UTC)
 * I simply like that style :-) If you create a new one, please reference this one and upload it as a new file --MartinThoma (talk) 17:36, 19 August 2021 (UTC)
 * I like it too, just doesn't seem appropriate for an encyclopedia. –CWenger ( ^ •  @ ) 17:49, 19 August 2021 (UTC)
 * Why do you think so? --MartinThoma (talk) 16:25, 22 August 2021 (UTC)
 * The pseudo-hand-drawn squiggly lines and Comic Sansesque font give it an informal look more appropriate for a blog post than an encyclopedia article. 19:50, 22 August 2021 (UTC)
 * ✅ Thanks, cmɢʟee⎆τaʟκ 20:36, 5 September 2021 (UTC)
 * Looks great, thanks! –CWenger ( ^ •  @ ) 21:38, 5 September 2021 (UTC)
 * Hi, I opened an account just for this. I made some gif animations a while ago that vividly show how the graph is extracted and I hope they can be used here. Not sure how to share them here, so I would appreciate help. Iliumm (talk) 10:10, 24 April 2023 (UTC)

Lead reduction
Would anyone object if I simply booted this entire paragraph out of the lead section? "The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection in machine learning. The false-positive rate is also known as probability of false alarm and can be calculated as (1 − specificity). It can also be thought of as a plot of the power as a function of the Type I Error of the decision rule (when the performance is calculated from just a sample of the population, it can be thought of as estimators of these quantities). The ROC curve is thus the sensitivity or recall as a function of fall-out. In general, if the probability distributions for both detection and false alarm are known, the ROC curve can be generated by plotting the cumulative distribution function (area under the probability distribution from $$-\infty$$ to the discrimination threshold) of the detection probability in the y-axis versus the cumulative distribution function of the false-alarm probability on the x-axis."

As the second paragraph in the entire article, it's really a bit much, a perfect example of the complaints about readability that have been expressed here numerous times over the years. Why are we being given detailed instructions on how to create and interpret a ROC curve, when we've barely been told what that even is or why we'd want to?

With the nightmare above excised, the entire lead then becomes this: A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was originally developed for operators of military radar receivers starting in 1941, which led to its name.

ROC analysis provides tools to select possibly optimal models and to discard suboptimal ones independently from (and prior to specifying) the cost context or the class distribution. ROC analysis is related in a direct and natural way to cost/benefit analysis of diagnostic decision making.

The ROC curve was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battlefields and was soon introduced to psychology to account for perceptual detection of stimuli. ROC analysis since then has been used in medicine, radiology, biometrics, forecasting of natural hazards, meteorology, model performance assessment, and other areas for many decades and is increasingly used in machine learning and data mining research.

The ROC is also known as a relative operating characteristic curve, because it is a comparison of two operating characteristics (TPR and FPR) as the criterion changes.

To summarize: If used correctly, ROC curves are a very powerful tool as a statistical performance measure in detection/classification theory and hypothesis testing, since they allow having all relevant quantities in one plot.

Which, you know, is actually reasonable. It needs some minor cleanup to address the redundancy in the first and third paragraphs, now glaringly apparent without all that faff between them. And that "to summarize" simply has to go. (It's a lead section, duh! Its entire job is to summarize.) But, some rough edges notwithstanding, it's at least a passable attempt at introducing the concept of a ROC curve to the reader.

If we don't boot the Paragraph From Hell, then I'd like to at least trim it way, waaaay down. Like, literally a single sentence. Perhaps: "The ROC curve for a given decision model plots the rate of true positive results vs. false positive results at various threshold settings, providing a means of evaluating the model's performance." Oversimplified? Hell yes. But that's the point of a lead section, and the reason there's an entire article right below it that fills in the details. (As long as it's merely oversimplified, and not wrong. I don't claim to be a topic expert; heck I'm not even particularly familiar with the topic.) -- (please ping on reply) FeRDNYC (talk) 16:42, 20 October 2021 (UTC)


 * There should be a list of Wikipedia red-flags (probably is, somewhere), and that should definitely be on it. If you have to summarize your lead section, something's not right up there. -- FeRDNYC (talk) 16:52, 20 October 2021 (UTC)

Proposed changes to transcluded formula template
Fellow Wikipedians: I've proposed some changes to the formula infobox transcluded into this article, with the goal of trimming down its overpowering (if not excessive) width. My original message with some explanatory notes is at Template talk:Confusion matrix terms, and you can see the revised template layout I've proposed by viewing its sandbox version.

There have been no responses over there in well over two months, and since the changes I'm proposing are significant enough to possibly be contentious, I wanted to invite any interested Wikipedians to discuss them over at the template's talk page. Thanks! FeRDNYC (talk) 00:12, 5 January 2022 (UTC)