Talk:Sensitivity and specificity

merge
Request merge of "negative predictive value", "positive predictive value" and "Sensitivity and specificity". These terms are intimately related and should be in one place, possibly with a discussion of ROC. Further, suggest modeling on > this is a great exposition of this complicated stuff. Cinnamon colbert (talk) 22:25, 16 November 2008 (UTC) PS: the three articles are a great start

I also think that the entries on precision and recall should be somehow linked to this page. Recall is the same thing as sensitivity; furthermore, specificity is merely your recall with regard to negative data points. (Wyatt) —Preceding unsigned comment added by 156.56.93.239 (talk) 15:17, 5 November 2010 (UTC)

That is a broken link and the image is cut off. Cannot see the denominator. — Preceding unsigned comment added by Werowe (talk • contribs) 19:57, 25 April 2019 (UTC)

action-required: new-image
OK, I give up. Adding Images to Wiki is a nightmare.

I made a new image for this page that I think is more intuitive. http://s15.postimg.org/yoykdv34r/sensitivity_vs_specificity.png

Someone with more edits to their credit (most of mine have been made anon due to laziness) should add that image to the page. Cheers, -- Dave  — Preceding unsigned comment added by Ddopson (talk • contribs) 18:12, 6 September 2013 (UTC)

Unfortunately, the new image is no longer available at http://s15.postimg.org/yoykdv34r/sensitivity_vs_specificity.png. The main image is very unintuitive. I'll be looking into how to update it... Amitrao17 (talk) 16:58, 9 August 2020 (UTC)

Merger proposal
I don't believe I can be the only person who thinks it would be better to have a single page for sensitivity and specificity than separate pages for Sensitivity (tests) and Specificity (tests). At present Sensitivity and specificity is a redirect to Binary classification. One section of that, Binary classification is covering the same ground again. One possibility would be to locate the merged page here at Sensitivity and specificity, replacing the redirect. Binary classification could then have a "main article on this topic: ..." link to here too. Thoughts? --Qwfp (talk) 08:28, 28 February 2008 (UTC)


 * I've just realised (thanks to WhatamIdoing) that Test sensitivity should also be included in this discussion. (There's no corresponding test specificity article thank goodness.) --Qwfp (talk) 08:55, 28 February 2008 (UTC)

I agree with merging the two articles as the interpretation of one of them needs the other one also. —Preceding unsigned comment added by 92.249.193.252 (talk) 04:36, 11 March 2008 (UTC)

I also agree with merging the discussed articles; I think one should know things about both values. —Preceding unsigned comment added by 89.37.10.99 (talk) 15:28, 20 March 2008 (UTC)

I also agree with the proposal. Most of the time these two notions are taught, calculated and used as a pair.--[16:36, 21 March 2008 (UTC)

I concur. —Preceding unsigned comment added by 130.88.232.43 (talk) 17:54, 29 March 2008 (UTC)

When describing medical diagnostic tests, sensitivity and specificity always appear as a pair. I am all for merging the two articles. —Preceding unsigned comment added by 74.12.238.106 (talk) 01:04, 7 April 2008 (UTC)

I'm going to go one step further and say that it doesn't make sense to talk about either sensitivity or specificity without the other. Why make it hard on the user by cross-referencing them instead of just putting them together? -- Jon Miller —Preceding comment added by Sighthndman (talk) 16:53, 2 May 2008 (UTC)

I absolutely agree with the previous comments. It does not make sense to talk about one without the other. —Preceding unsigned comment added by 198.153.57.100 (talk) 18:07, 9 May 2008 (UTC)

Make it so. —Preceding unsigned comment added by 206.117.152.201 (talk) 18:37, 23 May 2008 (UTC)

Yes I completely agree —Preceding unsigned comment added by 152.78.213.56 (talk) 15:03, 31 May 2008 (UTC)


 * Okay, merging done! Still needs a bit of work though. For the old talk pages, see Talk:Sensitivity (tests) and Talk:Specificity (tests). -3mta3 (talk) 11:55, 12 June 2008 (UTC)

There is an error in the text. A test that is high in sensitivity has a low type I error, not type II, and vice versa. It was recorded wrong, but I caught it after much thought. —Preceding unsigned comment added by 69.143.32.107 (talk) 15:26, 14 June 2010 (UTC)

Order
Common usage should play no role in article name, as normally-present concerns (e.g. unfindability) are compensated for by redirects. The same holds true in respect to layman's terminology (gingiva vs. gums). By reversing the order, the article inverts intuitiveness by presenting and elucidating that which relates to type 2 errors prior to that which relates to type 1 errors.  DRosenbach  ( Talk 12:40, 1 October 2009 (UTC)

The Naming conventions policy, specifically the section Naming conventions, says otherwise: "Articles are normally titled using the most common English-language name of a person or thing that is the subject of the article". 'Gingiva' is different as 'Gums' is ambiguous (see Gum) so the next section Naming conventions comes into play, but in the case of this article neither order is more or less ambiguous or precise than the other. Presenting specificity first because it relates to Type I error is more logical only to those who already know about Type I and Type II errors. To most non-statisticians, these terms are more confusing and less familiar than sensitivity and specificity. Qwfp (talk) 07:22, 3 October 2009 (UTC)

Specificity/Sensitivity in Bioinformatics
In bioinformatics literature, the terms 'specificity' and 'sensitivity' are used, but are different from what is shown in this article. The terms are used as synonyms for precision and recall. In other words, sensitivity is used correctly, but specificity seems to be used for positive predictive value. I haven't been able to find a source explicitly exposing this difference, though, so I haven't edited the article to include this caveat. —Preceding unsigned comment added by Kernco (talk • contribs) 17:53, 9 April 2010 (UTC)

In fact, the terms 'specificity' and 'sensitivity' have the same meaning in the bioinformatics literature. Maybe some authors used the wrong formula definition in one bioinformatics paper. Someone should point out the error in a comment.


 * I suspect that in being virtual, informatics has issues with what is the real and correct (ground) truth when operationally defining what are "actual positives" and "actual negatives". Perhaps a section on informatics definitions would be beneficial in this article. Zulu Papa 5 * (talk) 02:11, 6 November 2010 (UTC)

In the year since I posted this, and continued my PhD research in bioinformatics, it's definitely apparent to me that there's no consensus in the community on what the formula for specificity should be, though my impression is still that the most common usage of specificity in the literature is as a synonym for positive predictive value. It's probably just a case of a wrong usage in the past propagating forward, since you must use the same evaluation metrics to compare your own results with previous ones, so I'm not pushing for this page to be changed in any way. I think it's an interesting anomaly, though. Kernco (talk) 20:34, 19 April 2011 (UTC)

Suggested edits to be made under 'Specificity'
Hi, I'm new to editing here and don't know how to properly do it. However, I hope that somebody who cares can see if the following edits help. Because I'm unfamiliar with the 'code' used by Wikipedia, please accept my improvised 'formatting', which I've devised as follows. Rgds email kingnept(at) singnet.com.sg
 * What is in italics is the original text which should be deleted.
 * What is in bold is what I think should be added.

Specificity: "...A specificity of 100% means that the test recognizes all actual negatives - for example in a test for a certain disease, all healthy disease free people will be recognized as healthy disease free. Because 100% specificity means no positives disease free persons are erroneously tagged as diseased. , a positive result in a  A high specificity test is normally used to confirm the disease. The maximum can trivially be achieved by a test that claims everybody healthy regardless of the true condition. Unfortunately a 100%-specific test standard can also be ascribed to a 'bogus' test kit whereby nobody, not even those who are truly diseased, ever get tagged as diseased. Therefore, the specificity alone does not tell us how well the test recognizes positive cases. We also need to know the sensitivity of the test. A test with a high specificity has a low type I error rate." —Preceding unsigned comment added by 220.255.64.42 (talk) 21:49, 9 July 2010 (UTC)

Postscript: IMHO, 'sensitivity' and 'specificity' are both VERY BAD 'misnomers' that serve confusion rather than clarity. The more accurate description of each would be 'True-Positive-Rate' and 'True-Neg-Rate' respectively; alas, the terms 'sensitivity' and 'specificity' seem poor colloquialisms that have long served to confuse and mislead. Too bad that they now seem to be convention, but perhaps someone with clout and commitment should nonetheless clarify this ambiguity. Rgds, Kingnept 119.74.145.154 (talk) 00:48, 10 July 2010 (UTC)


 * Coming to this article by searching for False positive I too wonder if there is scope for describing the simplest concepts in terms that will make obvious sense to an interested layman? False positive, false negative, are both important and if grasped do allow the layperson to understand quite a lot of the more important issues in real life. I would hope to see them in the lede. Richard Keatinge (talk) 18:59, 6 August 2012 (UTC)

Simple summary or interpretation
I tried to summarise with

"==Medical example== Eg. a medical diagnostic criteria quoted as having sensitivity = 43% and specificity = 96% means that 43% of the people with the criteria have the disease, and 96% of the people without the criteria do not have the disease. Hence 'sensitive' tests help confirm a diagnosis whilst 'specific' tests help exclude a diagnosis."

but it seems to conflict with the Worked Example. Can anyone confirm or correct my summary above, please? Does it mean 43% of the people with the disease have the criteria and 96% of the people without the disease do not have the criteria? If that's the case, wouldn't we more usefully characterise tests by PPV and NPV rather than sensitivity and specificity? Rod57 (talk) 13:33, 17 August 2010 (UTC)


 * I believe the point is in the context of how the test is to be applied. Sensitivity and specificity are the standard method to achieving Receiver operating characteristics which are great to optimize diagnostic performance, but say little about how they are applied to benefit a decision. The predictive values are focused on the test outcome while the Sens. and Spec. are a description of the quality of the test as to its performance as a standard. The example given means to illustrate that a test may be good at "confirming" or "excluding" a diagnosis. In this example with 96% specificity, means the test is better at excluding.  How the diagnosis is framed to benefit, sets the next question with this given test. Zulu Papa 5 * (talk) 02:23, 6 November 2010 (UTC)

[First-time user] The above summary confuses Sensitivity/Specificity with Positive Predictive Value (PPV)/Negative Predictive Value (NPV). Correction of the above summary: A medical test is 34% sensitive; if we are given a person with the condition, then the test has a 34% chance of being positive. A medical test is 69% specific; if we are given a person without the condition, then the test has a 69% chance of being negative.

A test has a 43% PPV; if we are given a positive test, then there is a 43% chance that the person has the condition. If we are given 100 positive tests (one per person), it is likely that 43% of the people have the condition. A test has a 96% NPV; if we are given a negative test, then there is a 96% chance that the person does not have the condition. If we are given 100 negative tests, it is likely that 96% of the people do not have the condition.

The original page also confuses Specificity with PPV, with the following: "If a test has high specificity, a positive result from the test means a high probability of the presence of disease.[1]" It should read, "For a test with a high specificity, if we are given a person without the condition then there is a high chance that the test is negative." The original page must be corrected, by an experienced editor. No sources to cite. 122.57.152.49 (talk) 09:25, 2 November 2011 (UTC)
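To make the distinction concrete, here is a minimal sketch that computes all four quantities from one 2×2 table. The counts are an assumption: they are reconstructed from figures quoted elsewhere on this page for the worked example (30 of 2030 people with the disease, sensitivity 66.7%, specificity 91%), not taken from a source. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        # Conditioned on the true state of the person
        "sensitivity": tp / (tp + fn),  # P(test positive | condition present)
        "specificity": tn / (tn + fp),  # P(test negative | condition absent)
        # Conditioned on the test result
        "ppv": tp / (tp + fp),          # P(condition present | test positive)
        "npv": tn / (tn + fn),          # P(condition absent | test negative)
    }


# Assumed counts, reconstructed from the worked-example figures quoted on this page.
metrics = diagnostic_metrics(tp=20, fn=10, fp=180, tn=1820)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The only difference between sensitivity and PPV is the denominator: people who have the condition versus people who test positive. With these assumed counts the same test has a sensitivity of about 67% but a PPV of 10%.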

Sufficient sample size for sensitivity?
Responding to this comment here:
 * Hence with large numbers of false positives and few false negatives, a positive FOB screen test is in itself poor at confirming cancer (PPV = 10%) and further investigations must be undertaken, it will, however, pick up 66.7% of all cancers (the sensitivity).

In: the worked example.

Only three people were tested, so if the test were done on, let's say, 100 people with bowel cancer, then maybe there would be a different proportion than 66.7%.

So is it correct to say "the sensitivity of the test is 66.7%"? Wouldn't we need to test it on more people who have bowel cancer?

Although perhaps we could have said something like "the sample sensitivity is 66.7%" as contrasted with the theoretical sensitivity.

At least Wolfram's MathWorld calls "sensitivity" the probability that a positive value tests positive -- so we may not have enough samples to get an estimate of the probability.

MathWorld entry for sensitivity  Jjjjjjjjjj (talk) 08:24, 8 March 2011 (UTC)


 * Although there were only three people who tested positive, the test was done on all two hundred and three people. Larger numbers would give you a better estimate of the probability - I think the most you can say is that although the sensitivity appears to be 66.7%, the confidence limits on that figure would necessarily be very wide. A larger sample may or may not show a changed figure, but the more cases, the narrower the confidence interval should be. I guess when talking about a test the numbers come from samples of the whole population and are always an estimate. Egmason (talk) 23:47, 30 March 2011 (UTC)


 * I removed the clarify tag -- it looks like somebody changed the numbers around so that it is now thirty people who have bowel cancer rather than only three people. I just changed the wording around a little bit consistent with the idea that one study does not necessarily determine the performance of the particular test. There may be further uncertainty about the performance.


 * Jjjjjjjjjj (talk) 05:20, 10 May 2011 (UTC)

Confidence intervals
Would it be useful to have something about confidence intervals (or credible intervals for Bayesians) here? You would presumably calculate the Binomial proportion confidence interval, or would find the posterior distribution given an uninformative Beta(1,1) prior. In the given example you would compute a 95% confidence interval for the sensitivity of (0.5060, 0.8271) and a 95% credible interval of (0.4863, 0.8077). You would compute a 95% confidence interval for the specificity of (0.8971, 0.9222) and 95% credible interval of (0.8967, 0.9218). These calculations assume that prevalence is exactly that in the data. Ts4079 (talk) 14:42, 1 November 2011 (UTC)
 * Such confidence intervals could be easily misinterpreted since sensitivity and specificity are often closely related, as demonstrated in the ROC curve. Something more advanced is probably necessary. Ts4079 (talk) 15:14, 1 November 2011 (UTC)
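As an illustration of the kind of calculation being discussed, here is a minimal sketch of one binomial proportion interval, the Wilson score interval. The choice of the Wilson method is an assumption made here; other methods (Wald, Clopper-Pearson, or a Beta posterior) give somewhat different limits, so this need not reproduce the figures quoted above. The counts (20 detected out of 30 with the condition) are likewise assumed from the 66.7% sensitivity and the thirty cases mentioned on this page, and `wilson_interval` is a hypothetical helper name.

```python
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * (p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) ** 0.5 / denom
    return centre - half_width, centre + half_width


# Assumed counts: 20 of 30 people with the condition test positive.
low, high = wilson_interval(20, 30)
print(f"sensitivity 95% interval: ({low:.4f}, {high:.4f})")
```

With only 30 diseased cases the interval spans roughly 0.49 to 0.81, which shows how wide the uncertainty on a sensitivity estimate is at this sample size.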

Maybe if it had sources? Zulu Papa 5 * (talk) 02:10, 2 November 2011 (UTC)

Denominators of definitions
The section Sensitivity provides the following definition:


 * $$\begin{align}
\text{sensitivity} & = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}} = \frac{\text{number of true positives}}{\text{number of positives}} \\
& = \text{probability of a positive test, given that the patient is ill}
\end{align}$$

I do not understand how the denominator "number of positives" is related to the denominator that precedes it: "number of true positives + number of false negatives". I thought the number of positives would instead equal the number of true positives + number of false positives. Thus, I believe this to be a typo and that "number of positives" should be replaced with something like "number of ill people", which is number of true positives + number of false negatives.

As the page currently stands, it appears that the first and last expressions in this three part equation correspond to "sensitivity" as defined here. The middle expression, which I am questioning, instead appears to correspond to the Positive predictive value as defined on that page. (The above talk section on Sensitivity in Bioinformatics suggests that sometimes the word "sensitivity" is used for positive predictive value. While that might be true, I think we should not switch between the two definitions mid equation.)

I have a similar concern about the definition of "specificity" in the section Specificity. There I believe that the denominator "number of negatives" should be something like "number of well people".

Mikitikiwiki (talk) 00:27, 18 June 2013 (UTC)
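
For reference, a sketch of the two quantities with their denominators written out. TP, FP and FN are shorthand introduced here for the counts of true positives, false positives and false negatives; they are not notation taken from the article.

$$\begin{align}
\text{sensitivity} & = \frac{TP}{TP + FN} = \frac{\text{number of true positives}}{\text{number of people who are ill}} \\
\text{positive predictive value} & = \frac{TP}{TP + FP} = \frac{\text{number of true positives}}{\text{number of positive test results}}
\end{align}$$

Read this way, "number of positives" is only a valid denominator for sensitivity if it means actual positives (ill people), not positive test results.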


 * I think this is a valid point; both the sensitivity and specificity equations use "positive" to mean both a positive test result (which is either true or false relative to the population ground truth) and a positive (i.e., actual) occurrence of illness in a member of the population, without sufficiently distinguishing between the two meanings. I agree that changing "number of positives" to something like "number of ill people" or "number of actual positives" (and making an analogous change for the specificity equation) would clarify matters. 142.20.133.199 (talk) 19:12, 18 June 2013 (UTC)

Over three years later, the problem is still there. The equations provided in this box are not consistent. I agree with the two comments above. Similar contradictions appear at the very beginning of the page, between the definitions of sensitivity and specificity and the corresponding examples. 2.224.242.112 (talk) 17:36, 28 December 2016 (UTC)

How "highly" sensitive or specific does a test have to be before "SNOUT" and "SPIN" apply?
The article states (under the "Sensitivity" and "Specificity" sections respectively) that "negative results in a high sensitivity test are used to rule out the disease" (referred to by the mnemonic "SNOUT" later in the article) and that "a positive result from a test with high specificity means a high probability of the presence of disease" (described by the mnemonic "SPIN"). However, the example calculation (for a test with a specificity of 91% and a sensitivity of 67%) demonstrates a case in which a positive result from a high specificity test (SPIN) clearly does not correspond to a high probability of the presence of disease (PPV is 10% in the example). Although this depends on prevalence, it seems to indicate that the numerous SNOUT/SPIN-type statements throughout the article are inaccurate as written. (Another such statement is in the "Medical Examples" section, which states that "[a] highly specific test is unlikely to give a false positive result: a positive result should thus be regarded as a true positive".)

Accordingly, I think these statements should be modified to give some idea of exactly how high sensitivity/specificity have to be before SNOUT/SPIN apply (as noted above, the worked example uses a specificity of 91%, which is fairly high, yet a positive test result clearly does not correspond to a high probability of the disease being present) and note the effect of disease prevalence on these assertions or be removed entirely. 142.20.133.199 (talk) —Preceding undated comment added 19:40, 18 June 2013 (UTC)
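
The dependence on prevalence can be sketched with Bayes' theorem. This is a minimal illustration, not a proposal for article text: the sensitivity (66.7%) and specificity (91%) are the worked-example values discussed above, the first prevalence (30 in 2030) is the figure quoted elsewhere on this page, and the other two prevalences are illustrative assumptions. The function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV via Bayes' theorem: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same test, three populations; only 30/2030 comes from the worked example,
# the other prevalences are assumed for illustration.
for prevalence in (30 / 2030, 0.10, 0.50):
    ppv = positive_predictive_value(2 / 3, 0.91, prevalence)
    print(f"prevalence {prevalence:.1%} -> PPV {ppv:.1%}")
```

At the worked-example prevalence the PPV is 10%, and it rises as prevalence rises, so whether "SPIN" holds depends on the population as well as on the specificity.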

Type I/II errors, and FP and FN's
It seems to me that the big table near the top of the page has Type I and Type II errors reversed. According to elsewhere (both in my stats book and to the explicit links to Type I and Type II errors in the table itself), a Type I error is "test rejects when hypothesis is actually true", in other words it's a false negative; and a Type II error is "accept when actually false", ie., a false positive. This is exactly backwards from where the Type I and Type II are located. I think the table should be corrected, but I'll leave it to the experts.

-- Wayne Hayes Associate Professor of Computer Science, UC Irvine.

PS: This is exactly why I *HATE* the terms "type I" and "type II". Let's just call them false positives and false negatives, for Pete's sake!! (Yeah, I'm trying to change decades of nomenclature here... :-)


 * You are wrong and the article is right. In the phrase "test rejects when hypothesis is actually true", the hypothesis refers to the null hypothesis, i.e. that there is nothing special going on. Rejecting the null hypothesis means triggering an alarm, i.e. saying that something special is going on. Therefore this is the false positive (false alarm) case.
 * These are the quirks of statistical hypothesis testing. They all have things kind of "backwards" as they use frequentist concepts as opposed to Bayesian ones. Qorilla (talk) 16:13, 6 May 2015 (UTC)

complement or dual? TruePos vs FalseNeg
The primary def'ns currently say "Sensitivity (also called the true positive rate) is complementary to the false negative rate." -- I take complement to mean "sums to completion (i.e., 1)" which is "not" in probability. I'm not sure if True Positive is a dual to False Negative, but that sounds much more plausible. not-just-yeti (talk) 05:32, 14 July 2015 (UTC)

Update: hearing no dissent, I've removed those comments. Presumably 'dual' is meant, and if so that's a detail that's not appropriate for the opening, defining sentences. not-just-yeti (talk) 18:35, 19 July 2015 (UTC)
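
For what it is worth, under the usual definitions the two rates share the same denominator (all actual positives), so they do sum to one. A short derivation, with TP and FN as shorthand introduced here for the counts of true positives and false negatives, and FNR for the false negative rate:

$$\begin{align}
\text{sensitivity} + \text{FNR} & = \frac{TP}{TP + FN} + \frac{FN}{TP + FN} = 1
\end{align}$$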


 * I just added to the lead a comment tying in FPs and FNs, but without reference to either complementarity or duality.—PaulTanenbaum (talk) 21:21, 1 September 2015 (UTC)


 * Interesting, maybe best to look for sources about ROC Exclusive_or, possibility a question frame and/or relative/ultimate truth issue to clarify the classifier. Sounds like a subject object issue between receiver/operator nor operator/receiver. N={0,1} Zulu Papa 5 * (talk) 21:48, 1 September 2015 (UTC)

Circularity about selectivity
I came to Wikipedia to learn about a notion of selectivity that seems to be related to sensitivity and specificity. Sadly, nothing I've found in Wikipedia on the matter is at all helpful. The disambiguation page points to articles on binding selectivity and functional selectivity, but neither of them clearly defines the (presumably more fundamental) term selectivity. The former article does point to the article on the reactivity-selectivity principle, but it too is fairly obtuse on exactly what selectivity itself is.

My last hope from the disambiguation page was the link to this article on sensitivity and specificity. But, wouldn't you know it, the only occurrence here of the term selectivity is under the "See also" and links to (you guessed it) the disambiguation page!

Have the evil gremlins carefully crafted this self-referential knot of definitional vacuity?

I'd cut the Gordian knot if I knew the topic well enough to do so, but recall that what led me to discover the knot in the first place was my attempt to learn something about selectivity.

Would somebody please mend this tiny corner of Wikipedia.

—PaulTanenbaum (talk) 21:02, 1 September 2015 (UTC)


 * Is Feature selection relevant here? Typically wiki just wants to educate folks in practicing to edit. Articles always require improvement with sources, so look at sources (existing and new) to find your real education, then contribute.  Zulu Papa 5 * (talk) 21:56, 1 September 2015 (UTC)

Re. Worked Example
"However as a screening test, a negative result is very good at reassuring that a patient does not have the disorder (NPV = 99.5%)"

There are only 30 out of 2030 who have the disease. If I use a test whereby I tell everyone that they are negative and do not have the disease, I will be correct 98.5% of the time. How then is "a negative result ... very good at reassuring that a patient does not have the disorder (NPV=99.5%)", when I can do just about as well by just telling everyone they are negative? — Preceding unsigned comment added by 86.182.185.113 (talk) 01:02, 10 December 2015 (UTC)
 * That's the point of the test. In reality you don't know in advance which people actually don't have the disease. Maybe you're confusing PPV with specificity? In other words, having a negative result is "trustworthy", but this tells you nothing on how the test performs with e.g. diseased people, or what you say to the patient if the result is positive. --Cpt ricard (talk) 07:48, 23 October 2016 (UTC)
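
The comparison raised above can be sketched numerically. The counts are assumptions reconstructed from the figures quoted on this page (30 diseased out of 2030, sensitivity 66.7%, specificity 91%), and `npv_and_sensitivity` is a hypothetical helper name.

```python
def npv_and_sensitivity(tp, fn, tn):
    """Return (NPV, sensitivity); NPV is None if nobody is called negative."""
    negatives_called = tn + fn
    npv = tn / negatives_called if negatives_called else None
    sensitivity = tp / (tp + fn)
    return npv, sensitivity


# Assumed counts for the screening test in the worked example.
real_test = npv_and_sensitivity(tp=20, fn=10, tn=1820)
# "Tell everyone they are negative": no one is ever called positive.
always_negative = npv_and_sensitivity(tp=0, fn=30, tn=2000)

print(f"test:            NPV {real_test[0]:.1%}, sensitivity {real_test[1]:.1%}")
print(f"always negative: NPV {always_negative[0]:.1%}, sensitivity {always_negative[1]:.1%}")
```

The always-negative rule reaches an NPV of about 98.5% simply because the disease is rare, but its sensitivity is zero, so it misses every case. NPV on its own therefore says little without the base rate and the sensitivity alongside it.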

Order of "True Condition" and "Predicted Condition" On Confusion Matrix
I would suggest reversing the order of the "True Condition" and "Predicted Condition" on the confusion matrix diagram. While there is certainly no standard or formal way to create a confusion matrix, in my experience such a matrix far more often has the "Predicted Condition" on the left and the "True Condition" on the top. See, for instance, James Jekel, Epidemiology, Biostatistics, and Preventative Medicine, pp. 108-109. The current diagram is the reverse, with the "Predicted Condition" on the top and the "True Condition" on the left. In my opinion, the current arrangement is the minority view, and may confuse some readers. Edit: Upon further research, there seem to be differing conventions between the computer science/machine learning ordering of the confusion matrix and the orderings coming out of biology, statistics, and medicine. Computer science seems to put the predicted class on top, whereas the others tend to put the true state on top. — Preceding unsigned comment added by 128.138.65.241 (talk) 20:19, 4 November 2016 (UTC)


 * This is an important theme, because for non-experts it creates "confusion on the confusion matrix". Operationally, the article needs to specify that the matrix may be transposed, and readers should be aware of the orientation of the matrix. Other wikis use the alternative representation and there is no alignment, but more importantly in R, some packages aimed at calculating various diagnostic criteria use True-Predictor and others Predictor-True. Alessio Facchin (talk) 09:41, 13 June 2022 (UTC)
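
A small sketch of why the orientation matters. The counts are hypothetical (chosen to match the worked-example figures quoted on this page), and `sensitivity_specificity` with its `rows` parameter is an illustrative helper, not the interface of any particular package.

```python
def sensitivity_specificity(matrix, rows="true"):
    """Compute (sensitivity, specificity) from a 2x2 confusion matrix.

    `matrix` is [[a, b], [c, d]] with the positive class listed first.
    rows="true":      rows are the true condition, columns the prediction.
    rows="predicted": rows are the prediction, columns the true condition.
    """
    if rows == "predicted":
        # Transpose so that rows are always the true condition.
        matrix = [list(col) for col in zip(*matrix)]
    elif rows != "true":
        raise ValueError("rows must be 'true' or 'predicted'")
    (tp, fn), (fp, tn) = matrix
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, rows = true condition: [[TP, FN], [FP, TN]]
true_rows = [[20, 10], [180, 1820]]
# The same table written with rows = predicted condition: [[TP, FP], [FN, TN]]
predicted_rows = [[20, 180], [10, 1820]]

right = sensitivity_specificity(true_rows, rows="true")
also_right = sensitivity_specificity(predicted_rows, rows="predicted")
wrong = sensitivity_specificity(predicted_rows, rows="true")  # orientation misread

print("correct reading:", right, also_right)
print("misread orientation:", wrong)
```

Misreading the orientation swaps false positives and false negatives, so the values returned are actually the PPV and NPV rather than the sensitivity and specificity, which is the same confusion discussed elsewhere on this page.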

Unclear but obviously wrong information (Ethan Trautman in 69420)
The article contains the following claim:

"The terms "sensitivity" and "specificity" were introduced by American biostatistician Ethan Trautman in 69420.[1]"

I can't find Ethan Trautman in any online search, which seems surprising if he introduced such important concepts. But assuming he existed, the date (I presume) 69420 must be wrong. Maybe it should be 1942?

I can't find the correct information myself; maybe there is someone out there who knows? — Preceding unsigned comment added by Zosterops (talk • contribs) 04:53, 24 September 2020 (UTC)
 * The article was vandalised about four hours ago. I have restored the original text. PeepleLikeYou (talk) 06:18, 24 September 2020 (UTC)

How about adding a section with examples of the two numbers for several real-world tests?
I'd like to have some idea of how much real-world false positive and negative rates vary with medical tests in actual use. — Preceding unsigned comment added by Editeur24 (talk • contribs) 15:06, 3 October 2020 (UTC)
 * There are already several examples in the article. PeepleLikeYou (talk) 11:06, 5 October 2020 (UTC)

"This article may be too technical for most readers to understand." really?
I'm amazed at that comment. The math is no more than grade school level, or maybe high school--- equations, fractions, and square root signs. How could it possibly be dealt with any more simply? Or maybe it's already been simplified since the July label posting. editeur24 (talk) 15:09, 3 October 2020 (UTC)
 * The date on the tag was updated by User:Aldaron in July. It looks from that edit that it had been there for three years prior to that. PeepleLikeYou (talk) 22:26, 3 October 2020 (UTC)
 * I removed the tag as there has been no specific objection raised by Aldaron or anyone else in talk. Aldaron included an edit summary, but it's not enough to state that it "needs a better introduction" and "some guidance about the (unfortunate) unnecessary proliferation of terminology". Editors need to know what terminology requires guidance, and what is unnecessary. WP Ludicer (talk) 09:04, 13 May 2022 (UTC)

Should we drop these two paragraphs in the introduction? Is the last sentence of each one correct?

 * In a diagnostic test, sensitivity is a measure of how well a test can identify true positives. Sensitivity can also be referred to as the recall, hit rate, or true positive rate. It is the percentage, or proportion, of true positives out of all the samples that have the condition (true positives and false negatives). The sensitivity of a test can help to show how well it can classify samples that have the condition. A high sensitivity value means a test correctly classifies a sample without the condition as negative more than another test that has a lower sensitivity.


 * In a diagnostic test, specificity is a measure of how well a test can identify true negatives. Specificity is also referred to as selectivity or true negative rate, and it is the percentage, or proportion, of the true negatives out of all the samples that do not have the condition (true negatives and false positives). A test with a high specificity value means that it is correctly classifying samples with the condition more often than a test with a lower specificity.

Do these two paragraphs add enough understanding to be worth the space?

Also, is the last sentence of each one correct? A test with sensitivity of 98% but specificity of 10% would do worse classifying a sample as being mostly negatives than a test with sensitivity of 90% but specificity of 90%, wouldn't it? --editeur24 (talk) 04:25, 11 December 2020 (UTC)

Fundamental Theorem of Screening
I have removed mentions of the "Fundamental Theorem of Screening" and related material. It is based on a 2020 paper previously on arXiv (so a pre-print) and now on PlosOne, for which Google registers only 5 citations. It has also previously been pointed out as possible WP:Promotion by User:PeepleLikeYou. All mentions of it were made by users User_talk:Candlelightship and User_talk:Epigeek84, through 2020 and 2021. I am creating this talk section to raise awareness that this page may be the subject of vandalism.

This is not a judgement on the correctness or a direct assessment on how important or good this material is or could be, but a simple enforcement of Wikipedia's policies. After all, Wikipedia is not a place for original research, and it relies as much as possible on secondary sources (WP:PSTS), while an original paper is a primary source only. If the content is important enough so that it is discussed in secondary sources (e.g. survey papers, books), then it can be appropriately included in Wikipedia.

a mistake
Hey, I am a new editor but also an expert in the subject matter and do machine learning for a living.

In the screening example it says "...If it turns out that the sensitivity is high then any person the test classifies as positive is likely to be a true positive. "

This is not at all true. If sensitivity is high, then any person that is positive is likely to be classified as positive, not vice versa. The same mistake is made about specificity. I went ahead and fixed the mistake; feel free to clean up if I did it incorrectly. 71.240.197.67 (talk) 02:50, 21 July 2021 (UTC)
 * The text after your edit does not make any sense, and the thing you seem to be trying to make it say is what it already said before your edit. PeepleLikeYou (talk) 02:39, 25 July 2021 (UTC)
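
For anyone following this thread, the distinction can be illustrated numerically. This is a minimal sketch using Bayes' rule; the sensitivity, specificity and prevalence values are made-up assumptions, and the function name is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(has condition | test positive), computed with Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Assumed screening scenario: a very sensitive test for a rare condition.
ppv = positive_predictive_value(sensitivity=0.99, specificity=0.90, prevalence=0.01)
print(f"sensitivity = 0.99, but PPV = {ppv:.3f}")
```

With these assumed numbers, 99% of people who have the condition test positive, yet only about 9% of people who test positive actually have it. High sensitivity describes the first statement, not the second.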

Proposal to re-order the first few sections
As much as I like the confusion matrix for defining all the related terms, the fact that it shows up before the actual definitions of sensitivity and specificity in this article feels a little overwhelming. Perhaps an approach like that in the articles on Precision and recall or Positive and negative predictive values would make more sense here, with a definition section that comes first followed by examples, moving the confusion matrix down to right above the worked example (again, similar to the article on PPV and NPV): This would feel more natural to me, but I thought I'd ask for other opinions before implementing these changes. Rundquist (talk) 00:01, 29 July 2021 (UTC)
 * 1) Definition
 * 2) Sensitivity
 * 3) Specificity
 * 4) Graphical illustration
 * 5) Medical examples
 * 6) Misconceptions
 * 7) Sensitivity index
 * 8) Confusion matrix
 * 9) Worked example
 * Thank you for your work on this article. I think it lightens the mental load on the reader to have an example after each definition, rather than separating the definitions and examples. PeepleLikeYou (talk) 23:05, 29 July 2021 (UTC)
 * That's fine, I'm not actually talking about changing any of the text of the sections (so the examples that are given in the "Sensitivity" and "Specificity" sections could stay the same). I'm just proposing to make both of those sub-sections of a larger "Definition" section, and also to move the "Confusion matrix" section later in the article so it doesn't get in the way of the definitions. I'll go ahead and make these changes so you can see what I mean, and if you don't like it feel free to revert. Rundquist (talk) 18:20, 30 July 2021 (UTC)

Redundant references
I noticed that the same "Powers (2011)" citation appears three separate times in the reference list (#4, 20, 25 currently). The first and last of these are coming from transcluded templates "Confusion matrix terms" and "diagnostic testing diagram". The other comes from a named reference in the article text. All citations refer to the same publication (author, year, title, journal, volume, pages, and URL). Initially, the in-article citation had a different external link than the citations coming from the templates (HDL vs URL). I changed the in-article citation to use the same URL, but they still were not merged in the rendered article.

It would be nice if the in-article and transcluded citations could be merged into a single citation when all the citation metadata matches. This appears to be a known limitation of the code that builds the reference list. I hunted around in the known Cite-related bugs and found this: "References that are identical in everything but the name are not merged". The in-article citation most likely has a different ref name than the transcluded citations, which may also differ from each other in their names, so that could explain the lack of merging seen here. Another complicating factor is the different sources of the citations (article vs templates) which could be another hurdle. What do you think User:Sdkb? SteveChervitzTrutane (talk) 07:16, 18 December 2021 (UTC)
 * Yeah, this is definitely an issue. I hope that phab ticket will eventually be addressed; feel free to comment on it to see if that draws any attention. Sdkb talk 07:22, 18 December 2021 (UTC)

Proposed changes to transcluded formula template
Fellow Wikipedians: I've proposed some changes to the formula infobox transcluded into this article, with the goal of trimming down its overpowering (if not excessive) width. My original message with some explanatory notes is at Template talk:Confusion matrix terms, and you can see the revised template layout I've proposed by viewing its sandbox version.

There have been no responses over there in well over two months, and since the changes I'm proposing are significant enough to possibly be contentious, I wanted to invite any interested Wikipedians to discuss them over at the template's talk page. Thanks! FeRDNYC (talk) 00:08, 5 January 2022 (UTC)

Incorrect description of Specificity in Image
"Specificity - How many negative selected elements are truly negative?" - that is a description of Negative Predictive Power.

A more accurate substitute would be, "How many negative elements were correctly identified [as negative]?" Bitkey (talk) 13:05, 8 September 2022 (UTC)


 * You're right that it's not specificity, but I don't think it's even NPV. That would read something like "What proportion of the negative elements are truly negative?"
 * Worth mentioning that the whole image could do with reworking. Having the circle represent all individuals who tested positive makes for less consistent "graphical math" than having the circle just represent truth. I suspect this is what Talk:Sensitivity and specificity new-image is getting at.
 * Spirarel (talk) 21:46, 22 September 2022 (UTC)
 * Is that not the same thing? Given that we take the definition in the image as, "How many [as a proportion] negative selected elements are truly negative?", don't both definitions just describe the ratio of TN/(TN+FN)?
 * Or do you literally mean the ratio of true negatives to all negatives - TN/(TN+FP)? That is something else again. Bitkey (talk) 23:52, 22 September 2022 (UTC)
 * I think you're getting sidetracked. We both agree that the image text is not as good as it could be; I support a rework. It looks like it's a commons image, it can probably just be changed.
 * Spirarel (talk) 16:12, 24 September 2022 (UTC)

Just trying to help mate. You disagreed with my definition of NPV then gave back a slightly more vague version of the same definition. I think we would both agree that using "what proportion" instead of "how many" in the rework would be more accurate for all definitions. I'll try and get around to it this week. Bitkey (talk) 11:21, 25 September 2022 (UTC)
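
The two ratios discussed in this thread can be put side by side. This is a small sketch with invented confusion-matrix counts; the numbers and function names are assumptions for illustration only.

```python
def specificity(tn, fp):
    """Of all truly negative elements, the proportion correctly identified: TN / (TN + FP)."""
    return tn / (tn + fp)


def negative_predictive_value(tn, fn):
    """Of all elements selected as negative, the proportion truly negative: TN / (TN + FN)."""
    return tn / (tn + fn)


# Hypothetical counts: 20 TP, 10 FN, 30 FP, 140 TN.
tn, fp, fn = 140, 30, 10
print(f"specificity = {specificity(tn, fp):.3f}")
print(f"NPV         = {negative_predictive_value(tn, fn):.3f}")
```

The numerator is the same in both cases; only the denominator changes (all true negatives plus false positives for specificity, all negative test results for NPV), which is why the wording in the image matters.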

Where has the metrics calculation panel gone?
I remember there used to be a right-side panel demonstrating all performance metrics, such as accuracy, specificity, sensitivity, MCC, etc., using the confusion matrix. It was very convenient for checking my calculations. Where has it gone? Lichen7788250 (talk) 03:46, 25 March 2024 (UTC)
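
In the meantime, the calculations can be checked with a few lines of code. This is a rough sketch of the standard formulas, not a reproduction of the old panel; the function name and the example counts are assumptions.

```python
from math import sqrt


def confusion_metrics(tp, fn, fp, tn):
    """Compute a few common metrics from the four confusion-matrix counts.

    Assumes there is at least one actual positive and one actual negative,
    otherwise sensitivity or specificity would divide by zero.
    """
    total = tp + fn + fp + tn
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # By convention here, MCC is reported as 0 when its denominator is 0.
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical counts, for illustration only.
for name, value in confusion_metrics(tp=20, fn=10, fp=30, tn=140).items():
    print(f"{name}: {value:.4f}")
```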

Do not use "exclude" but "detect" since it's less loaded and more precise here.
ORIGINAL: A test which reliably excludes individuals who do not have the condition, resulting in a high number of true negatives and low number of false positives, will have a high specificity. (...)

ALTERNATIVE 1: A test which reliably DETECTS individuals who do not have the condition, resulting in a high number of true negatives and low number of false positives, will have a high specificity. (...)

ALTERNATIVE 2: A test which reliably DETECTS the ABSENCE OF A condition, resulting in a high number of true negatives and low number of false positives, will have a high specificity. (...)

ALTERNATIVE 2 follows the same style as the paragraph above the suggested change. The paragraph above goes:

"A test which reliably detects the presence of a condition..."

So my suggestion uses DETECTS in place of EXCLUDES and uses ABSENCE in place of PRESENCE as well...