Talk:Validity (statistics)

Why limited to psychology?
This is an article putatively about validity the statistical concept. Why does the first sentence limit the discussion to the domain of psychology? If there are distinct validity concerns in the realm of psychology, shouldn't they be dealt with in a subsection? The overall article should presumably be as domain-independent as possible. —Preceding unsigned comment added by 75.54.82.175 (talk) 21:00, 3 May 2009 (UTC)

Moved note re:Criterion validity
Anon 142.103.116.65 left the following note in the article space for Criterion validity, red-linked from this article. Since the note is more appropriate for a talk page, and that article doesn't exist yet, I'm moving the note here before deleting the article.
 * Criterion validity is not the right terminology. It ought to be "criterion related validity" which means the validity is actually of the predictor using the specific criterion. It is safer to term it "predictive validity" of a measure.

End copied text. SWAdair | Talk 07:14, 11 Mar 2005 (UTC)

As far as I know, "A test can be reliable, but not valid" is the right definition or description. Imagine a clock that shows 3:00 pm every day. It gives the same reading every day, so it is reliable. However, if you try to measure your weight with a clock, you have a reliable measurement without validity.

Validity and reliability
The graphic illustrating the relationship between validity and reliability is incorrectly labeled. It is actually showing the relationship between precision and accuracy. — Preceding unsigned comment added by 68.47.191.151 (talk) 22:19, 30 May 2013 (UTC)

The article states that ''A valid measure must be reliable, but a reliable measure need not be valid. '', but Earl Babbie's 'The Practice of Social Research', 10th edition, p.145 has a graph that implies that a valid measure does not have to be reliable. Can anybody elaborate on this? --Piotr Konieczny aka Prokonsul Piotrus Talk 18:47, 16 October 2005 (UTC)


 * What is meant is that if measurements of a person's weight are to be valid (i.e. they actually measure weight), they must be reliable: they cannot change from instrument to instrument, etc. However, an instrument can give consistent measurements, and hence be reliable, yet not measure what it is supposed to measure. Thus, it would be 'reliable' but not valid. This is a common argument. The problem with it is that the distinction between validity and reliability is blurry at best. It is taken for granted that validity means something measures the trait or attribute it purports to measure. Generally, people implicitly assume that something is only reliable if it measures what it purports to, and if so, the statement you cited ceases to make sense. It depends on how reliability is defined in precise terms. Holon 02:00, 1 April 2006 (UTC)


 * I'm not sure I agree with you. Most textbooks and articles in both psychology (e.g. D. Borsboom et al., "The concept of validity". Psychological Review, 111,4,pp. 1061-71) and the social sciences (e.g. King, Keohane and Verba's well-known textbook) define validity and reliability clearly as separate concepts. Neither one necessarily implies the other. The classical explanation of this view is that of a rifle pointed at a target; a rifle aimed exactly at the bull's eye represents a valid measurement. But it may still be off because of random errors (imprecision). On the other hand, another rifle may be very precise (reliable) but pointed somewhere else completely, and thus invalid if you want to hit the bull's eye. A more statistical formulation is that unreliability is about random error, while invalidity is about systematic error. Your statement that "validity means something measures the trait or attribute it purports to measure" is indeed common, and can be expanded with "but not necessarily with perfect precision". 84.76.46.242 16:02, 4 November 2006 (UTC)
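The rifle analogy (random error as unreliability, systematic error as invalidity) can be sketched as a small simulation. This rests on simplifying assumptions that are not part of the comment above: shots are one-dimensional positions, the bull's eye sits at 0, systematic error is a constant offset, and random error is Gaussian noise. The function names and parameter values are purely illustrative.

```python
import random
import statistics


def shoot(n, bias, noise_sd, seed):
    """Simulate n one-dimensional shot positions; the bull's eye is at 0.

    bias     -- systematic error (where the rifle is actually aimed)
    noise_sd -- random error (spread of shots around the aim point)
    """
    rng = random.Random(seed)
    return [bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


def summarize(shots):
    # Mean offset from the target estimates systematic error (invalidity);
    # the standard deviation estimates random error (unreliability).
    return statistics.mean(shots), statistics.stdev(shots)


# Rifle A: aimed at the bull's eye, but imprecise.
a_bias, a_sd = summarize(shoot(10_000, bias=0.0, noise_sd=2.0, seed=1))
# Rifle B: very precise, but aimed somewhere else.
b_bias, b_sd = summarize(shoot(10_000, bias=5.0, noise_sd=0.2, seed=2))

print(f"A: mean offset {a_bias:+.2f}, spread {a_sd:.2f}")
print(f"B: mean offset {b_bias:+.2f}, spread {b_sd:.2f}")
```

Rifle A ends up with a mean offset near zero but a wide spread; rifle B has a tight spread but is centred far from the target. This is the "neither implies the other" point in numerical form.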


 * Appealing to "most textbooks" is not going to get you anywhere, since most textbooks have yet to incorporate the 1999 AERA, APA, NCME Standards for Educational and Psychological Testing. Simply put, reliability means consistency. Cronbach's coefficient is actually a "coefficient of internal consistency". Inter-rater reliability actually concerns how consistent rating is from rater to rater. G-theory also concerns isolating sources of inconsistency.


 * Validity is the degree to which evidence supports the interpretations of test scores required for specific purposes (AERA, APA, NCME, 1999). You cannot validly interpret test scores if the consistency of the results is unknown. In other words, if there is no evidence of reliability, you cannot know whether the scores mean what you need them to mean. Did participant A really perform lower (or possess less of the target trait) than participant B, or was the difference only because rater A gives consistently lower scores than rater B? You could find out by analyzing the inter-rater reliability (consistency), but without that evidence, you cannot know that the scores reflect the target of measurement.


 * Per the 1999 Standards, reliability is most certainly a validity concern. It fits under the "Evidence Based on [the Test's] Internal Structure."


 * 74.45.132.147 (talk) 17:24, 2 March 2009 (UTC) (in an airport, don't want to login :)
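As a sketch of the "coefficient of internal consistency" mentioned above, the usual formula for Cronbach's alpha can be computed directly from a respondents-by-items score table. The function name and the toy scores are hypothetical. The code uses sample variances throughout and does not handle missing data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's coefficient alpha for a respondents-by-items score table.

    scores: list of rows, one per respondent, each row holding k item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical scores: 4 respondents answering 3 items.
scores = [
    [3, 4, 3],
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
]
print(round(cronbach_alpha(scores), 3))  # 0.875
```

Alpha rises as the items covary more strongly relative to their individual variances. It says nothing by itself about what the items measure, which is the gap between consistency and validity being argued over here.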

Reliability and validity are related but independent. They are analogous to the engineering terms precision and accuracy, respectively. An analog wristwatch that does not work is accurate (valid) twice a day to as many decimal places as you can measure, but it lacks precision (reliability). A watch that is always 10 minutes fast is never accurate but is very precise. These terms are well defined and accepted in engineering.

The problem comes in when mapping these concepts into social science, because the terms acquire linguistic uncertainty from colloquial usage. In everyday usage, for example, a reliable person is always on time. Using the scientific definition of reliability, a person who is always 10 minutes late is also reliable.

Statistical validity requires statistical reliability, but not the other way around. It could be argued conceptually that "face validity" has the most in common with statistical reliability. Statistical validity is established by accumulating multiple pieces of evidence over time, through independent and interdependent research studies. An acceptable level of statistical reliability has to be established in at least one study (even a pilot study) before evidence of statistical validity can be established. In general, the different types of evidence for statistical validity necessarily build on previous evidence (e.g. face validity first, then construct, concurrent, certain types of criterion, and predictive validity). There are also arguments and proofs for retrospective predictive validity (i.e. the capacity of a test to predict a diagnosis in the past), but these are of more interest in medicine than in the social sciences. Tests of specificity and sensitivity also rely (at least in theory) on the amount and type of evidence established for statistical validity, and sensitivity and specificity studies are more prevalent in medicine than in psychology and the social sciences. Keeping this in mind might help in determining the scope (scholarly discipline) of this article and what is better left to a different but related article, e.g. Statistical validity in medicine, psychology, business, economics, etc. Dr.khatmando (talk) 05:42, 10 April 2017 (UTC)

Needs Rewrite for Clarity
 * I would like to see this expanded using this outline as a guide:
I. Validity
  A. Internal
  B. External
  C. Statistical Conclusion
  D. Construct
    i. Intentional
    ii. Representation
      a. Face
      b. Content
    iii. Observation
      a. Predictive
      b. Criterion
      c. Concurrent
      d. Convergent

Incorrect structure
The article confuses two main objects of validity, namely (1) a test and (2) a (quasi-)experiment. In case (1), validity concerns the psychometric properties of the test, and the APA Standards apply. In case (2), validity concerns the validity of causal inferences, and the Cook & Campbell terminology applies. These are entirely different concepts of validity, and mixing them into one list compromises the accuracy of the article. Within the psychometric validity family, the main concepts are Content, Criterion and Construct validity. Within the causal validity family, the main concepts are Statistical, Internal, Construct and External. What adds to the confusion is that both families contain the concept of Construct validity. However, these are actually two different concepts of construct validity. For example, if you construct a test and use it in an experiment, then an expert analysis of the test items' content contributes to the Content validity of the test, which in turn contributes to the Construct validity of the experiment. The expert analysis does not, however, contribute to the Construct validity of the test. JulesEllis (talk) 14:50, 20 March 2008 (UTC)


 * I agree that this is in severe need of work. In fact, this is the first time I've checked the article on validity, and I'm quite embarrassed by its state given that Popham (2008) rightly observed that there is no more important issue in modern assessment.


 * JulesEllis: The "Holy Trinity" (Guion, 1980) of validity (Content, Criterion and Construct) is inconsistent with the APA/AERA/NCME 1999 standards, which inherited a lot of Messick's (1995) framework. I'm going to put some work in on this, but I'll leave this talk comment up to get some feedback before I start.


 * Validity is the degree to which evidence and theory support the interpretations required for specific uses of test results (AERA, APA, NCME, 1999). Nitko & Brookhart (2005) similarly define it as the "soundness of the interpretations and uses made of test results." The modern view is that validity is a unitary concept, but that the evidence used to support it may be categorized. For example, what used to be "criterion validity" is now simply "evidence of relation to other variables."


 * Also, validity is not a property of the test or the test results, but of the interpretations and uses made of the test results. Unfortunately, the shortcut phraseology persists (e.g. "this test is not valid"), but looking for such speech is an easy way to identify which version of validity theory (classical or modern) the speaker follows.


 * The distinction is more than semantic; it is extremely useful to practitioners. Validity is no longer a checklist of necessary activities; instead, your specific uses and interpretations of the test results guide which types of validity evidence you gather.


 * Over the next few days I'll post drafts here in the talk before I move them to the article. Please provide feedback.


 * Jmbrowne (talk) 00:28, 7 February 2009 (UTC)
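What used to be called criterion validity ("evidence of relation to other variables" above) is often summarized as a correlation between test scores and a criterion measure. The sketch below assumes hypothetical data and a plain Pearson correlation, and the names `pearson_r`, `test_scores` and `criterion` are made up for illustration. Under the unitary view described above, such a coefficient is one piece of evidence for a particular interpretation, not "the validity" of the test.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: test scores and a later criterion (e.g. a grade average)
# for the same five people.
test_scores = [10, 12, 15, 18, 20]
criterion = [2.0, 2.5, 2.9, 3.4, 3.8]

r = pearson_r(test_scores, criterion)
print(f"test-criterion correlation: {r:.3f}")
```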

So I commented almost a year ago and I haven't made any changes. I'm looking at this article wondering why it's even here. I'm no deletionist, but there are already articles on validity in research design and now there is one on test validity, plus articles on all the little test "validities". So why do we have this one here? Jmbrowne (talk) 02:53, 5 December 2009 (UTC)

Incremental Validity?
Why no reference to incremental validity? And why no page for incremental validity? Can someone come to the rescue?? --1000Faces (talk) 02:16, 11 December 2009 (UTC)


 * If someone provides a citation for the concept, we can determine whether it belongs here. I don't see "incremental" listed in the AERA, APA, NCME standards, so I assume it comes from the "Statistical conclusion validity" section? The plurality of validity is tiresome, anachronistic, and useless. I say we trim what's here rather than add anything new. —Preceding unsigned comment added by Jmbrowne (talk • contribs) 14:46, 12 December 2009 (UTC)

First line vandalism
There's a little vandalism going on in the first line there...someone who knows what they're doing ought to take care of that... — Preceding unsigned comment added by 68.232.120.193 (talk) 00:57, 12 September 2011 (UTC)

Reference to logical validity in lede
The lede contains the sentences:

"The use of the term in logic is narrower, relating to the truth of inferences made from premises. In logic, and therefore as the term is applied to any epistemological claim, validity refers to the consistency of an argument flowing from the premises to the conclusion; as such, the truth of the claim in logic is not only reliant on validity. Rather, an argumentative claim is true if and only if it is both valid and sound."

This is more or less completely wrong. I propose to replace them with:

"The use of the term in logic is narrower, relating to the relationship between the premises and conclusion of an argument. In logic, validity refers to the property of an argument whereby if the premises are true then the truth of the conclusion follows by necessity. The conclusion of an argument is true if the argument is sound, which is to say that the argument is valid and its premises are true." Dezaxa (talk) 04:15, 30 May 2020 (UTC)
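The proposed definition (an argument is valid if true premises guarantee a true conclusion) can be checked mechanically for propositional arguments by searching the truth table for a counterexample row. This is a minimal sketch. It assumes arguments are written as Python functions of Boolean variables, and `is_valid` is a hypothetical helper. It never asks whether the premises are actually true, which is exactly what separates validity from soundness.

```python
from itertools import product


def is_valid(premises, conclusion, n_vars):
    """Return True if no truth assignment makes every premise true
    and the conclusion false (truth-functional validity)."""
    for values in product([True, False], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False  # counterexample row found
    return True


# Modus ponens: P -> Q, P, therefore Q.
modus_ponens = is_valid(
    premises=[lambda p, q: (not p) or q, lambda p, q: p],
    conclusion=lambda p, q: q,
    n_vars=2,
)
# Affirming the consequent: P -> Q, Q, therefore P.
affirming_consequent = is_valid(
    premises=[lambda p, q: (not p) or q, lambda p, q: q],
    conclusion=lambda p, q: p,
    n_vars=2,
)
print(modus_ponens, affirming_consequent)  # True False
```

Affirming the consequent fails on the row P = False, Q = True, where both premises hold but the conclusion does not.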