Talk:Item response theory

Negative language
Angela feels that the second to last paragraph, where tests are described as imprecise and containing error, is too negative. I feel it is a statement of fact that is often misunderstood by non-psychometricians. I think it follows directly from the psychometric material here on Wikipedia, particularly classical test theory.

It's worth noting that I, the author, am a psychometrician (i.e., not likely to have a negative view of testing). Maybe someone can suggest an alternative wording that appears balanced?

Amead 19:09, 5 Jan 2004 (UTC)


 * Actually, I'm happy with the way it is now as it makes it clear you are talking about it in terms of standard error etc rather than sounding like someone's opinion. Angela.

Amead, I reverted the article, at least for now, because I don't believe that on balance, the changes enhanced it. Revert again if you wish -- prefer we work toward a better article in a considered way if possible. The issues were:
 * The definition introduced did not properly define IRT, rather it described (i) what it is used for and (ii) what it is not. While I agree a definition should be as non-technical as possible, this can only be so with reason (see for e.g. probability theory) and efforts to make it so should not detract from the essential elements of a definition. Having said this, some of the changes were great and I agree the definition needs to be improved.
 * IRT has been referred to as a body of related psychometric theory from early on, and I don't see good reason to say "it is not a theory per se". Doesn't this suggest the label is self-contradictory and so confused? I would also note that saying it is a body of theory does not suggest it is a theory (i.e. particular theory).
 * Extra spaces between 1st and 2nd para of overview look sloppy (minor point obviously)
 * Reliability vs information being introduced as a topic separately to information is to me inefficient (e.g. detail such as info fn being bell-shaped was repeated). Further, if we want to make this connection, it should be done properly.  See below (**)
 * The Rasch had already been covered, and there was no connection to newly introduced material. Also, the One Parameter Logistic Model (OPLM) is also a model referred to by Verhelst & Glas (1995) which potentially makes the statement quite confusing. The comments on Rasch were (as you stated) from an American perspective -- European, Asian, etc. perspectives also need to be considered.

BTW, you're right about link to discrimination, removed it.

(**)On reliability, let: $$\hat{\theta} = \theta + \epsilon$$

Then $$\mbox{SE}({\theta})$$ is an estimate of the standard deviation of $$\epsilon$$ for a person with a given weighted score and

$$ \frac{\mbox{VAR}[\theta]}{\mbox{VAR}[\hat{\theta}]}=\frac{\mbox{VAR}[\hat{\theta}]-\mbox{VAR}[\epsilon]}{\mbox{VAR}[\hat{\theta}]} $$

is analogous to Cronbach's alpha (indeed it is typically very close in value) and so analogous to the traditional concept of reliability. The mean squared standard error can be used as the estimate of the variance of the error across persons. Take care ... Stephenhumphry 03:22, 30 July 2005 (UTC)
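To illustrate the reliability analogue above, here is a minimal simulation sketch. All values are hypothetical assumptions for illustration (unit-variance abilities, a constant standard error of 0.5); the ratio recovers something very close to the traditional reliability coefficient:

```python
import random
import statistics

# Simulate true abilities and noisy estimates: theta_hat = theta + epsilon.
random.seed(0)
thetas = [random.gauss(0.0, 1.0) for _ in range(10_000)]   # true theta, VAR = 1
errors = [random.gauss(0.0, 0.5) for _ in range(10_000)]   # epsilon, SE = 0.5
theta_hats = [t + e for t, e in zip(thetas, errors)]

var_hat = statistics.variance(theta_hats)
# Mean squared standard error as the estimate of VAR[epsilon] across persons.
var_err = statistics.mean(e * e for e in errors)

# Reliability analogue: (VAR[theta_hat] - VAR[epsilon]) / VAR[theta_hat]
reliability = (var_hat - var_err) / var_hat
print(round(reliability, 2))  # close to 1 / (1 + 0.25) = 0.8
```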

I removed two external links to sites related to the Rasch model. Both links were "Objective Measurement" links, not general IRT links, and thus more appropriately belong on the Rasch model wiki (which I added to the "See also" section). Bhabing 20:27, 15 April 2006 (UTC)

Amount of distinguishing from Rasch Measurement
Do we really need two paragraphs distinguishing IRT from Rasch? I also prefer to have a more balanced set of references on the issue of distinguishing the two, and that the definition of measurement not be from a strictly Rasch perspective.

--129.252.16.200 21:00, 25 September 2006 (UTC)

I think a couple of paragraphs about IRT and Rasch is in order. The definition of measurement is not from a Rasch perspective. The definition of measurement throughout the natural sciences is quite clear. See psychometrics for a brief account of the history of this definition. If you would like to propose a definition you think is widely accepted in IRT with a citation, be my guest. Please do not attempt to 'balance' by omitting a perspective. Balance on Wikipedia should be achieved by considered presentation of alternative perspectives. There were some quite fundamental problems with previous edits. For example, the reference to "easily computed sufficient statistics" seemed to imply other models have sufficient statistics but they're just not easily computed. This was misleading to say the least. Holon 00:58, 26 September 2006 (UTC)

In that case I would suggest starting a separate, later section of the IRT wiki dealing with the relationship between "Model building" based IRT and the philosophical underpinnings of Rasch measurement, instead of putting it in what is ostensibly the "Overview" section for IRT. It would make the references to discrimination and 2PL/3PL model make more sense. I think it would also be a better place for the "frame of reference" discussion.


 * I agree it is better placed in another section. Let's do that. I'm pretty flat out -- if you want to have a go, great, and I'll look at it when I can. Holon 03:31, 26 September 2006 (UTC)


 * I should have time to do a little mucking around second week of October. ::crosses-fingers::: --Bhabing 23:37, 26 September 2006 (UTC)


 * Great, well I'll have a go if I get time also. Together, I'm sure we can improve. I think quite a few parts of the article could be improved personally. Holon 01:20, 27 September 2006 (UTC)


 * Cool. I'm trying to encourage a swath of IRT people I've worked with into contributing on their areas of specialization (MIRT, DIF, equating, unfolding models, etc...). --Bhabing 03:39, 27 September 2006 (UTC)

As far as the definition of "measurement", it strikes me as patently untrue that it has a single agreed upon definition in psychometrics (regardless of what other wikis might say). The Thissen reference that you have removed twice deals with this from the IRT model building perspective. In addition to stating his (a past president of the Psychometric Society) own opinion, he also provides several references. That it is not decisively accepted by IRT practitioners at large is also attested by many of its staunchest proponents' use of "objective", "Rasch" and "fundamental" as modifiers, and by the plethora of articles defending it (why defend what no one attacks?). If the giants in the field don't agree then it seems odd for the wiki to choose one side (as it does by use of the Andrich quote, as opposed to the Wright quote which has a modifier).


 * There is a miscommunication here. Let me be as clear as possible. My whole point in inviting you to give an agreed upon definition of measurement in IRT is that it doesn't exist. Rasch explicitly showed the congruence of his models with measurement in physics in his 1960 book. What I actually said is that the definition of measurement in the natural sciences - physics, chemistry, etc. - is widely agreed. Indeed, it is implied by the definition of all SI units and the standard means of expressing magnitudes in physics (a number of units, where the number is a real number). See Reese's quote in the psychometrics article. Your edits seemed to me to suggest that Rasch and proponents have 'created' some mysterious definition of measurement, which is patently untrue. What has actually occurred is that various people have created definitions of measurement that are incongruent with the definition throughout the most established sciences (physics, etc.). I have no problem with you presenting alternative definitions, but be clear about them so the article can be written from that basis. There is no need to labour the definition of measurement implied by Rasch models, because of the congruence with the rest of science. Holon 03:31, 26 September 2006 (UTC)


 * To add to the above, I'm perplexed by your comments about articles "defending it". Defending what, exactly? By whom? Holon 05:38, 26 September 2006 (UTC)

As far as sufficient statistics, isn't the entire data set definitionally a sufficient statistic (albeit a not-very-useful one) for the model parameters in general, making the statement “has sufficient statistics” vacuous? (I would be interested in any references to mathematical statistics texts that restrict sufficient statistics from being the entire data set.)


 * Person and item parameters have sufficient statistics (computed from the data only) in the Rasch model. There is no data reduction when the entire data set is called a statistic, and I would suggest you'd need to define the term statistic. So the answer is no, it is not at all vacuous to state that person and item parameters have sufficient statistics (total scores). Holon 03:31, 26 September 2006 (UTC)


 * That the statistic (X1, X2, ... Xn) is sufficient, regardless of whether it allows a reduction of the data or is a scalar, is in the mathematical statistics texts by Rohatgi (1976, pg. 339), Bickel and Doksum (1977, pg. 83) and Lehmann (1983, pg. 41), among others. Fischer and Molenaar (1995) manage by saying what the particular sufficient statistic is (number correct score or sum score -- pages 10 and 16) or what other property is required for the given result (minimality and independence of some other statistic or from an exponential family -- pg. 25 and 222 respectively). --Bhabing 23:37, 26 September 2006 (UTC)


 * Let's define a statistic as being sufficient for a parameter $$\theta$$ iff the probability distribution of the data conditional on the relevant statistic is not dependent on $$\theta$$. Now let's define $$\beta_{ni} = x_{ni}$$ for $$n=1,..,N$$ and $$i=1,..,I$$. If we were to condition on the entire data matrix, there is no probability distribution -- the data are fully determined. Therefore, the entire matrix cannot be a sufficient statistic according to that definition of sufficiency if parameters are supposed to enter into a stochastic model (and any other model is clearly inferior as far as recovering the data is concerned, if that is the criterion). I'm not sure what you think Fischer and Molenaar "manage". It seems to me you think there is some problem with Rasch models and the epistemological case put forward for the models. As far as these models are concerned, the point is that the person and item parameters are separable, which leads to sufficiency of total scores (or sometimes vectors as in Andersen, 1977). I can refer to various articles to make these points (most importantly Rasch, 1961), but I fear we'll just go around in circles all day.
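For what it's worth, the sufficiency of the total score under the Rasch model can be checked numerically. A minimal sketch with a hypothetical two-item test (the difficulties -0.5 and 1.0 are arbitrary assumptions): the probability of a particular response pattern, conditional on the total score, comes out the same for every $$\theta$$:

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Rasch probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_pattern_given_score(theta: float, b1: float, b2: float) -> float:
    """P(pattern (1, 0) | total score = 1) for a two-item Rasch test."""
    p1, p2 = rasch_p(theta, b1), rasch_p(theta, b2)
    score_one = p1 * (1 - p2) + (1 - p1) * p2   # P(total score = 1)
    return p1 * (1 - p2) / score_one

# Theta cancels out of the conditional probability: the total score carries
# all the information about theta, i.e. it is a sufficient statistic.
values = [p_pattern_given_score(t, b1=-0.5, b2=1.0) for t in (-2.0, 0.0, 2.0)]
print(values)  # all three values are equal
```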


 * I'll do some library diving when I get time to see what the authoritative sources in sufficiency's home field (mathematical statistics) say on the matter beyond the references I gave above, and get back to you. As far as Rasch measurement, I am hoping my feelings about Rasch measurement (pro and con) don't harm my attempts to add to this wiki any more than yours stop you. In my experience, most IRT researchers appreciate both the philosophical and statistical properties of the Rasch models as well as the need to deal with a wide variety of actual data sets.  --Bhabing 03:39, 27 September 2006 (UTC)


 * Fair enough. Keep in mind though that the concept of sufficiency is due to Sir Ronald Fisher and Rasch studied and worked with Fisher directly. Keep in mind also it ceases to be a purely mathematical matter where it comes to models used for empirical data. There is a quote from Rasch about this. Would you mind e-mailing me using the wiki function? Couple of things I want to mention but don't want to congest the board. Thanks for the cooperative spirit. Holon 05:17, 27 September 2006 (UTC)

What works best for you in editing this part of the wiki? Should I post some proposed changes here in the discussion first for your modification, or would it be easier for you if I scan-mailed you the two pages of Thissen and Wainer (if you don't have a copy available) and let you take the first go? --Bhabing 02:30, 26 September 2006 (UTC)


 * The problem with your citation was that it was entirely unclear what point was being made. Could you please just clarify the point in light of this discussion? Be bold in editing -- let's just have a discussion if you want to actually remove points that are being made, rather than add counterpoints. I'm open to any alternatives for clarification. Holon 03:31, 26 September 2006 (UTC)


 * Thanks! --Bhabing 23:37, 26 September 2006 (UTC)

The Rasch section was ridiculously disproportionate and I have reduced it to a few short paragraphs that say THE SAME THING (IMHO). If anyone thinks I cut too much, I would suggest that you consider creating a new entry to house an in-depth discussion.

BTW, the worst sin of this section is that I am not sure, after studying it, how Rasch is "a completely different approach" as claimed in the article. In my edits, I have tried to emphasize the theoretical differences that would lead one to apply IRT or Rasch modeling--I think those are the aspects that are "completely different"--because "approach" could encompass so many things where Rasch is virtually identical to the 1PL, such as: having a one-parameter logistic function to describe responding; assuming a latent trait that explains responses; wanting "good" measurement; valuing what Rasch called specific objectivity which (IMO) is subsumed under the IRT assumptions of local item independence and subpopulation parameter invariance; etc.

Amead (talk) 21:57, 23 November 2011 (UTC)

Clarity
This article is not appropriate for an encyclopedia entry. I have a degree in psych and it is incomprehensible. It is jargon from beginning to end. I looked up the entry to find out what IRT meant. I haven't a clue. The people on this talk page are happy with it. They are evidently members of an esoteric circle. Talk plain English or give an example - or something.

- Pepper 150.203.227.130 06:30, 12 January 2007 (UTC)


 * Did you use any quantitative methods in your degree? The reason I ask is that IRT is quite different from traditional quantitative methods taught in psych, and sometimes this makes it harder rather than easier to have some background. Whatever the case, though, I value your feedback. Some of the article needs to be technical -- it is by definition a body of theory. However, the basic purpose and concepts can be made clearer, and I for one am open to suggestion and input. In order to begin somewhere, does the first sentence not make some sense for you?
 * Incidentally, if you respond, I'll move the discussion to the bottom to keep things in chronological order, so please look for any responses at the bottom. If you don't respond, I'll move it after a few days. I'd also ask you to keep in mind constructive criticism and input is productive, whereas emotive language tends to obstruct productive communication. Cheers. Holon 10:57, 12 January 2007 (UTC)


 * I agree. The language is quite simple for someone with a quantitative background, which is necessary to understand IRT. Having a degree in psych won't help much. That's like saying a BA in Biology will help you understand meta-analysis of public health studies. Iulus Ascanius (talk) 16:12, 17 April 2008 (UTC)

I came to this page because I was reading articles about standardized testing. IRT was mentioned, but not explained in the articles I found. I'm an educated professional, used to reading technical articles. I feel sure that somebody could describe IRT in a way that a lay person can understand. This wikipedia entry does not serve that function. I am not sure who this entry is aimed at. It's not very useful to say that you have to be expert in the subject before you can understand the description here. As far as I can understand it, tests are graded using information (extracted from the test results as a whole) about 1) the difficulty of the question and 2) the ability of the student. It is not clear to me whether a student's grade is affected by his/her own performance. Does a stronger student get a better or worse score than a weaker student with the same test answers? If not, why use the students' ability as a parameter? Sorry, I just don't get it, and it is not explained here. I don't expect an answer on this talk page, but if anybody cares, they could edit the main article. Thanks Scientist2516 (talk) 22:55, 7 September 2014 (UTC)scientist2516

Technical language
- Kania 72.139.47.78 22:04, 24 February 2007 (UTC)


 * I happened to be investigating computer image processing and the links brought me to this page. The first paragraph is really incomprehensible to someone outside of the field. I initially thought that it was related to determination of the size of objects in an image. While it doesn't apply to me, I just thought that I would provide some comment to highlight the confusion that a layman might encounter in trying to understand the content.


 * In the following copy of the first paragraph from the article, the last, most technical sentence is actually the easiest to understand. I would explain what "item" is because the term is too generic. I would replace scaling with rating if that is what is meant. In the first sentence alone, the use of "items" twice with potentially different meanings is particularly confusing, and scaling "items" based on their responses is just nonsensical.


 * "Item response theory (IRT) is a body of related psychometric theory that provides a foundation for scaling persons and items based on responses to assessment items. The central feature of IRT models is that they relate item responses to characteristics of individual persons and assessment items. Expressed in somewhat more technical terms, IRT models are functions relating person and item parameters to the probability of a discrete outcome, such as a correct response to an item."


 * Thanks for the comments, much appreciated. We'd better work on tightening up. It should be at least obvious what it is and what it is used for in the simplest possible terms. Holon 12:55, 27 February 2007 (UTC)
 * BTW, just to clarify a couple of things -- scaling items is not nonsensical if you understand the process of scaling (estimating scale locations from responses to items). Rating is most certainly not the same as scaling. Holon 07:01, 28 February 2007 (UTC)

I had a problem with "scaling," since none of the other definitions seem to apply to the use given here. See: http://en.wikipedia.org/wiki/Scaling Could some one explain why correlating "data" with "traits" and "abilities" (which are also social constructs) is "scaling"?Gsmcghee (talk) 04:03, 16 December 2009 (UTC)

Equation
Is the equation for the three-parameter logistic model correct? It contains four parameters if you include the D parameter. This parameter is not discussed in the article nor does it appear in Baker's online book discussing the three-parameter model. Andrés (talk) 14:35, 17 April 2008 (UTC)


 * D is not an estimated parameter. It is a constant fixed to 1.0 or 1.702 to determine the scale.Iulus Ascanius (talk) 16:12, 17 April 2008 (UTC)

This previous comment makes no mathematical sense. Whether one uses $$a_i$$ or $$D*a_i$$ makes no difference. All one is doing is rescaling the value of the $$a_i$$ parameter. In other words, it cannot change the class of the function. The functions $$\sin(a_i*x)$$ and $$\sin(Da_i*x)$$ both describe a sine function. For every solution to the characteristic curve that uses the fixed value of 1.0 for D, there exists an equivalent representation that uses the value 1.702. There is no mathematical way to distinguish the two. Can someone provide a reference to the significance of the D parameter/constant? It still seems incorrect to me. A random selection of an article from the web, such as A Comparison of Item-Fit Statistics for the Three-Parameter Logistic Model, makes no mention of this scaling constant. Andrés (talk) 05:52, 21 April 2008 (UTC)


 * That's the point, that is nothing more than a small rescaling to help the logistic function more closely approximate a cumulative normal function. Here's a good reference, though it's not available through ERIC: Camilli, G. (1994). Origin of the scaling constant "D" = 1.7 in item response theory. Journal of Educational and Behavioral Statistics, 19(3), 293-295. If you need papers immediately available on the internet that mention but don't really explain, see www.fcsm.gov/05papers/Cyr_Davies_IIIC.pdf, http://harvey.psyc.vt.edu/Documents/WagnerHarveySIOP2003.pdf, http://www2.hawaii.edu/~daniel/irtctt.pdf. Any IRT book will explain it.

Thank you for the references to the use of the D constant. I now understand the issues better. The purpose of the constant is to make the item's characteristic curve look like that of the CDF of the normal distribution by rescaling the ability scale. The entry in the article is still wrong, however. This rescaling is appropriate for the 2PL model but not for the 3PL model whenever $$c_i > 0$$. As stated later in the article, the use of $$D=1.7$$ makes the characteristic curve of the 2PL differ by less than $$0.01$$ from that of the normal CDF. This bound is broken by any appreciable fitted value of $$c_i > 0$$. For example, in a multiple-choice exam question even random guessing will sometimes pick the correct answer. As I understand IRT, that is why the 3PL is used. In a typical exam with five choices, one would expect the fitted value of $$c_i$$ to be 0.2 or higher. For negative values of the responder's ability, any 3PL with $$c_i > 0$$ will not be close to the normal CDF no matter what value of D is picked. As support for my argument that the D constant does not belong in the 3PL equation, I point out that the Camilli reference given in the previous paragraph only talks about the 2PL model. Andrés (talk) 14:32, 22 April 2008 (UTC)
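A quick numerical check of both points, using hypothetical parameter values: with D = 1.702 the logistic tracks the normal CDF closely, but a guessing parameter c > 0 breaks the approximation at low ability no matter what D is:

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic(x: float, D: float = 1.702) -> float:
    """Logistic function with the scaling constant D."""
    return 1.0 / (1.0 + math.exp(-D * x))

xs = [i / 100.0 for i in range(-400, 401)]  # grid over [-4, 4]

# 2PL case: with D = 1.702 the logistic stays within ~0.01 of the normal ogive.
max_diff_2pl = max(abs(normal_cdf(x) - logistic(x)) for x in xs)
print(max_diff_2pl < 0.01)  # True

# 3PL case with guessing c = 0.2: the lower asymptote is 0.2, not 0,
# so at low ability the gap from the normal CDF approaches c itself.
c = 0.2
max_diff_3pl = max(abs(normal_cdf(x) - (c + (1 - c) * logistic(x))) for x in xs)
print(max_diff_3pl > 0.19)  # True
```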

Normal Ogive Equation
A definition of phi would be useful in the normal ogive equation.Gshouser (talk) 22:01, 2 September 2009 (UTC)

Missing Critique?
Why is the history of the controversies and critiques of IRT missing? I'm sure philosophers have attacked the basic assumptions of IRT, but nowhere is this evident.

It is almost as if these mathematical insights fell straight from heaven, without any social or political contexts to guide their subsequent development. The inclusion of some related background on eugenics, educationism, and social progressivism, as it relates to IRT, would be helpful for those struggling with this article.

In fact, as "Clarity" points out, it is difficult to understand what the IRT article is about. It is, quite simply, an article by statisticians for other statisticians, and not a general audience.

The word "basis" in the introduction needs some explication -- why response-data would or could be correlated with "traits" or "ability" requires the acceptance of many assumptions, but this is not treated in the article. —Preceding unsigned comment added by Gsmcghee (talk • contribs) 03:55, 16 December 2009 (UTC) Gsmcghee (talk) 04:37, 16 December 2009 (UTC) Gsmcghee (talk) 04:38, 16 December 2009 (UTC)


 * These seem valid points ... why not improve things? 11:16, 16 December 2009 (UTC)

There also seems to be a mis-characterization of the Rasch-IRT controversy, minimizing the differences. I have quotes by Andrich and others that I will try to incorporate. The main thrust seems to be that IRT essentially muddies the clarity of the Rasch separability theorem by adding extraneous parameters designed to second guess test takers and item creators alike, making Raschian separability impossible.

66.32.132.219 (talk) 04:16, 17 December 2009 (UTC)


 * Well, philosophers take little interest in IRT because philosophy is neither quantitative nor scientific. Moreover, you are confusing psychometrics with psychological testing. Associating IRT with eugenics is like associating the physics of the internal combustion engine with drunk driving. And complaining about relevant jargon by throwing in irrelevant jargon is hardly compelling. Lastly, why wouldn't response data be associated with its trait? We assume that the response to an algebra question is related to algebra knowledge; not exactly far-fetched.


 * I think the Rasch-IRT "controversy" is already overemphasized. If anything, we need more info about how the Rasch method is self-serving, inadequate, and unscientific. Iulus Ascanius (talk) 16:55, 17 December 2009 (UTC)

Well, I didn't realise I had such a vested interest in this issue, but after reading some comments in this Missing Critique section, I think the following point should be made. The Rasch model, as with any other IRT model, is a theory of how tests _should be_ constructed -- it tells us what statistical properties test items should have and then, on the basis that they have those properties, allows for the evaluation of individuals' performance (relative to other individuals) on such items.

To see the relation between Rasch modeling and other domains of IRT, consider for example the case when items with response bias (DIF) are removed from a test. In principle, this is not any different than constructing a test to have items with uniform discrimination (which is all a Rasch model is). The basic idea is that an IRT model tells us what a test is _supposed to do_, and then tests are constructed (i.e., items are selected) so that they meet the assumptions of the model. The model is inherently normative, because educational evaluation is inherently normative.

What you guys seem to be arguing about is education at large, and are grinding your axe on a theory of educational testing that is probably one of the best things to happen to the regulation of educational testing in the US. I say this because if a test actually does what an IRT theory says it does, then the ability estimates (i.e., people's test scores) based on that model are an accurate reflection of people's relative standing on the test. In terms of gate-keeping, this is better than, say, blatant upperclass chauvinism, right?

Also, to say that this article is written for statisticians is ridiculous -- its mathematical content is bare-bones. Really it seems that this article should be separated into two parts, one for social implications, one for statistical details. 145.18.152.249 (talk) 16:28, 9 February 2010 (UTC)

My recent revision
I've changed a passage in the article to read as follows:
 * An alternative formulation constructs IRFs based on the normal probability distribution; these are sometimes called normal ogive models. For example, the formula for a two-parameter normal-ogive IRF is:

$$ p_i(\theta)= \Phi \left( \frac{\theta-b_i}{\sigma_i} \right) $$
 * where &Phi; is the cumulative distribution function (cdf) of the standard normal distribution.

Nowhere earlier in the article did it say what &Phi; is, and yet it didn't say here either! Why not?? Before my recent edits the article said
 * $$ p_i(\theta)= \Phi \cdot \frac{\theta-b_i}{\sigma_i} $$

That didn't appear to make sense. It looks like the kind of thing someone might write if they were dutifully copying the formula and misunderstood it, rather than explaining something they understood.

Have I misunderstood this? Michael Hardy (talk) 05:15, 8 February 2010 (UTC)

Hi, I am not a frequent wiki user but I came across this article and wanted to draw some things to your attention. Firstly I think you have forgotten to mention some important historical developments in IRT. In particular, the 2PL and 3PL models are attributable to Birnbaum's sections of Lord and Novick's Statistical Theories of Mental Test Scores -- I am not sure how you could miss this and at the same time be concerned with issues that are fifty years old, such as comparing CTT and IRT. Another historical point is that Lazarsfeld's models dealt with categorical latent variates and are typically discussed under the rubric of 'latent class analysis' (see e.g., Bartholomew and Knott, 1999). Arguably these are a very different kind of model than those found in IRT. I mean, if you are going to discuss Lazarsfeld's work in connection with IRT, you might as well also discuss Spearman's work on unidimensional factor analysis as "pioneering work in IRT". The point is that these are not conventionally treated as the same classes of models. The class of models conventionally treated as IRT models are just those introduced by Birnbaum, which include the Rasch model as a special case.

Some technical points: the article doesn't state _anything_ about estimation and the treatment of model fitting is very unsatisfactory -- these issues have been the core of IRT since the 1970s. I would recommend that if this article is to reflect the modern state of IRT it be largely re-written to include the topics discussed in, e.g.,

Baker, F. B. & Kim, S. (2004). Item Response Theory: Parameter Estimation Techniques (2nd ed.). New York: Marcel Dekker Inc.

FYI in your recent revision of the normal ogive model, the parameter $$\sigma_i$$ is more often written as $$a_i = 1/\sigma_i$$ and termed the 'discrimination parameter' of the item.

145.18.152.249 (talk) 15:55, 9 February 2010 (UTC)

Comments on the overview
IRT did not become widely used until the late 1970s and 1980s, when personal computers gave many researchers access to the computing power necessary for IRT.

There is no connection between the rise of personal computers and the rise of IRT. Besides, pretty much no one had personal computers in the late 70s.

Birnbaum's work made IRT feasible on mainframe computers by the early to mid 60's, but there was little reason to add the complexity of IRT to large-scale programs. The first large-scale program that used IRT was TOEFL around 1976-77, which had to use IRT equating methods because cheating was too prevalent to use common item classical approaches. Use of IRT got a huge boost by New York State's "Truth in Testing" legislation that threatened to derail common item and equivalent form equating approaches. Almost all operational IRT work was done using mainframe computers until maybe the mid 80's. The desire to reduce testing time by using adaptive testing models further boosted the use of IRT with work on the ASVAB and the College Board's course placement tests (operational circa 1984).

''IRT entails three assumptions:

A unidimensional trait denoted by θ ; Local independence of items; The response of a person to an item can be modeled by a mathematical item response function (IRF).''

IRT has only two assumptions - local independence and the form of the item response function. Local independence subsumes unidimensionality in the case of unidimensional IRT models. Local independence is tantamount to saying the dimensionality of the data matches the dimensionality of the model. See Lord and Novick 16.3, page 361 in the brown hard-covered version.
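To illustrate what the local independence assumption asserts, here is a minimal sketch with a hypothetical three-item 1PL test (the difficulties are arbitrary): conditional on $$\theta$$, the probability of a response pattern is just the product of the item probabilities, and the probabilities over all patterns sum to one:

```python
import math
from itertools import product

def irf(theta: float, b: float) -> float:
    """One-parameter logistic item response function."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pattern_prob(theta, pattern, difficulties):
    """Local independence: given theta, item responses are independent,
    so the pattern probability is the product over items."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p = irf(theta, b)
        prob *= p if x == 1 else (1.0 - p)
    return prob

difficulties = (-1.0, 0.0, 1.5)  # hypothetical item difficulties
theta = 0.5                      # hypothetical person ability

# Coherence check: probabilities over all 2^3 response patterns sum to 1.
total = sum(pattern_prob(theta, pat, difficulties)
            for pat in product((0, 1), repeat=3))
print(round(total, 10))  # 1.0
```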

NealKingston (talk) 22:12, 28 July 2010 (UTC)

Intelligence citations bibliography for updating this and other articles
You may find it helpful while reading or editing articles to look at a bibliography of Intelligence Citations, posted for the use of all Wikipedians who have occasion to edit articles on human intelligence and related issues. I happen to have circulating access to a huge academic research library at a university with an active research program in these issues (and to another library that is one of the ten largest public library systems in the United States) and have been researching these issues since 1989. You are welcome to use these citations for your own research. You can help other Wikipedians by suggesting new sources through comments on that page. It will be extremely helpful for articles on human intelligence to edit them according to the Wikipedia standards for reliable sources for medicine-related articles, as it is important to get these issues as well verified as possible. -- WeijiBaikeBianji (talk, how I edit) 02:10, 2 September 2013 (UTC)

Assessment comment
Substituted at 19:12, 29 April 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 3 external links on Item response theory. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20041210140342/http://work.psych.uiuc.edu/irt/tutorial.asp to http://work.psych.uiuc.edu/irt/tutorial.asp
 * Added archive https://web.archive.org/web/20060613221419/http://www.b-a-h.com/software/irt/icl/ to http://www.b-a-h.com/software/irt/icl/
 * Added archive https://web.archive.org/web/20071211021313/http://assess.com/xcart/home.php?cat=37 to http://assess.com/xcart/home.php?cat=37

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 02:23, 18 November 2017 (UTC)