Talk:Educational assessment

Questions
I read this article with great interest. To whoever wrote it, contributed to it, or knows something about the subject: I've got eight (yes, 8) questions (as always, they are real rather than rhetorical):

(1) Why are so many capital letters used? (Educational Evaluation <-- wikified to Educational evaluation (Rfrisbie talk 21:54, 11 February 2006 (UTC)), Summative Assessment, etc.)

(2) In the first paragraph, it says that assessment "applies to many other areas as well", not just to testing. Well, what are they?

(3) "Objective assessment is a form of questioning which has a single correct answer". So what about a question like, "Name the titles of three novels by Who Ever"?

(4) "more than one current answer" should probably read "more than one correct answer"?

(5) What are "matching questions"?

(6) What are "extended response questions"? I think I know, and some readers will be able to infer the answer but shouldn't we mention it explicitly?

(7) What's wrong about oral exams, for example as part of a driving test?

(8) At one point in the article "poorly trained markers" are referred to. Is being a marker a full-time job or part of the job of a teacher? Who trains markers, and what do they have to know?

Maybe some of you can answer at least some of my questions.  17:12, Oct 12, 2004 (UTC)

Is there a particular reason why there is a section of the article written in Portuguese? Eponym 02:36, 19 May 2007 (UTC)

Related articles
I trimmed down the Grading section in the "Related articles" section; I felt the author was going on a rant about why it was bad. 68.231.62.73 19:50, 8 August 2005 (UTC)


 * Do you think we should continue having examinations? In some cases an exam doesn't show the true ability of some students. —The preceding unsigned comment was added by 203.160.1.47 (talk • contribs).


 * Your question, anonymous user, is irrelevant to the topic. The merits of using tests are not worth debating in an article about tests. Chris53516 17:00, 15 September 2006 (UTC)

Characteristics of assessments
In the section "Characteristics of assessments" the third paragraph formerly read:

A good assessment is valid and reliable. Note that an assessment may be reliable but invalid or unreliable and invalid, but an assessment can not be unreliable and valid. In practice, an assessment is rarely completely valid or entirely reliable.

This is wrong in stating that "an assessment can not be unreliable and valid" and also extremely misleading in implying that reliable/unreliable and valid/invalid are accurate terms in discussing assessment. Instead it is usually much more useful to talk in terms of more reliable vs less reliable, and more valid vs less valid.
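For what it's worth, classical test theory does make this asymmetry precise (a standard textbook result, offered here as context rather than something stated in the article): a test's validity coefficient is bounded above by the square root of its reliability coefficient, so a test with very low reliability cannot have high validity, while a perfectly reliable test can still be completely invalid.

```latex
% Attenuation bound from classical test theory.
% r_{XY}   : validity coefficient (correlation of test scores X with criterion Y)
% r_{XX'}  : reliability coefficient of the test
% r_{YY'}  : reliability coefficient of the criterion measure
r_{XY} \le \sqrt{r_{XX'}\, r_{YY'}} \le \sqrt{r_{XX'}}
```

This also supports the "more reliable vs less reliable" framing above: as the reliability coefficient falls, the ceiling on validity falls continuously rather than switching between discrete "reliable" and "unreliable" states.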

I've revised the paragraph

bill-morris@uiowa.edu, Biostatistician 31 mar 06, 4:13 CST

History section
This section is unreferenced and appears to be either plagiarism or original research. Can anyone re-write it, citing references, to improve the section? If it becomes any longer, it should probably be its own article. Chris53516 13:37, 29 September 2006 (UTC)

I have removed the history section due to a violation of the No original research policy and the admission of original research by Bobbyelliott (Talk). Here is an archive of the removed History section. Chris53516 13:58, 29 September 2006 (UTC)


 * It is original research, taken from a book I am writing on assessment, offered as a donation to Wikipedia. It is chronologically inappropriate to move it to the end of the article. --Bobby 13:52, 29 September 2006 (UTC)

We appreciate your donation, but it does not constitute unbiased content. Please see the following policy: No original research. Chris53516 13:58, 29 September 2006 (UTC)

Content that deserves its own article
This content does not belong here, but in its own article. Chris53516 13:28, 24 October 2006 (UTC)

Aim
To investigate and support the transition from norm-referenced external language assessment to criterion-referenced school-based assessment in senior secondary English teaching in Hong Kong schools.

The Research Team Members
Dr Chris Davison, Prof. Liz Hamp-Lyons, Dr Albert Wong, Dr David Carless, Dr Gan Zhengdong, Dr Matthew Clarke, Dr Steve Andrews, Ms Geraldine Davies, Ms Rosina Tang, Miss Nicole Judith Tavares, Miss Yu Ying, Ms Wendy Leung, Ms Cathy Cheung, Miss Christy Poon, Miss S. Poon, Mr Hayes Hei Hang Tang, Miss W.Y. Wong, Miss Bonnie Ng, Miss Pauline Lee, Mr Taft Wong, Mr Ken Yiu

Aim
To investigate technology-enhanced assessment tools and methodologies, especially in the field of language learning and ICT literacy.

Research Team
Dr Angela Guimaraes Pereira, Mr Friedrich Scheuermann

Merge with Course evaluation etc
The material in Course evaluation is redundant and overlapping with this page. The term assessment is more widely used and is the broader category. Therefore, the Course evaluation material should be merged with Assessment. —Chris53516 (Talk) 16:39, 22 July 2007 (UTC)
 * Agree --Jeffmcneill talk contribs 22:35, 11 October 2007 (UTC)

Same goes for Educational evaluation and many other redundant pages with similar meanings... —Chris53516 (Talk) 05:05, 12 October 2007 (UTC)

We are asked:

''Should the Assessment article be merged with Course Evaluation and Educational Evaluation, both of which overlap markedly in content? ''

Most emphatically, they should not. Although, as things stand, it is obvious that essentially the same entry has been dropped into the Wikipedia under these different headings, these entries should in fact deal with very different topics.

Furthermore, as things stand, this entry and its relatives are overwhelmingly America-as-the-world centric and ride roughshod over many issues of vital concern.

Starting from the beginning.

We are offered a definition of “assessment” in which key words are knowledge, skills, attitudes and beliefs.

Yet the definition of “Assessment” in the Oxford English Dictionary begins: “1. The determination of the amount of a tax, fine, etc. ….”. It then goes on in a similar vein until it comes to: “4. … an estimate of worth”. There is no mention of knowledge, skills, or attitudes. No mention of feedback, formative or summative.

Clearly, whoever wrote this entry had a very specific perspective on the word/world!

This is more than unfortunate in that, both here and in the linked entries, a series of very different things are conflated.

These include:


 * 1) Assessments and evaluations of people using educational and psychological tests.
 * 2) Evaluations of the effectiveness of eg educational and health care (psychotherapy; drug-based) programmes using educational and psychological tests.
 * 3) Social surveys, including:
   * Evaluations of public provision, such as housing or planning policy or educational policy, based on opinion surveys, ability to meet pre-specified targets, etc.
   * Evaluations of organisational functioning conducted in terms of theoretically based constructs such as organisational climate and classroom climate.
   * Assessments of managers’, teachers’, or organisations’ ability to meet targets set by others (eg politicians).
   * Opinion-based evaluations of courses in eg higher education.
 * 4) Evaluations designed to provide guidance to individuals: multiple content surveys designed to facilitate choice of career (careers guidance); diagnostic surveys designed to enable people to identify and remedy deficits in intellectual understanding or health care.
 * 5) Non-social-science-based evaluations of procedures - such as evaluations of the effects of fertilisers on agricultural yields and the food chain.
 * 6) Computerised diagnostic programs used to collate seemingly independent symptoms in order to identify core defects in eg car or electronic equipment or human or animal psychological or physiological functioning (health care).

In some cases, the information collected is intended to enable those responsible for the provision …. whether defence policy, health care policy, agricultural policy, or the provision of courses …. to improve their performance.

In other cases it is intended to enable some authority to make some instrumental decision - such as whether, eg in an occupational or educational setting, an individual should be hired or fired (or admitted to some course) or a particular type of educational or community development activity as a whole continued or terminated.

As the current entry explains, in the educational and psychological area, the former are sometimes referred to as “formative” assessments; the latter as “summative”.

In all cases, it is “obvious” that these things can be better done via the Web. Some examples:


 * 1) Psychological tests can be centrally scored to yield hard to calculate “profiles” of scores that can be fed back to individuals or cumulated across multiple respondents.
 * 2) Reference material needed to contextualise and give meaning to individual test scores (for example) can be easily accumulated.
 * 3) Students can complete course evaluation questionnaires on-line at home instead of taking up course time.

Unfortunately, these benefits are not always achieved:


 * 1) Effective web-based testing of eg job applicants depends on availability of secure testing sites to counteract cheating and collect fees.
 * 2) Production of profiles often depends on utilising a variety of assessment procedures (eg IQ tests, personality tests, guidance questionnaires, measures of physiological functioning) developed by different organisations and publishers and, in many cases, channelling the licence fees for the different tests that have been incorporated into a battery back to those who developed them.
 * 3) Those who complete web-based surveys do not constitute a genuinely representative sample - a cross-section - of the population. Consequently, the cumulated data do not yield appropriate reference data.
 * 4) Mass testing … eg of all children aged 8, 11, or 14 in a particular country … depends on widespread availability of terminals and massive central processing capability.
 * 5) It is often difficult to administer appropriate tests in a computerised form.
 * 6) Students tire of completing course evaluation forms … and come to feel they have no effect … so the feedback comes only from, eg, those who are most dissatisfied with the courses.

These, however, are usually the least of the problems that crop up in assessment, whether web-based or not. More important problems include:

The absence of appropriate measures of the most important outcomes. Examples are legion. The main goals of education include the development of initiative, self-confidence, the ability to understand and influence the workings of organisations and society, and the ability to put people at ease. There are no good measures of such outcomes. So the tendency is, in study after study, to focus on what is “easy” to measure (eg reading or “scientific” ability [although most of what are assumed to be measures of these things themselves often lack construct and predictive validity]). This then diverts the attention of teachers and administrators from what is important to these things. It diverts attention from pupils to evaluation procedures.

Still more importantly, the goals of education include the nurturance of diversity: helping each pupil or student to identify, develop, and get recognition for his or her own particular strengths. Different people therefore develop in different directions which cannot be detected with any common test (see Hughes, 2008; Boben, 2008; Stephenson, 2008; and Duzen, 2008).

In agriculture, much more important than the effects of particular fertilisers or pesticides on short-term yields are long-term yields and, still more importantly, effects on the food chain. In societal development, much more important than meeting most economic targets are outcomes such as the assessment of how different components of quality of life are affected by a variety of ways of utilising resources and, still more importantly, the long-term sustainability of the society, the species, and the planet.

The hegemony of reductionist and positivistic thinking in “science”. As indicated in the last paragraph, assessments that are going to be useful to guide development, whether in agriculture, health care, personal and social development, or societal development, need to be comprehensive. They need to cover all the physical, biological, personal and social, short- and long-term consequences of the action, whether desired and desirable or undesired and undesirable. As [[Vandana Shiva]] (1998), in particular, has argued, this is flatly at variance with our societal preoccupation with reductionist science … which is concerned mainly with the effects of single variables on single easy-to-measure outcomes. Changing a pervasive image of science and how data are to be managed is thus a prerequisite to obtaining meaningful and useful assessments.

The corruption of well-intentioned action into its opposite through poorly understood social processes. Assessments promoted as means of obtaining feedback are, it seems, almost invariably corrupted back into forms of central command-and-control management. Thus, as Court (2008) has shown, adoption of any one of the plethora of formats supposedly available to enable lecturers to obtain feedback on their courses tends to result in schemes to hire, fire, and control lecturers. This comes about because the generic forms are unrelated to the wide variety of aims that different lecturers may have. Thus one may be concerned to promote the development of diverse forms of self-confidence. Another to nurture the ability to problematise, invent ways of checking emergent hypotheses, and check conclusions. And so on.

Yet the generic forms simply ask students such things as whether the case study materials were well organised … not such things as whether they were able to identify and develop their own specific talents … which, in any case, may not have been among the goals of a specific lecturer. Consequently, lecturers are unable to use the feedback. Conversely, students find (i) that they cannot say what they want to say and (ii) that what they do say has no effect. The results of all this are that student response rates fall below 20%. Results obtained from such small and unrepresentative samples are clearly invalid: they do not in any sense index “lecturer performance”. Yet they are cumulated and used to contribute to decisions about whether to hire, fire, or promote lecturers.

This is a microcosm of what happens in the vast mandatory multi-million-dollar international high-stakes testing programmes where children are tested at ages 8, 11, and 14 (or some equivalent). As already indicated, the measures used do not reflect important outcomes of education and themselves lack construct and predictive validity.
But what happens next is horrific. As Hattie better than anyone else has shown, these vast studies are not designed to yield information that enables teachers or school systems to improve their performance. All they can be used for is to create school Olympics and castigate individual teachers, schools, and school systems as “inefficient”. School days get filled up with testing and teaching to the test. Teachers fiddle their targets. And so on. Similar processes can be observed in most command and control organisations and society (Deming, 1993; Morgan,1986; Raven, 1995).

There are few arrangements for follow-through activity. To do something about the previously mentioned problems of the educational system or economic management, governments would have to invest heavily in the development of alternative assessment procedures based on, eg, new psychometric models. Worse, they would have to invest in the development of a swathe of diagnostic and prescriptive instruments. In education, these would have to include, eg, tools to assist in the diagnosis and remediation of the cluster of very different problems subsumed under the heading “dyslexia” (and research to understand that process). They would have to invest in tools to assess “classroom climate” and its effects on the development of different competencies and perceptions. They would have to invest in means of studying the variety of different organisational arrangements that is required to create the pervasive climate of innovation, experimentation, and assessment of outcomes that is required to move forward (Raven, 1994).

''Most models of formative-assessment-and-reform assume that those “responsible” are in a position to “do something” with the results. This is rarely the case''. Most human behaviour is determined by factors beyond the individual’s control … and there has been little study of these socio-cybernetic forces which force most of us to spend most of our time doing things we know to be wrong. What happens in schools, for example, is not determined by the educational priorities of teachers, pupils, parents, ministers of education, or educational philosophers. It is determined by what is assessed in the sociological process of allocating position and status … at the point of interface between schools and society. But one cannot alter that process without altering what goes on in schools so that other outcomes become visible and assessable. And that means creating variety and choice between schools. Experimentation, evaluation, and public debate. But that means changing widespread beliefs about how society works … Funding adventurous research … finding ways of thinking about and doing things that no one in government would think it was important to do … thus becomes a fundamental prerequisite to conducting useful assessments.

Governments, far from wanting to promote the kinds of development indicated above, strive, above all, to control information. Step by step, governments all over the world have acted to prevent the development of understanding of the issues mentioned above and the development of appropriate assessment procedures. Researchers are required to bid for government contracts to conduct research with pre-specified terms of reference and may not investigate anything else. Their continued tenure depends on completing and publishing the results of that research to an absurd timescale. Yet they may not say anything without specific government approval. And the government retains the right to, and does, alter the figures … the statistics … obtained from assessments of eg pupils’ reading ability (assessed by tests which have already been engineered to avoid measuring any form of reading ability worth the name).

The bottom line to all this is that what is said in the main entry is mostly pseudo-science; scientific gobbledygook. It seeks to present a “professional” image that has no foundation and which, like so much else that has been mentioned, has precisely the opposite effect to what is claimed. Testing and assessment of the kind promoted drives education out of schools and does profound damage to individuals and society (see Raven, 1991). Nothing could be more unethical. Except, perhaps, the antics of politicians, economists, and mediaeval priests and physicians.


 * Bardutzky, D.B. and Boben, D. (2008) Improving Standards of Assessment: The Need for a Paradigm Shift cannot be Demonstrated from within that Paradigm. Paper presented to conference of the International Test Commission.
 * Court, J.H. (2008) Objectives and Measurement in Formative Evaluation in Higher Education. Paper presented to conference of the International Test Commission.
 * Deming, W. E. (1993). The New Economics for Industry, Government, and Education. Cambridge, MA: Massachusetts Institute of Technology.
 * Duzen, E. (2008) Cultural and other Constraints on the Validity of Assessments. Paper presented to conference of the International Test Commission.
 * Fletcher, R. and Hattie, J. (2003) Test the Nation: The development of an IQ test for New Zealand adults. University of Auckland: School of Education. http://images.tvnz.co.nz
 * Hattie, J. (2003). Teachers Make a Difference. What is the research Evidence? University of Auckland: School of Education.
 * Hughes, S.J. (2008). Arbitrary Metrics Fail to Capture Critical Variance in Human Development. Paper presented to conference of the International Test Commission.
 * Kazdin, A. E. (2006). Arbitrary metrics: Implications for identifying evidence-based treatments.  American Psychologist, 61, 42-49.
 * Morgan, G. (1986). Images of Organization. Beverly Hills, CA: Sage.
 * Raven, J. (1995).  The New Wealth of Nations: A New Enquiry into the Nature and Origins of the Wealth of Nations and the Societal Learning Arrangements Needed for a Sustainable Society. Unionville, New York: Royal Fireworks Press; Sudbury, Suffolk: Bloomfield Books.
 * Raven, J. (1994). Managing Education for Effective Schooling: The Most Important Problem Is to Come to Terms with Values. Unionville, New York: Trillium Press.
 * Raven, J. (1991). The Tragic Illusion: Educational Testing. New York: Trillium Press.
 * Shiva, V. (1998). Biopiracy: The Plunder of Nature and Knowledge. London: Green Books.
 * Stephenson, J. (2008). Problems involved in arriving at valid, reliable, and objective evaluations of development-oriented educational activities in Higher Education. Paper presented to conference of the International Test Commission. Quester67 (talk) 10:31, 2 January 2008 (UTC) —Preceding unsigned comment added by Quester67 (talk • contribs) 10:22, 2 January 2008 (UTC)

Assessment / evaluation
I don't believe that they should be merged. An assessment is a measure at a particular point in time; evaluation is a reflective measure. I use both assessment and evaluation in my work, and in the ways that they are used in our establishment they are completely separate issues. I think it depends on what your role is. As a teacher you may well be using them for the same end, but the administration and management side of education would be evaluating in a totally different way to the teacher.

I think that maybe the Course Evaluation page is incorrectly defined. If I were conducting a course evaluation as a manager, it would be more about retention and achievement, cost/benefits, whether to run it again, etc., than whether or not the learner felt they had achieved.

Comment left above Table of Contents

 * Moved by Adavis444 (talk) 08:39, 29 July 2010 (UTC)

I came to this page to learn about the legal concept of assessments upon real property and was disappointed to discover that this page only dealt with educational assessments, with only a footnote stating that as an alternative definition, an assessment can deal with tax liens. Why isn't there a disambiguation page that splits up the educational form of assessment from the legal term of assessment? —Preceding unsigned comment added by 160.36.251.164 (talk) 18:50, 23 March 2009 (UTC)

How do people feel about IMS QTI?
It would be nice if IMS QTI could handle pronunciation assessments. HowDoIUseUnifiedLogin? (talk) 21:55, 1 August 2009 (UTC)

Ipsative
This article states: "Ipsative assessment is self comparison either in the same domain over time, or comparative to other domains within the same student." But that doesn't appear to be the definition (temporal comparison) given in the linked article Ipsative (http://en.wikipedia.org/wiki/Ipsative_assessment). —Preceding unsigned comment added by Davidakoontz (talk • contribs) 22:10, 16 October 2009 (UTC)

Psychological Assessment
This article seems to be biased towards educational assessment, which is certainly one type of assessment, but not the only one. Psychological assessment is another. Perhaps the article should take into consideration the broader notion of assessment, or should have a disambiguation link pointing to a new article on psychological assessment, for example. --1000Faces (talk) 00:17, 18 October 2009 (UTC)

Intelligence Citations Bibliography for Articles Related to IQ Testing
I have posted a bibliography of  Intelligence Citations for the use of all Wikipedians who have occasion to edit articles on human intelligence and related issues. I happen to have circulating access to a huge academic research library at a university with an active research program in those issues (and to another library that is one of the ten largest public library systems in the United States) and have been researching these issues since 1989. You are welcome to use these citations for your own research and to suggest new sources to me by comments on that page. --WeijiBaikeBianji (talk) 20:05, 30 June 2010 (UTC)

Alternate Meanings Needs Revision
I don't believe it is good practice to refer to a different reference collection to try to explain alternative notions of a concept. For understanding assessment, this section should either cite sources of how people have conceptualized alternative definitions of educational assessment, or situate educational assessment in the general area of assessment (be that psychological or otherwise). This can be done by referring to other Wikipedia articles as well as outside sources.

For the time being, does anyone have thoughts about how best to replace the Merriam-Webster definition while a more thorough "alternative meanings" section is constructed? Mattsenate (talk) 20:12, 30 October 2010 (UTC)

Proposed merge with Learning outcomes assessment
There is little info here, and it would all make more sense in the context of the page on the varying forms of "Educational assessment" Nat Gertler (talk) 17:22, 4 March 2016 (UTC)
 * Agreed, I would suggest merging it to this article too. LoudLizard (📞 | contribs | ✉) 17:55, 4 March 2016 (UTC)
 * I've stricken the suggestion, as the other article has been moved out of article space, and apparently was only placed there accidentally.

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Educational assessment. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20120529041906/http://www.nasponline.org/publications/cq/pdf/V38N7_CulturallyCompetentAssessment.pdf to http://www.nasponline.org/publications/cq/pdf/V38N7_CulturallyCompetentAssessment.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 15:24, 4 September 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Educational assessment. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20090226164716/http://www.edtech.vt.edu/edtech/id/assess/purposes.html to http://www.edtech.vt.edu/edtech/id/assess/purposes.html
 * Added tag to http://dailybruin.ucla.edu/stories/2003/mar/18/reform-education-not-exit-exam/

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 19:44, 17 September 2017 (UTC)

Merge proposal
General principles of assessment is an overlapping stub, where the topic is best discussed here (Educational assessment). I therefore suggest a merge in. Klbrain (talk) 10:46, 7 September 2021 (UTC)
 * Agree, a main article on assessment/measurement should definitely elaborate on the general principles of assessment. Moreover, the principles listed are not the most authoritative available... Sda030 (talk) 21:43, 13 February 2022 (UTC)
 * ✅ Klbrain (talk) 14:04, 28 September 2022 (UTC)