Writing assessment

Writing assessment refers to an area of study that contains theories and practices that guide the evaluation of a writer's performance or potential through a writing task. Writing assessment can be considered a combination of scholarship from composition studies and measurement theory within educational assessment. Writing assessment can also refer to the technologies and practices used to evaluate student writing and learning. An important consequence of writing assessment is that the type and manner of assessment may impact writing instruction, with consequences for the character and quality of that instruction.

Contexts
Writing assessment began as a classroom practice during the first two decades of the 20th century, though high-stakes and standardized tests also emerged during this time. During the 1930s, the College Board shifted from direct writing assessment to indirect assessment because indirect tests were more cost-effective and were believed to be more reliable. Starting in the 1950s, more students from diverse backgrounds were attending colleges and universities, so administrators used standardized testing to decide where these students should be placed, what and how to teach them, and how to measure whether they had learned what they needed to learn. The large-scale statewide writing assessments that developed during this time combined direct writing assessment with multiple-choice items, a practice that remains dominant today across U.S. large-scale testing programs such as the SAT and GRE. These assessments usually take place outside of the classroom, at the state and national level. However, as more and more students were placed into courses based on their standardized test scores, writing teachers began to notice a conflict between what students were being tested on (grammar, usage, and vocabulary) and what the teachers were actually teaching (writing process and revision). Because of this divide, educators began pushing for writing assessments that were designed and implemented at the local, programmatic, and classroom levels. As writing teachers began designing local assessments, the methods of assessment began to diversify, resulting in timed essay tests, locally designed rubrics, and portfolios. Beyond the classroom and programmatic levels, writing assessment also strongly influences writing centers, through writing center assessment, and similar academic support centers.

History
Because writing assessment is used in multiple contexts, the history of writing assessment can be traced through examining specific concepts and situations that prompt major shifts in theories and practices. Writing assessment scholars do not always agree about the origin of writing assessment.

The history of writing assessment has been described as consisting of three major shifts in the methods used to assess writing. The first wave of writing assessment (1950-1970) sought objective tests with indirect measures of assessment. The second wave (1970-1986) focused on holistically scored tests in which students' actual writing began to be assessed. The third wave (since 1986) shifted toward assessing a collection of student work (i.e. portfolio assessment) and toward programmatic assessment.

The 1961 publication of Factors in Judgments of Writing Ability by Diederich, French, and Carlton has also been characterized as marking the birth of modern writing assessment. Diederich et al. based much of their book on research conducted through the Educational Testing Service (ETS) over the previous decade. The book was an attempt to standardize the assessment of writing and is responsible for establishing a base of research in writing assessment.

Validity and reliability
The concepts of validity and reliability have been offered as a kind of heuristic for understanding shifts in priorities in writing assessment, as well as for interpreting what is understood as best practice in writing assessment.

In the first wave of writing assessment, the emphasis was on reliability, which concerns the consistency of a test. In this wave, the central concern was to assess writing with the greatest predictability at the least cost and effort.
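Consistency can be quantified in several ways. As a minimal illustration, not drawn from the sources discussed here, the sketch below assumes two raters have scored the same set of essays and computes their raw percent agreement along with Cohen's kappa, a standard statistic that corrects agreement for chance. The function names and the sample scores are hypothetical.

```python
from collections import Counter


def percent_agreement(scores_a, scores_b):
    """Fraction of essays on which two raters gave the identical score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("both raters must score the same non-empty set of essays")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(scores_a)
    observed = percent_agreement(scores_a, scores_b)
    counts_a, counts_b = Counter(scores_a), Counter(scores_b)
    # Chance agreement: probability both raters pick the same category
    # if each assigned scores independently at their own observed rates.
    expected = sum(
        counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores on a 1-5 scale for six essays.
rater_1 = [4, 3, 5, 2, 4, 3]
rater_2 = [4, 3, 4, 2, 4, 2]

print(round(percent_agreement(rater_1, rater_2), 3))
print(round(cohens_kappa(rater_1, rater_2), 3))
```

A kappa near 1 indicates that the raters agree far more often than chance would predict, while a value near 0 indicates agreement no better than chance.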

The shift toward the second wave marked a move toward considering principles of validity. Validity concerns a test's appropriateness and effectiveness for the given purpose. Methods in this wave were more concerned with a test's construct validity: whether the material a test elicits is an appropriate measure of what the test purports to measure. Teachers began to see an incongruence between the material tests used to measure writing and the material teachers were asking students to write. Holistic scoring, championed by Edward M. White, emerged in this wave. It is a method of assessment in which students are prompted to produce actual writing, which is then used to measure their writing ability.

The third wave of writing assessment emerged with continued interest in the validity of assessment methods. This wave began to consider an expanded definition of validity that includes how portfolio assessment contributes to learning and teaching. In this wave, portfolio assessment emerged to emphasize theories and practices in composition and writing studies such as revision, drafting, and process.

Direct and indirect assessment
Indirect writing assessments typically consist of multiple-choice tests on grammar, usage, and vocabulary. Examples include high-stakes standardized tests such as the ACT, SAT, and GRE, which are most often used by colleges and universities for admissions purposes. Other indirect assessments, such as Compass, are used to place students into remedial or mainstream writing courses. Direct writing assessments, like WritePlacer ESL (part of Accuplacer) or a timed essay test, require at least one sample of student writing and are viewed by many writing assessment scholars as more valid than indirect tests because they assess actual samples of writing. Portfolio assessment, which generally consists of several pieces of student writing produced over the course of a semester, began to replace timed essays during the late 1980s and early 1990s. Portfolio assessment is viewed as even more valid than timed essay tests because it focuses on multiple samples of student writing that have been composed in the authentic context of the classroom. Portfolios enable assessors to examine multiple samples of student writing and multiple drafts of a single essay.

Methods
Methods of writing assessment vary depending on the context and type of assessment. The following is an incomplete list of writing assessments frequently administered:

Portfolio
Portfolio assessment is typically used to assess what students have learned at the end of a course or over a period of several years. Course portfolios consist of multiple samples of student writing and a reflective letter or essay in which students describe their writing and work for the course. "Showcase portfolios" contain final drafts of student writing, and "process portfolios" contain multiple drafts of each piece of writing. Both print and electronic portfolios can be either showcase or process portfolios, though electronic portfolios typically contain hyperlinks from the reflective essay or letter to samples of student work and, sometimes, outside sources.

Timed-essay
Timed essay tests were developed as an alternative to multiple-choice, indirect writing assessments. Timed essay tests are often used to place students into writing courses appropriate for their skill level. These tests are usually proctored, meaning that testing takes place in a specific location where students are given a prompt and must write a response within a set time limit. The SAT and GRE both contain timed essay portions.

Rubric
A rubric is a tool used in writing assessment that can be applied in several writing contexts. A rubric consists of a set of criteria or descriptions that guides a rater in scoring or grading a writer. The origins of rubrics can be traced to attempts in the early 20th century to standardize and scale writing in education. In November 1912, Ernest C. Noyes argued for a shift toward assessment practices that were more science-based. One of the original scales used in education was developed by Milo B. Hillegas in A Scale for the Measurement of Quality in English Composition by Young People. This scale is commonly referred to as the Hillegas Scale. The Hillegas Scale and other scales used in education were used by administrators to compare the progress of schools.

In 1961, Diederich, French, and Carlton from the Educational Testing Service (ETS) published Factors in Judgments of Writing Ability, which presented a rubric compiled from the comments of a series of raters. The comments were categorized and condensed into a five-factor rubric:


 * Ideas: relevance, clarity, quantity, development, persuasiveness
 * Form: organization and analysis
 * Flavor: style, interest, sincerity
 * Mechanics: specific errors in punctuation, grammar, etc.
 * Wording: choice and arrangement of words
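As a rough illustration of how a rater's judgments on the five factors listed above might be recorded and combined, the sketch below treats the rubric as a simple data structure. The 1-5 scale, the unweighted sum, and the function name `score_essay` are assumptions made for the example; they are not taken from Diederich, French, and Carlton's study.

```python
# The five factor names follow the list above; everything else is hypothetical.
FACTORS = ("ideas", "form", "flavor", "mechanics", "wording")


def score_essay(ratings, scale=(1, 5)):
    """Combine one rater's per-factor ratings into a summary.

    `ratings` maps each factor name to a number on the assumed scale.
    """
    low, high = scale
    missing = [f for f in FACTORS if f not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {', '.join(missing)}")
    for factor in FACTORS:
        if not low <= ratings[factor] <= high:
            raise ValueError(f"{factor} rating must be between {low} and {high}")
    total = sum(ratings[f] for f in FACTORS)
    return {
        "total": total,
        "maximum": high * len(FACTORS),
        "mean": total / len(FACTORS),
    }


# Hypothetical ratings for a single essay.
example = {"ideas": 4, "form": 3, "flavor": 5, "mechanics": 2, "wording": 4}
summary = score_essay(example)
print(summary)
```

An unweighted sum treats every factor as equally important. A locally negotiated rubric could weight or replace factors to reflect what a particular classroom values.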

As rubrics began to be used in the classroom, teachers began to advocate for criteria to be negotiated with students so that students would have a stake in how they would be assessed. Scholars such as Chris Gallagher and Eric Turley, Bob Broad, and Asao Inoue (among many others) have argued that effective use of rubrics comes from local, contextual, and negotiated criteria.

Criticisms

The introduction of the rubric has stirred debate among scholars. Some educators have argued that rubrics rest on false claims of objectivity and are therefore ultimately subjective. Eric Turley and Chris Gallagher argued that state-imposed rubrics are a tool for accountability rather than improvement. Rubrics often originate outside of the classroom, written by authors with no relation to the students themselves, and are then interpreted and adapted by other educators. Turley and Gallagher note that "the law of distal diminishment says that any educational tool becomes less instructionally useful -- and more potentially damaging to educational integrity -- the further away from the classroom it originates or travels to." They go on to say that a rubric should be interpreted as a tool for writers to measure a set of consensus values, not as a substitute for an engaged response.

A study by Stellmack et al. evaluated the perception and application of rubrics with agreed-upon criteria. The study found that when different graders evaluated the same draft, the grader who had previously given feedback was more likely to note improvement. The researchers concluded that a rubric with higher reliability would produce better results in their "review-revise-resubmit procedure".

Anti-rubric: Rubrics both measure the quality of writing and reflect an individual's beliefs about a department's or particular institution’s rhetorical values. However, rubrics lack detail on how an instructor may diverge from these values. Bob Broad offers “dynamic criteria mapping” as an example of an alternative to the rubric.

The single standard of assessment raises further questions, as Elbow touches on the social construction of value itself. He proposes that a communal process, stripped of the requirement for agreement, would allow the class to “see potential agreements – unforced agreements in their thinking – while helping them articulate where they disagree.” He proposes that grading could take a multidimensional perspective, which opens up the possibilities for what counts as ‘good writing’. He points out that a single-dimensional rubric, by contrast, attempts to assess a multidimensional performance.

Multiple-choice test
Multiple-choice tests contain questions about usage, grammar, and vocabulary. Standardized tests like the SAT, ACT, and GRE are typically used for college or graduate school admission. Other tests, such as Compass and Accuplacer, are typically used to place students into remedial or mainstream writing courses.

Automated essay scoring
Automated essay scoring (AES) is the use of non-human, computer-assisted assessment practices to rate, score, or grade writing tasks.

Race
Some scholars in writing assessment focus their research on the influence of race on performance in writing assessments. Scholarship in race and writing assessment seeks to study how categories of race and perceptions of race continue to shape writing assessment outcomes. Some scholars recognize that racism in the 21st century is no longer explicit, but argue that a 'silent' racism persists in writing assessment practices, in which racial inequalities are typically justified with non-racial reasons. These scholars advocate for new developments in writing assessment in which the intersections of race and writing assessment are brought to the forefront of assessment practices.