Value-added modeling

Value-added modeling (also known as value-added measurement, value-added analysis and value-added assessment) is a method of teacher evaluation that measures the teacher's contribution in a given year by comparing the current test scores of their students to the scores of those same students in previous school years, as well as to the scores of other students in the same grade. In this manner, value-added modeling seeks to isolate the contribution, or value added, that each teacher provides in a given year, which can be compared to the performance measures of other teachers. Value-added models (VAMs) are considered to be fairer than simply comparing student achievement scores or gain scores without considering potentially confounding context variables like past performance or income. It is also possible to use this approach to estimate the value added by the school principal or the school as a whole.

Critics say that the use of tests to evaluate individual teachers has not been scientifically validated, and that many of the results are due to chance or to conditions beyond the teacher's control, such as outside tutoring. Research shows, however, that differences in teacher effectiveness, as measured by teachers' value-added scores, are associated with small economic effects on students.

Method
Researchers use statistical processes on a student's past test scores to predict the student's future test scores, on the assumption that students usually score approximately as well each year as they have in past years. The student's actual score is then compared to the predicted score. The difference between the predicted and actual scores, if any, is assumed to be due to the teacher and the school, rather than to the student's natural ability or socioeconomic circumstances.

In this way, value-added modeling attempts to isolate the teacher's contributions from factors outside the teacher's control that are known to strongly affect student test performance, including the student's general intelligence, poverty, and parental involvement.

By aggregating all of these individual results, statisticians can determine how much a particular teacher improves student achievement, compared to how much the typical teacher would have improved student achievement.
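The predict, compare, and aggregate steps described above can be sketched in a few lines of Python. This is a deliberately minimal illustration, not the procedure used by any particular district or vendor: it assumes a single prior-year score as the only predictor, a simple least-squares line as the prediction rule, and an unweighted average of residuals per teacher. The function names (`fit_line`, `value_added`) and the sample records are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def value_added(records):
    """Average (actual - predicted) score per teacher.

    records: iterable of (teacher, prior_score, current_score) tuples.
    Assumption: the prediction uses only the prior-year score; real
    models usually include more student, teacher and school variables.
    """
    records = list(records)
    priors = [prior for _, prior, _ in records]
    currents = [current for _, _, current in records]
    # Step 1: predict each current score from the prior score, using all students.
    intercept, slope = fit_line(priors, currents)
    # Step 2: compare each actual score with its prediction.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    # Step 3: aggregate the differences by teacher.
    return {t: sum(r) / len(r) for t, r in residuals.items()}


# Hypothetical data: (teacher, prior-year score, current-year score)
records = [
    ("A", 60, 68), ("A", 70, 77), ("A", 80, 88),
    ("B", 60, 62), ("B", 70, 73), ("B", 80, 82),
]
scores = value_added(records)
for teacher, score in sorted(scores.items()):
    print(f"Teacher {teacher}: {score:+.2f}")
```

With these invented numbers, teacher A's students score on average about 2.7 points above their predicted scores and teacher B's about 2.7 points below. Because the prediction is fitted to all students together, the result is relative to the typical teacher in the data, not to any absolute standard.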

Statisticians use hierarchical linear modeling to predict the score for a given student in a given classroom in a given school. This prediction is based on aggregated results of all students. Each student's predicted score may take into account student-level variables (e.g., past performance, socioeconomic status, race/ethnicity), teacher-level variables (e.g., certification, years of experience, highest degree earned, teaching practices, instructional materials, curriculum) and school-level variables (e.g., size, type, setting). Which variables are included depends on the model.
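One generic way to write such a hierarchical model is as a three-level random-intercept regression. The specification below is an illustrative assumption, not the form used by any specific value-added system; all symbols are introduced here for the example. The score of student i, taught by teacher j in school k, is modeled as:

```latex
y_{ijk} = \beta_0
        + \boldsymbol{\beta}^{\top} \mathbf{x}_{ijk}
        + \boldsymbol{\gamma}^{\top} \mathbf{w}_{jk}
        + \boldsymbol{\delta}^{\top} \mathbf{z}_{k}
        + v_{k} + u_{jk} + \varepsilon_{ijk},
\qquad
v_{k} \sim \mathcal{N}(0, \sigma_v^2),\quad
u_{jk} \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Here x holds the student-level variables (such as prior scores), w the teacher-level variables, and z the school-level variables. In this sketch the teacher intercept u is the quantity read as the teacher's value added, v is the corresponding school effect, and ε is the remaining student-level variation.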

Uses
A few school districts across the United States have adopted the system, including the Chicago Public Schools, New York City Department of Education and District of Columbia Public Schools. The rankings have been used to decide on issues of teacher retention and the awarding of bonuses, as well as a tool for identifying those teachers who would benefit most from teacher training. Under Race to the Top and other programs advocating for better methods of evaluating teacher performance, districts have looked to value-added modeling as a supplement to observing teachers in classrooms.

Louisiana legislator Frank A. Hoffmann introduced a bill to authorize the use of value-added modeling techniques in the state's public schools as a means to reward strong teachers, to identify successful pedagogical methods, and to provide additional professional development for those teachers identified as weaker than others. Despite opposition from the Louisiana Federation of Teachers, the bill passed the Louisiana State Senate on May 26, 2010, and was immediately signed into law by Governor Bobby Jindal.

Experts do not recommend using value-added modeling as the sole determinant of any decision. Instead, they recommend using it as a significant factor in a multifaceted evaluation program.

Limitations
Because value-added modeling is a norm-referenced evaluation system, the teacher's performance is compared to the results seen in other teachers in the chosen comparison group. It is therefore possible to use this model to infer that a teacher is better than, worse than, or the same as the typical teacher, but it is not possible to use this model to determine whether a given level of performance is desirable.

Because each student's expected score is largely derived from the student's actual scores in previous years, it is difficult to use this model to evaluate teachers of kindergarten and first grade. Some research limits the model to teachers of third grade and above.

Schools may not be able to obtain new students' prior scores from the students' former schools, or the scores may not be useful because of the non-comparability of some tests. A school with high levels of student turnover may have difficulty in collecting sufficient data to apply this model. When students change schools in the middle of the year, their progress during the year is not solely attributable to their final teachers.

Value-added scores are more sensitive to teacher effects for mathematics than for language. This may be due to widespread use of poorly constructed tests for reading and language skills, or it may be because teachers ultimately have less influence over language development. Students learn language skills from many sources, especially their families, while they learn math skills primarily in school.

There is some variation in scores from year to year and from class to class. This variation is similar to that seen in performance measures in other fields, such as Major League Baseball, and thus may reflect real, natural variations in the teacher's performance. Because of this variation, scores are most accurate if they are derived from a large number of students (typically 50 or more). As a result, it is difficult to use this model to evaluate first-year teachers, especially in elementary school, as they may have taught only 20 students. A ranking based on a single classroom is likely to classify the teacher correctly about 65% of the time. This number rises to 88% if ten years' data are available. Additionally, because the confidence interval is wide, the method is most reliable when identifying teachers who are consistently in the top or bottom 10%, rather than trying to draw fine distinctions between teachers who produce more or less typical achievements, such as attempting to determine whether a teacher should be rated as being slightly above or slightly below the median.
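The dependence of accuracy on the number of students can be illustrated with the standard error of a mean. The sketch below assumes that each student's deviation from the predicted score is independent and has the same standard deviation, which is a simplification; actual value-added models have a more complicated error structure. The function name and the standard deviation of 10 points are hypothetical.

```python
import math


def standard_error(student_sd, n_students):
    """Standard error of a teacher's average residual.

    Assumes independent student residuals with a common standard
    deviation `student_sd` (a simplifying assumption).
    """
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return student_sd / math.sqrt(n_students)


# Hypothetical residual standard deviation of 10 score points.
for n in (20, 50, 200):
    print(f"{n:>3} students: standard error = {standard_error(10, n):.2f}")
```

Under these assumptions the uncertainty shrinks only with the square root of the number of students, so quadrupling the number of students halves the standard error. This is one way to see why a single class of 20 students gives a much noisier estimate than several years of data.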

Value-added scores assume that students are randomly assigned to teachers. In reality, students are rarely randomly assigned to teachers or to schools. According to Jesse M. Rothstein, an economist and professor at the University of California, Berkeley, "Non-random assignment of students to teachers can bias value added estimates of teachers' causal effects." The issue of possible bias in value-added measures has been the subject of considerable recent study, and other researchers have concluded that value-added measures do provide good estimates of teacher effectiveness. See, for example, the recent work of the Measures of Effective Teaching project and the analysis of how value-added measures relate to future incomes by Professor Raj Chetty of Harvard and his colleagues.

Research
The idea of judging the effectiveness of teachers based on the learning gains of students was first introduced into the research literature in 1971 by Eric Hanushek, currently a Senior Fellow at the conservative Hoover Institution, an American public policy think tank located at Stanford University in California. It was subsequently analyzed by Richard Murnane of Harvard University, among others. The approach has been used in a variety of different analyses to assess the variation in teacher effectiveness within schools, and the estimation has shown large and consistent differences among teachers in the learning pace of their students.

Statistician William Sanders, a senior research manager at SAS, introduced the concept to school operations when he developed value-added models for school districts in North Carolina and Tennessee. First created as a teacher evaluation tool for school programs in Tennessee in the 1990s, the technique came into wider use with the passage of the No Child Left Behind legislation in 2002. Based on his experience and research, Sanders argued that "if you use rigorous, robust methods and surround them with safeguards, you can reliably distinguish highly effective teachers from average teachers and from ineffective teachers."

A 2003 study by the RAND Corporation, prepared for the Carnegie Corporation of New York, said that value-added modeling "holds out the promise of separating the effects of teachers and schools from the powerful effects of such noneducational factors as family background". It also noted that studies using such models had shown a wide variance in teacher scores, which could make value-added modeling an effective tool for evaluating and rewarding teacher performance if the variability could be substantiated as linked to the performance of individual teachers.

The Los Angeles Times reported on the use of the program in that city's schools, creating a searchable web site that provided the score calculated by the value-added modeling system for 6,000 elementary school teachers in the district. United States Secretary of Education Arne Duncan praised the newspaper's reporting on the teacher scores, citing it as a model of increased transparency, though he noted that greater openness must be balanced against concerns regarding "privacy, fairness and respect for teachers". In February 2011, Derek Briggs and Ben Domingue of the National Education Policy Center (NEPC) released a report reanalyzing the same dataset from the L.A. Unified School District in an attempt to replicate the results published in the Times. They found serious limitations in the previous research, concluding that the "research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings."

The Bill and Melinda Gates Foundation is sponsoring a multi-year study of value-added modeling with their Measures of Effective Teaching (MET) program. Initial results, released in December 2010, indicate that both value-added modeling and student perception of several key teacher traits, such as control of the classroom and challenging students with rigorous work, correctly identify effective teachers. The study about student evaluations was done by Ronald Ferguson. The study also discovered that teachers who teach to the test are much less effective, and have significantly lower value-added modeling scores, than teachers who promote a deep conceptual understanding of the full curriculum. A reanalysis of the MET report's results conducted by Jesse Rothstein, an economist and professor at the University of California, Berkeley, disputes some of these interpretations, however. Rothstein argues that the analyses in the report do not support the conclusions, and that "interpreted correctly... [they] undermine rather than validate value-added-based approaches to teacher evaluation." More recent work from the MET project, however, validates the use of value-added approaches.

Principals and leaders
The general idea of value-added modeling has also been extended to consider principals and school leaders. While there has been considerable anecdotal discussion about the importance of school leaders, there has been very little systematic research into their impact on student outcomes. Recent analysis in Texas has provided evidence about the effectiveness of leaders by looking at how the gains in student achievement for a school change after the principal changes. This outcome-based approach to measuring effectiveness of principals is very similar to the value-added modeling that has been applied to the evaluation of teachers. The early research in Texas finds that principals have a very large impact on student achievement. Conservative estimates indicate that an effective school leader improves the performance of all students in a school, with the magnitude equal on average to two months of additional learning gains for the students in each school year. These gains come at least in part through the principal's impact on selecting and retaining good teachers. Ineffective principals, however, have a similarly large negative effect on school performance, suggesting that issues of evaluation are as important with respect to school leadership as they are for teachers.

Criticism and concerns
A report issued by the Economic Policy Institute in August 2010 recognized that "American public schools generally do a poor job of systematically developing and evaluating teachers" but expressed concern that using performance on standardized tests as a measuring tool will not lead to better performance. The EPI report recommends that measures of performance based on standardized test scores be one factor among many that should be considered to "provide a more accurate view of what teachers in fact do in the classroom and how that contributes to student learning." The study called value-added modeling a fairer means of comparing teachers that allows for better measures of educational methodologies and overall school performance, but argued that student test scores were not sufficiently reliable as a means of making "high-stakes personnel decisions".

Edward Haertel, who led the Economic Policy Institute research team, wrote that the methodologies being pushed as part of the Race to the Top program placed "too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals" and that the techniques of value-added modeling need to be more thoroughly evaluated and should only be used "in closely studied pilot projects".

Education policy researcher Gerald Bracey further argued that a correlation between teachers and short-term changes in test scores may be irrelevant to the actual quality of teaching. Therefore, "it cannot permit causal inferences about individual teachers. At best, it is a beginning step to identify teachers who might need additional professional development."

The American Statistical Association issued an April 8, 2014 statement criticizing the use of value-added models in educational assessment, without ruling out the usefulness of such models. The ASA cited limitations of input data, the influence of factors not included in the models, and large standard errors resulting in unstable year-to-year rankings.

John Ewing, writing in the Notices of the American Mathematical Society, criticized the use of value-added models in educational assessment as a form of "mathematical intimidation" and a "rhetorical weapon." Ewing cited problems with input data and the influence of factors not included in the model.

Alternatives
Several alternatives for teacher evaluation have been implemented:

 * Evaluation by students: If asked validated questions, students as young as fourth graders can accurately identify effective teachers. Course evaluations are common in universities, but rarely count for more than a trivial fraction in a decision to retain or fire a teacher.
 * Activities outside the classroom: Part of a teacher's evaluation typically includes participation in staff training events. For example, a teacher who completes a master's degree is almost always paid more, even though holding a master's degree has no effect on student achievement.

Most experts recommend using multiple measures to evaluate teacher effectiveness.