Good teachers matter and—as in every other profession—some are better than others. Researchers have even found that the very best teachers can help students overcome many of the effects of poverty and catch up to or surpass their more privileged peers.
That’s why there is intense interest now in finding better ways to judge the relative effectiveness of teachers. But how should that be done? Most teacher evaluations not only fail to single out successful teachers—they also don’t help principals determine which teachers need help to improve and which ones are failing their students altogether. Instead, all teachers end up being judged the same, which is to say, satisfactory.
Perhaps surprisingly, teacher-union leaders agree. Michael Mulgrew, president of New York City’s United Federation of Teachers (UFT), said last spring that “the current evaluation system doesn’t work for teachers—it’s too subjective, lacks specific criteria and is too dependent on the whims and prejudices of principals.”
So, it would seem that a system using student test scores to calculate how much “value” teachers add to their students’ learning would be fairer. Indeed, Mulgrew endorsed New York state’s new evaluation system, in which student achievement counts for 40 percent of a teacher’s rating.
“Value-added” measurements use complex statistical models to project a student’s future gains based on his or her past performance, taking into account how similar students perform. The idea is that good teachers add value by helping students progress further than expected, and bad teachers subtract value by slowing their students down.
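To make the idea concrete, here is a deliberately simplified sketch in Python. It assumes each student has a single prior-year score. An ordinary least-squares line fitted across all students stands in for "how similar students perform" and supplies each student's expected score. A teacher's value-added is then the average gap between what that teacher's students actually scored and what was expected of them. The function names (`fit_line`, `value_added`) and the toy data are hypothetical. Production models typically use several years of scores and more elaborate statistics, so this shows the basic logic rather than any district's actual method.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def value_added(records):
    """Toy value-added estimate (hypothetical, single predictor).

    records: iterable of (teacher, prior_score, current_score).
    Returns {teacher: mean of (actual - expected) for that teacher's students}.
    """
    records = list(records)
    # Expected score comes from how all students with a given prior score did.
    intercept, slope = fit_line([r[1] for r in records], [r[2] for r in records])

    gaps = defaultdict(list)
    for teacher, prior, current in records:
        expected = intercept + slope * prior
        gaps[teacher].append(current - expected)  # positive = beat expectations

    return {teacher: sum(g) / len(g) for teacher, g in gaps.items()}


if __name__ == "__main__":
    # Hypothetical class data: both teachers start with students at 50 and 70.
    data = [("A", 50, 57), ("A", 70, 77), ("B", 50, 53), ("B", 70, 73)]
    print(value_added(data))  # → {'A': 2.0, 'B': -2.0}
```

In this toy data both teachers begin with students at the same levels, yet teacher A's students finish two points above the fitted line and teacher B's finish two points below it, so A is credited with adding value and B with subtracting it.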
Using value-added models to calculate teacher effectiveness wasn’t possible on a wide scale until recently. In the 1990s, William L. Sanders, a statistician at the University of Tennessee, pioneered the technique using student test scores and persuaded the Hamilton County School Board to work with him on a closer analysis of the results.
The method—as Sanders puts it—is like measuring a child’s height on a wall. It tracks a child’s academic growth over the year, no matter how far ahead or behind the child was initially. Sanders discovered that teacher quality varied greatly in every school, and he and others also found that students assigned to good teachers for three consecutive years tended to make great strides, while those assigned to three poor ones in a row usually fell way behind.
Why value-added is hot now
Hundreds of districts, including Chicago, Denver, New York City and Washington, D.C., are using such methods as a way to strengthen their teacher evaluations by factoring in student performance. The biggest push for the use of such methods has come from the Obama administration, which insisted that states competing for grants under its $4.3 billion Race to the Top program find ways to link student performance to teacher evaluations. The dozen winners of those grants are now struggling to figure out how to do just that.
At the same time, however, value-added modeling is the focus of furious debate among scholars, policymakers, superintendents, education advocates and journalists. The latest flare-up is occurring this week in New York City. The New York Times, Wall Street Journal, New York Daily News and others are seeking the value-added rankings of about 12,000 teachers in grades 4 through 8 whose students took state English and math tests.
The New York City Department of Education says it’s willing to make those scores public. But the UFT is suing to block their release. Value-added ratings, the union says in its lawsuit, are “unreliable, often incorrect, subjective analyses dressed up as scientific fact.” The union calls the calculations a “complex and largely subjective guessing game.”
In August, the Los Angeles Times drew both intense criticism and praise for a series that included value-added scores for individual teachers based on years of standardized test data, a project that newspapers in New York City now want to replicate. (Disclosure: The Los Angeles Times data analysis was supported in part by a grant from The Hechinger Report.)
The documentary Waiting for “Superman,” directed by Davis Guggenheim, has also thrust the teacher-evaluation issue into the national spotlight by highlighting the historical disconnect between teacher job security and student performance.
Limitations of value-added
Value-added models aren’t perfect, as even their most ardent supporters concede. Oft-cited shortcomings range from doubts about fairness to broader concerns centered on teaching goals. “When people talk about their experience with a really good teacher, they’re not talking about test scores,” said Aaron Pallas, professor of sociology and education at Teachers College, Columbia University. “They’re talking about a teacher who gave them self-confidence, the ability to learn, an interest and curiosity about certain subjects.”
Critics point out that value-added data are only as good as the standardized tests, and test quality varies greatly from state to state. There are also many ways to calculate value-added scores, and different statistical techniques yield different results. The calculations may take into account factors that can affect achievement, such as class size, a school’s funding level and student demographics. Whether to include students’ race and poverty status when measuring teachers is particularly contentious, writes Douglas N. Harris, an economist at the University of Wisconsin–Madison, in a report on value-added models released this week.
Whatever the computational method, a teacher’s score can vary significantly from one year to the next—results that could affect a teacher’s reputation and salary in places that are considering linking teacher pay to performance. And while value-added models may do a decent job of highlighting the best and worst performers, they’re widely considered unreliable in differentiating the good from the mediocre (or the mediocre from the terrible).
For this reason, many want value-added calculations to be used only in assessing schools and curricula, not individual teachers. But more and more, value-added data are playing a role in personnel decisions about bonuses, tenure and dismissal. In July, D.C. Schools Chancellor Michelle Rhee made waves by firing 165 teachers for poor evaluations, half of which depended on value-added data. After Mayor Adrian Fenty lost the Democratic primary last month, Rhee announced her resignation, but her teacher-evaluation system, IMPACT, will remain in place.
What lies ahead
Even those who champion value-added measures caution against using them as the sole means of evaluating teachers. Kate Walsh, president of the National Council on Teacher Quality, a research and advocacy group in Washington, D.C., has called value-added the best teacher-evaluation method so far. But she also says it would be a “huge mistake” to rely on it alone, or even primarily.
Randi Weingarten, president of the American Federation of Teachers, does not oppose the use of value-added data but wants to ensure evaluations are based on “classroom observations, self-evaluations, portfolios, appraisal of lesson plans, students’ written work” as well.
The best uses of value-added data may well lie in the future. If educators could use the data to figure out what the most effective teachers are doing right and share that with colleagues, it would be a great boon. But although major foundation money is being spent to try to do just that, the task is very difficult, especially because great teachers often don’t know themselves what they’re doing right.
Whatever the future uses of value-added measures, the idea of holding teachers accountable for student performance seems here to stay.
“It’s a valuable part of the conversation,” said Timothy Daly, president of The New Teacher Project. “It puts what matters most, student achievement, front and center as the most important responsibility for a teacher.”
Sarah Garland and Richard Lee Colvin contributed to this article.