Good teachers matter and—as in every other profession—some are better than others. Researchers have even found that the very best teachers can help students overcome many of the effects of poverty and catch up to or surpass their more privileged peers.
That’s why there is intense interest now in finding better ways to judge the relative effectiveness of teachers. But how should that be done? Most teacher evaluations not only fail to single out successful teachers—they also don’t help principals determine which teachers need help to improve and which ones are failing their students altogether. Instead, all teachers end up being judged the same, which is to say, satisfactory.
“It’s universally acknowledged—teacher evaluations are broken,” said Timothy Daly, president of The New Teacher Project, a group that helps school districts recruit and train teachers.
Perhaps surprisingly, teacher-union leaders agree. Michael Mulgrew, president of New York City’s United Federation of Teachers (UFT), said last spring that “the current evaluation system doesn’t work for teachers—it’s too subjective, lacks specific criteria and is too dependent on the whims and prejudices of principals.”
So, it would seem that a system using student test scores to calculate how much “value” teachers add to their students’ learning would be fairer. Indeed, Mulgrew endorsed New York state’s new evaluation system, in which student achievement counts for 40 percent of a teacher’s rating.
“Value-added” measurements use complex statistical models to project a student’s future gains based on his or her past performance, taking into account how similar students perform. The idea is that good teachers add value by helping students progress further than expected, and bad teachers subtract value by slowing their students down.
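To make the idea concrete, here is a deliberately simplified sketch in Python. It is not the model Sanders or any district actually uses. It assumes a single prior-year score per student and one straight-line prediction fitted across all students, which stands in loosely for "how similar students perform." A teacher's value-added is then the average amount by which that teacher's students beat or missed their predicted scores. The function names and toy numbers are hypothetical. Real value-added systems use far more elaborate models with multiple years of data and, often, additional student and classroom factors.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes prior scores are not all identical
    return y_bar - slope * x_bar, slope


def value_added(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Returns each teacher's mean residual: how far their students landed
    above (+) or below (-) the score predicted from prior performance.
    """
    priors = [r[1] for r in records]
    currents = [r[2] for r in records]
    intercept, slope = fit_line(priors, currents)  # one line for all students
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        expected = intercept + slope * prior
        residuals[teacher].append(current - expected)
    return {t: mean(rs) for t, rs in residuals.items()}


# Hypothetical toy data: (teacher, last year's score, this year's score)
records = [
    ("A", 40, 50), ("A", 60, 70),
    ("B", 40, 42), ("B", 60, 62),
]
print(value_added(records))  # {'A': 4.0, 'B': -4.0}
```

Both teachers' students started at the same levels, so the sketch credits teacher A and debits teacher B by the same amount. Because the prediction line is fitted on everyone, the residuals average to zero across all students. In this form, value-added is a relative measure: it ranks teachers against one another rather than against a fixed standard.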
Using value-added models to calculate teacher effectiveness wasn’t possible on a wide scale until recently. In the 1990s, William L. Sanders, a statistician at the University of Tennessee, pioneered the technique with student test scores—and managed to persuade the Hamilton County School Board to work with him in taking a closer look at the results.
The method—as Sanders puts it—is like measuring a child’s height on a wall. It tracks a child’s academic growth over the year, no matter how far ahead or behind the child was initially. Sanders discovered that teacher quality varied greatly in every school, and he and others also found that students assigned to good teachers for three consecutive years tended to make great strides, while those assigned to three poor ones in a row usually fell way behind.
Why value-added is hot now
Hundreds of districts, including Chicago, Denver, New York City and Washington, D.C., are using such methods as a way to strengthen their teacher evaluations by factoring in student performance. The biggest push for the use of such methods has come from the Obama administration, which insisted that states competing for grants under its $4.3 billion Race to the Top program find ways to link student performance to teacher evaluations. The dozen winners of those grants are now struggling to figure out how to do just that.
At the same time, however, value-added modeling is the focus of furious debate among scholars, policymakers, superintendents, education advocates and journalists. The latest flare-up is occurring this week in New York City. The New York Times, Wall Street Journal, New York Daily News and others are seeking the value-added rankings of about 12,000 teachers in grades 4 through 8 whose students took state English and math tests.
The New York City Department of Education says it’s willing to make those scores public. But the UFT is suing to block their release. Value-added ratings, the union says in its lawsuit, are “unreliable, often incorrect, subjective analyses dressed up as scientific fact.” The union calls the calculations a “complex and largely subjective guessing game.”
In August, the Los Angeles Times was the subject of intense criticism and praise for its series that included value-added scores for individual teachers based on years of standardized test data—a project that newspapers in New York City now want to replicate. (Disclosure: The Los Angeles Times data-analysis was supported in part by a grant from The Hechinger Report.)
The documentary Waiting for “Superman,” directed by Davis Guggenheim, has also thrust the teacher-evaluation issue into the national spotlight by highlighting the historical disconnect between teacher job security and student performance.
Limitations of value-added
Value-added models aren’t perfect, as even their most ardent supporters concede. Oft-cited shortcomings range from doubts about fairness to broader concerns centered on teaching goals. “When people talk about their experience with a really good teacher, they’re not talking about test scores,” said Aaron Pallas, professor of sociology and education at Teachers College, Columbia University. “They’re talking about a teacher who gave them self-confidence, the ability to learn, an interest and curiosity about certain subjects.”
Critics point out that value-added data are only as good as the standardized tests they are based on, and test quality varies greatly from state to state. There are also many ways to calculate value-added scores, and different statistical techniques yield different results. The calculations may take into account factors that can affect achievement, such as class size, a school’s funding level and student demographics. Whether to include the race and poverty status of students when measuring teachers is particularly contentious, writes Douglas N. Harris, an economist at the University of Wisconsin, Madison, in a report on value-added models released this week.
Whatever the computational method, a teacher’s score can vary significantly from one year to the next—results that could affect a teacher’s reputation and salary in places that are considering linking teacher pay to performance. And while value-added models may do a decent job of highlighting the best and worst performers, they’re widely considered unreliable in differentiating the good from the mediocre (or the mediocre from the terrible).
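Part of that year-to-year swing is plain sampling noise, and a small simulation shows why. The sketch below is a hypothetical illustration, not an analysis of real data. It assumes a teacher whose true effect is a constant +2 points, a fresh class of 25 students each year, and student-level noise (test error, a bad day, events at home) with a standard deviation of 10 points. The yearly estimate is the class's average residual, as in a simple value-added calculation. All names and parameter values are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev


def simulate_yearly_estimates(true_effect, class_size, noise_sd, years, seed=0):
    """Return one value-added estimate per simulated year.

    Each student's residual is the teacher's fixed true effect plus
    independent normal noise; the yearly estimate is the class mean.
    All parameters are hypothetical.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    estimates = []
    for _ in range(years):
        residuals = [true_effect + rng.gauss(0, noise_sd) for _ in range(class_size)]
        estimates.append(mean(residuals))
    return estimates


est = simulate_yearly_estimates(true_effect=2.0, class_size=25, noise_sd=10.0, years=1000)
share_below_zero = sum(e < 0 for e in est) / len(est)

print(f"average estimate: {mean(est):.2f}")      # near the true effect of 2
print(f"spread across years: {stdev(est):.2f}")  # near 10 / sqrt(25) = 2
print(f"years rated below average: {share_below_zero:.0%}")
```

Under these assumptions, the estimate's year-to-year spread (about 10 divided by the square root of 25, or 2 points) is as large as the teacher's true effect, so this above-average teacher would be rated below average in roughly one year in six. The same logic fits the point about the middle of the distribution: a teacher whose true effect is far from average is less likely to be pushed across the line by noise than one close to it.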
For this reason, many want value-added calculations to be used only in assessing schools and curricula, not individual teachers. But more and more, value-added data are playing a role in personnel decisions about bonuses, tenure and dismissal. In July, D.C. Schools Chancellor Michelle Rhee made waves by firing 165 teachers for poor evaluations in which value-added data made up half of the score. After Mayor Adrian Fenty’s loss in last month’s Democratic mayoral primary, Rhee announced her resignation, but her teacher-evaluation system, IMPACT, will remain in place.
What lies ahead
Even those who champion value-added measures caution against using them as the sole means of evaluating teachers. Kate Walsh, president of the National Council on Teacher Quality, a research and advocacy group in Washington, D.C., has called value-added the best teacher-evaluation method so far. But she also says it would be a “huge mistake” to rely on it alone, or even primarily.
Randi Weingarten, president of the American Federation of Teachers, does not oppose the use of value-added data but wants to ensure evaluations are based on “classroom observations, self-evaluations, portfolios, appraisal of lesson plans, students’ written work” as well.
The best uses of value-added data may well lie in the future. If educators could use the data to figure out what the most effective teachers are doing right and share that with colleagues, it would be a great boon. But although major foundation money is being spent to try to do just that, the task is very difficult, especially since great teachers often don’t know themselves what they’re doing right.
Whatever the future uses of value-added measures, the idea of holding teachers accountable for student performance seems here to stay.
“It’s a valuable part of the conversation,” said Daly, of The New Teacher Project. “It puts what matters most—student achievement—front and center as the most important responsibility for a teacher.”
Sarah Garland and Richard Lee Colvin contributed to this article.
This is a highly unbalanced article: the only people arguing against value-added evaluations are representatives of the teachers union. In fact, most experts in testing and statistics will tell you that this is a highly unreliable way to measure teacher effectiveness.
Sorry, I take this back. Another edited version of this article, posted on sites all over the country, had none of the criticism from experts on this issue, while this one included it. I wish those other articles had included the full flavor of the debate on the large statistical uncertainties involved.
Thanks for the link to Doug Harris. It’s a reminder of how many social science principles VAMs break, and a new insight into their flawed math. They also violate the fundamental principles of our legal system. My favorite recent defense of VAMs admits that a knowledgeable lawyer with expert witnesses would defeat terminations that used VAMs, but your “joe blow lawyer” could not.
Does that show more disrespect for the teaching profession or the law?
How could “reformers” support the expansion of VAMs and NCLB-type testing into high schools, where they haven’t been tested? When they try, they will find that inner-city high school students bury loved ones every year. The number of funerals varies widely. How do they intend to account for that?
Or do they not care? Are “reformers” hired guns sent to attack an enemy rather than to help poor kids?
And that’s why VAMs won’t have staying power: they depend on the ignorance and lack of curiosity of “reformers” about social science, the law, and educational reality. But reality is more than just a paradigm, and reality is more than just a statistical black box.
The reality is that VAMs are just a club. They will be used first on principals, who usually don’t have strong unions; that won’t advance the anti-union cause, so “reformers” will soon swap VAMs for another weapon. Then we’ll fight over those threats to teaching, learning, and the American way.
Leonie, I am sorry, but your comment is simply inaccurate. There are lots of people who think value-added models are not particularly helpful. The data are just unreliable at the classroom level, and teachers don’t have much impact on a student in a single year. They basically play the cards they’re dealt. There is enough unreliability in the assessments to make difference scores unreliable.
It’s a complicated psychometric issue, but I don’t think many experts have much faith in the teacher-level estimates of performance. More than that, student performance data alone just aren’t informative for a teacher looking to improve their students’ performance.
The AFT is right – performance should include lots of factors.