One of the largest and most controversial education reforms of the Obama era was an effort to improve teacher quality. As part of a $4 billion initiative called Race to the Top, the federal government tried to persuade states to measure the performance of teachers, rank them, pay bonuses to the best teachers and fire the worst ones. Most states bent to the federal pressure (and money), adopting various versions of teacher evaluation and accountability systems, and spending billions more of their own taxpayer money on implementation.
Teachers and their unions bristled at being judged by students’ test scores. Educators were implicated in cheating scandals in Atlanta and other cities around the nation.
Yet there were also striking success stories from rating teachers. Research documented giant student achievement improvements in Washington, D.C., Chicago and Cincinnati, which persist to this day. But another study found that students didn’t fare any better after similar reforms in Tampa, Memphis or Pittsburgh.*
The patchwork of contradictory research was confusing. Researchers at four universities collaborated to make sense of these teacher effectiveness reforms for the nation as a whole to find out if rating teachers generally leads to better outcomes for students. They looked at student achievement in math and reading as well as high school graduation rates and college enrollment in 44 states that adopted teacher evaluation systems by 2017.
The answer: In most states, teacher evaluation systems didn’t do anything for students. In a half-dozen states, they worked very well. In another half-dozen, student performance fell. The successes and failures offset each other. Overall, it’s a wash.
“Throughout the whole country, we didn’t get a return on investment,” said Joshua Bleiberg, one of the researchers involved in the study at the Annenberg Institute for School Reform at Brown University. “It’s important to emphasize that this is an average. There are some places where it did work well.”
The study, “The Effect of Teacher Evaluation on Achievement and Attainment: Evidence from Statewide Reforms,” was posted on the Annenberg website in December 2021. It is a working paper, which has not yet been peer-reviewed.
Researchers were curious to understand why some teacher evaluation programs were more successful than others. The exact design of the program didn’t seem to make a difference. Some states relied on teacher observations for rating teachers. Others also factored in student test scores and surveys of parents and students. In some states, teachers were given extra training and time to improve their craft before being let go. Bonuses for good teachers varied in size and in the share of teachers who got them. Some states used the ratings to decide who gets tenure. Some didn’t. Yet researchers saw each of these permutations succeed in some places, fail in others and make no difference elsewhere.
What seems to matter the most is location. Researchers noticed that large cities surrounded by a ring of suburbs had the best experience. Those suburbs provided a ready supply of teachers who could replace the low-performing teachers.
“It only works well if there is a large supply of teachers to hire,” said Bleiberg.
For the same reason, school districts with a high proportion of charter schools fared better too. Charter school staff provided a second pool of teachers for the public schools to hire away.
Meanwhile, large school districts that span an entire county and are surrounded by rural areas fared less well. There simply weren’t enough other teachers nearby to replace the weak ones. Some places made efforts to recruit out-of-state teachers, but it’s hard to persuade highly qualified teachers with years of experience to relocate.
“The reformers are right. And wrong,” said Bleiberg. “They’re right in the sense that this can work, but they’re wrong in the sense that it’s going to be very difficult to make it work everywhere.”
There continues to be a lot of enthusiasm for teacher evaluation systems among education reformers. And because of the big successes in Washington, D.C., and Chicago, it may be difficult to absorb that, on average, these systems don’t work. It’s another good example of why it’s so hard to expand good ideas in education. Not everything can scale.
*Correction: This paragraph was modified from the original to delete a sentence that refers to teacher quality in Dallas, where research is ongoing.
This story about evaluating teachers was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.