One of the largest and most controversial education reforms of the Obama era was an effort to improve teacher quality. As part of a $4 billion initiative called Race to the Top, the federal government tried to persuade states to measure the performance of teachers, rank them, pay bonuses to the best teachers and fire the worst ones. Most states bent to the federal pressure (and money), adopting various versions of teacher evaluation and accountability systems, and spending billions more of their own taxpayers’ money on implementation.
Teachers and their unions bristled at being judged by students’ test scores. Educators were implicated in cheating scandals in Atlanta and other cities around the nation.
Yet there were also striking success stories from rating teachers. Research documented giant student achievement improvements in Washington, D.C., Chicago and Cincinnati, which persist to this day. But another study found that students didn’t fare any better after similar reforms in Tampa, Memphis or Pittsburgh.*
The patchwork of contradictory research was confusing. Researchers at four universities collaborated to make sense of these teacher effectiveness reforms for the nation as a whole to find out if rating teachers generally leads to better outcomes for students. They looked at student achievement in math and reading as well as high school graduation rates and college enrollment in 44 states that adopted teacher evaluation systems by 2017.
The answer: In most states, teacher evaluation systems didn’t do anything for students. In a half-dozen states, they worked very well. In another half-dozen, student performance fell. The successes and failures offset each other. Overall, it’s a wash.
“Throughout the whole country, we didn’t get a return on investment,” said Joshua Bleiberg, one of the researchers involved in the study at the Annenberg Institute for School Reform at Brown University. “It’s important to emphasize that this is an average. There are some places where it did work well.”
The study, The Effect of Teacher Evaluation on Achievement and Attainment: Evidence from Statewide Reforms, was posted on the Annenberg website in December 2021. It is a working paper, which has not yet been peer-reviewed.
Researchers were curious to understand why some teacher evaluation programs were more successful than others. The exact design of the program didn’t seem to make a difference. Some states relied on teacher observations for rating teachers. Others also factored in student test scores and surveys of parents and students. In some states, teachers were given extra training and time to improve their craft before being let go. Bonuses for good teachers varied in size and in the share of teachers who got them. Some used the ratings to decide who gets tenure. Some didn’t. Yet, researchers saw each of these permutations succeed, fail and produce nothing.
What seems to matter the most is location. Researchers noticed that large cities surrounded by a ring of suburbs had the best experience. Those suburbs provided a ready supply of teachers who could replace the low-performing teachers.
“It only works well if there is a large supply of teachers to hire,” said Bleiberg.
For the same reason, school districts with a high proportion of charter schools fared better too. Charter school staff provided a second pool of teachers for the public schools to hire away.
Meanwhile, large school districts that span an entire county and are surrounded by rural areas fared less well. There simply weren’t enough other teachers nearby to replace the weak ones. Some places made efforts to recruit out-of-state teachers but it’s hard to persuade highly qualified teachers with years of experience to relocate.
“The reformers are right. And wrong,” said Bleiberg. “They’re right in the sense that this can work, but they’re wrong in the sense that it’s going to be very difficult to make it work everywhere.”
There continues to be a lot of enthusiasm for teacher evaluation systems among education reformers. And because of the big successes in Washington, D.C., and Chicago, it may be difficult to absorb that, on average, rating teachers doesn’t improve student outcomes. It’s another good example of why it’s so hard to expand good ideas in education. Not everything can scale.
*Correction: This paragraph was modified from the original to delete a sentence that refers to teacher quality in Dallas, where research is ongoing.
This story about evaluating teachers was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.
At The Hechinger Report, we publish thoughtful letters from readers that contribute to the ongoing discussion about the education topics we cover.
The blog “PROOF POINTS: Nationwide, evaluating and penalizing teachers rarely works” is interesting, but it fails to consider how strong professional development and a growth mindset can enable teachers to improve over time. My experience is that there are some teachers in their first year of teaching that seem like failures, but by their fourth or fifth years are top quality. The reverse is that some teachers that start off strong actually get worse and stop growing. My point is that the key to successful teaching is to acknowledge that it is an improvement profession, and requires high quality staff development, mentoring, good feedback, and a growth mentality to become a very strong teacher. Good teaching requires a lifetime of learning!
Thanks to Jill Barshay for another thoughtful report on new education research.
The study Barshay writes about seems to conclude that teacher evaluation reform has been unsuccessful in many states and school districts because there weren’t sufficient numbers of stronger teachers to replace those removed via more rigorous evaluations. But few teachers have been fired under the new evaluation systems, except in the District of Columbia, Tennessee and a few other places. So in most of the country, there haven’t been any evaluation-related vacancies to fill.
More meaningful teacher evaluations can help even when they’re designed only to help teachers improve their practice. The problem with many of the new evaluation systems is that they weren’t implemented effectively; many of them are barely more rigorous/helpful to teachers than the superficial systems they replaced.
So I think the Brown researchers draw the wrong conclusion from their findings. My experience is that a central reason proven reforms often don’t scale in public education is because they’re not implemented with fidelity in new places.