In Washington, D.C., officials shortened a new teacher evaluation checklist after complaints from teachers and principals that it was too long and time-consuming.
In Memphis, Tenn., after a year of piloting new evaluations and a summer of training, some principals and teachers remained confused and overwhelmed.
In Louisiana, one expert warned of lawsuits as the state began to roll out a truncated observation system without first testing it.
But in New Haven, Conn., union officials and reformers alike have praised a collaborative effort to help teachers improve under the city’s new rating system.
As New York City officials and union leaders wrangle over the design of new teacher evaluations due to roll out citywide next year, the experiences of other states and districts offer both inspiration and lessons about what not to do.
“We have learned a lot over the last four years about how to do this effectively and well, and the changes we’ve made are reflective of that,” said Scott Thompson, deputy chief of teacher effectiveness in the D.C. Public Schools, which launched a new evaluation system in 2009.
More frequent and rigorous evaluations are part of a new national push to improve the quality of the teaching force. Two-thirds of states are in the process of adopting new evaluations, and many will include student achievement — usually as measured by standardized tests — along with intensive classroom observations. It’s unclear whether the new evaluations will have the desired effect. Even in places with a few years of experience using new systems, there is not enough data to tell for certain if student achievement is improving as a result of the evaluations.
But early adopters say they have at least begun to pinpoint what hasn’t worked, and what teachers and principals find most useful. Washington, D.C.’s experience may be particularly instructive to districts still in the process of designing systems. The city’s evaluation system has been overhauled twice in response to feedback — and problems.
The number of standards on which teachers are measured during a classroom observation was reduced to 18 because teachers found a checklist of 22 indicators too long and confusing. (New York has piloted a checklist that has 22 indicators but has asked schools to focus on just six at first.) The number of categories for teachers — ranging from “ineffective” to “highly effective” — was increased from four to five in an effort to prevent inflation in the ratings. And teachers who have consistently scored well will no longer be observed as frequently as lower performers to save time and lessen anxiety among teachers.
Tennessee also reduced the observation workload because principals felt overwhelmed. “It may seem pretty obvious, but I think anybody who has started down this road will tell you this is a huge shift in the role of the principal,” said Sara Heyburn, an assistant commissioner in the Tennessee Department of Education. “We had to move quickly to train more people, and we allowed people to combine observations.”
One of the biggest shifts in D.C. was the decision this year to reduce the reliance on test scores in favor of other measures of student achievement that teachers will determine with their principals. Before, value-added measures, which calculate expected student growth on standardized tests, counted for 50 percent of a D.C. teacher’s rating. But value-added measures have been widely criticized as unreliable. Going forward, they will only count for 35 percent of a teacher’s overall evaluation.
“Student performance will continue to be the largest piece of the pie,” said Kaya Henderson, the D.C. Schools Chancellor, in a statement when the change was announced in August. But, she said, “We are evolving that approach to now include multiple measures.”
Most systems combine two main factors in measuring a teacher’s performance: a rating based on at least one formal classroom observation, and a rating meant to capture how much students learn during the year. Previously, most states called for evaluations that relied on a single observation, and tenured teachers were not observed every year.
In New York, value-added measures — for those teachers whose students take standardized tests — will only make up 25 percent of their rating. Another 15 percent will be based on locally selected measures of student achievement, while the remaining 60 percent will depend on more qualitative measures such as classroom observations.
One of the most vexing problems that many education systems have faced is how to measure student growth, or learning, for the vast majority of teachers who don’t teach in tested subjects or grades.
In Florida, the state is simply developing more standardized tests. Last year in Tennessee, teachers without individual value-added scores were rated on their school’s overall performance on standardized tests. Many teachers said this was unfair, however, according to a report by the state education department. So this summer state officials recommended adding more tests, as long as they “benefit student performance.”
Other states have left it to districts or schools to create their own “student learning objectives” or SLOs, such as portfolios of artwork or improvement in skills like playing scales on a trumpet. New York will join them when its system takes effect next year.
But a pilot in Rhode Island demonstrated that it’s difficult to ensure that the learning objectives are rigorous. “The quality of our student learning objectives was not where we ultimately want them to be,” said Rhode Island education commissioner Deborah Gist in an interview with The Hechinger Report last year. “There’s no way to make it be entirely objective ever.”
Although hundreds of teachers have lost their jobs due to low ratings as new evaluations have gone into effect, the evaluations haven’t been the shock to the system that many educators expected. In Florida, for example, the percentage of teachers rated poorly only rose by one percentage point in comparison to the old system, which had been criticized as too lenient. In Tennessee, only 2.5 percent of teachers received one of the lowest two ratings (out of five) based on new classroom observations. Three-quarters of teachers fell into the top two categories. And one of the reasons D.C. changed its rating system this year is because the vast majority of teachers continued to be rated as either “effective” or “highly effective.”
“In the end, the anxiety about these systems is largely about the consequences they might carry,” said Timothy Daly, president of TNTP, a nonprofit advocacy group, which in 2009 published a report on teacher effectiveness that helped spur many of the new reforms. “And the truth is that very few teachers are in the position of facing any consequences, which raises the larger question of, ‘Are these ratings accurate?’”
At the same time, a nearly universal piece of advice from education officials in other districts and states is to work closely with teachers when designing the new evaluations. Dozens of teachers in New Haven, Conn., have left because they were rated poorly under the new evaluation system there. But the union was a partner in developing it, and criticism has been muted compared to elsewhere.
“If you create a system that doesn’t have maximum teacher input, it doesn’t matter how technically sound it is,” said Dan Cruce, a former official in the Delaware Department of Education who now works for the nonprofit policy organization Hope Street Group. “It has to be raised and informed by teacher voices, because that’s who it’s designed for.”
The experiences so far with new evaluations suggest that districts should also expect to make changes as they go along. “The idea is that this is going to continuously improve, just like we expect our educators” to do, said Heyburn, of Tennessee. “You can plan for the hypotheticals, but it’s not till feet hit the ground that you learn the real lessons.”
A version of this story appeared on Gotham Schools on December 19, 2012.
Thank you, Sarah. This is a comprehensive review of the state of new teacher evaluation systems.
In Illinois, the new teacher evaluations will not take effect until 2016. Not sure if this is good or bad. It gives state officials time to learn from other states but seems that we are wasting valuable time as other states move forward.
Nonetheless, I see a critical factor missing in the conversations about teacher evaluation. Does the system within which a teacher works have a process by which teachers can monitor the learning of their students throughout the year, or is the summative, standardized assessment a “cross your fingers and hope” scenario? I feel that the education system has largely ignored its responsibility to provide such a process. I fully understand the reservations about a system that makes teachers solely responsible for student learning and ignores the impact of district and school leadership in setting the stage for this learning.
New York seems to have the right idea by including a local measure of achievement in the evaluation picture. Still it is unclear whether teachers are given formative data in a structured process throughout the year. My guess is that in places where a structured process exists, teacher evaluation is an easier task since student achievement is linked to specific teacher actions throughout the year.
Outside of politics and ideology, there is no reason for any significant portion of teacher evaluation to rely on “value-added” calculations based on student test scores. As the National Research Council’s Board on Testing and Assessment concluded just last year, “VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.”
For more evidence on “Why Teacher Evaluation Shouldn’t Rest on Student Test Scores,” see the new fact sheet.
Mr. Schaeffer is certainly correct. What we need is a much deeper understanding of assessment, first of students, then of teachers, then of principals — and I doubt it should stop there. Having that, we would find, I believe, that our educational thought leaders have been making some trenchant criticisms of the existing state of American education, but that their solutions have been, in general, uneducated guesses. In particular, it is hard to see why none of the opinion-makers whose “expertise” is cited in this article seems to be aware of even the notion of checking the experience of education systems outside of our borders — and so we merely repeat mistakes that others have already learned from.