Teacher Effectiveness

Must value-added models grade teachers on a curve?

Add one more point of critique to New York City’s Teacher Data Reports: experts and educators are worried about the bell curve along which the teacher ratings were distributed.

Like the distribution of teachers by rating across types of schools, the distribution of scores among teachers was essentially built into the “value-added” model that the city used to generate the ratings.

The long-term goal of many education reformers is to create a teaching force in which nearly all teachers are high-performing. However, in New York City’s rankings—which rated thousands of teachers who taught in the system from 2007 to 2010—teachers were graded on a curve. That is, under the city’s formula, some teachers would always be rated as “below average,” even if student performance increased significantly in all classrooms across the city.

The ratings were based on a complex formula that predicts how students will do—after taking into account background characteristics—on standardized tests. Teachers received scores based on students’ actual test results measured against the predictions. They were then divided into five categories. Half of all teachers were rated as “average,” 20 percent were “above average,” and another 20 percent were “below average.” The remaining 10 percent were divided evenly between teachers rated as “far above average” and “far below average.”
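To make the mechanics concrete, here is a minimal Python sketch of rating on a fixed curve. It is not the city's actual formula. The category shares (5, 20, 50, 20 and 5 percent) come from the description above; everything else is an assumption made for illustration, including the function names, the use of a simple average of actual-minus-predicted results as a teacher's score, and the way ranks are cut into groups.

```python
# Hypothetical sketch: only the category shares are taken from the article.
# Shares are whole percentages, listed from lowest-ranked to highest-ranked.
CATEGORIES = [
    ("far below average", 5),
    ("below average", 20),
    ("average", 50),
    ("above average", 20),
    ("far above average", 5),
]


def value_added(actual, predicted):
    """Assumed score: mean gap between students' actual and predicted results."""
    gaps = [a - p for a, p in zip(actual, predicted)]
    return sum(gaps) / len(gaps)


def rate_on_curve(scores):
    """Label teachers by rank alone, using the fixed shares in CATEGORIES.

    scores maps a teacher identifier to that teacher's value-added score.
    """
    ranked = sorted(scores, key=scores.get)  # lowest score first
    n = len(ranked)
    ratings = {}
    start = 0
    cumulative = 0
    for label, share in CATEGORIES:
        cumulative += share
        end = cumulative * n // 100  # rank position where this category stops
        for teacher in ranked[start:end]:
            ratings[teacher] = label
        start = end
    return ratings


# Twenty made-up teachers with made-up scores.
scores = {f"T{i}": float(i) for i in range(20)}
before = rate_on_curve(scores)

# Raise every teacher's score by the same amount and rate again.
improved = {teacher: score + 10 for teacher, score in scores.items()}
after = rate_on_curve(improved)

print(before == after)
```

Because the labels depend only on rank order, raising every score by the same amount leaves each teacher in the same category. That is the sense in which some teachers would always be rated “below average” under a set distribution.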

IMPACT, the District of Columbia’s teacher-evaluation system, also uses a set distribution for teacher ratings. As sociologist Aaron Pallas wrote in October 2010, “by definition, the value-added component of the D.C. IMPACT evaluation system defines 50 percent of all teachers in grades four through eight as ineffective or minimally effective in influencing their students’ learning.”

At a time when the rhetoric around new teacher-evaluation systems has focused on removing ineffective teachers from the classroom, some question whether the ranking structure makes sense. Even William Sanders, a researcher known as the “grandfather of value-added,” is concerned about the bell-shaped curve of the ratings generated by systems in New York City and elsewhere.

“If you just continue that in the future, you will always have a very high group and a very low group,” said Sanders, a former University of Tennessee professor now at the SAS Institute Inc. who’s spent decades developing and refining value-added formulas. “If your population of teachers [is] improving, you basically will not be capturing [that].”

Other researchers, including Doug Harris, say a bell curve is a necessity of value-added models. Harris is a value-added expert at the University of Wisconsin-Madison, where some of his colleagues designed New York City’s formula, though he was not involved in its development.

Harris said the requirement that 5 percent of teachers be rated as “far below average” provided further evidence that value-added scores should not be used alone in making decisions about teacher tenure, dismissal or pay. Instead, he said, being in the bottom 5 percent should trigger additional steps, such as more classroom observations.

As New York State rolls out its new teacher-evaluation system, value-added models will play an important role. In the new system, 40 percent of a teacher’s rating will be based on student test scores or similar quantitative measures of student performance, and at least half of that 40 percent will use state standardized test results for some teachers. (Exactly which value-added model the state will use has yet to be decided.)

Even though other measures (such as observations) make up the majority of each teacher’s evaluation, student performance measures could have a big impact on a teacher’s final rating. “Teachers rated ineffective on student performance based on objective assessments must be rated ineffective overall,” a press release from the New York State Education Department says. “Teachers who remain ineffective can be removed from classrooms.”

Sanders disagrees with Harris that a bell-curve distribution of teacher ratings is necessary. In the Tennessee ranking system that he helped design, there’s no limit as to how many teachers can be placed in each category.

“You can allow for—at least theoretically—everybody to be rated very effective. We think that’s appropriate,” Sanders said. “There is so much emphasis and effort to try to get more and more people to get more and more effective. You don’t want your metrics stuck where you always have a high half and a low half.”

A version of this story appeared on GothamSchools on March 6, 2012.


Sarah Butrymowicz

Sarah Butrymowicz is data editor. Prior to falling in love with spreadsheets and statistics, she spent four years as a staff writer for The Hechinger…
