The Hechinger Report is a national nonprofit newsroom that reports on one topic: education.

Add one more point of critique to New York City’s Teacher Data Reports: experts and educators are worried about the bell curve along which the teacher ratings fell.

Like the distribution of teachers by rating across types of schools, the distribution of scores among teachers was essentially built into the “value-added” model that the city used to generate the ratings.

The long-term goal of many education reformers is to create a teaching force in which nearly all teachers are high-performing. However, in New York City’s rankings—which rated thousands of teachers who taught in the system from 2007 to 2010—teachers were graded on a curve. That is, under the city’s formula, some teachers would always be rated as “below average,” even if student performance increased significantly in all classrooms across the city.

The ratings were based on a complex formula that predicts how students will do—after taking into account background characteristics—on standardized tests. Teachers received scores based on students’ actual test results measured against the predictions. They were then divided into five categories. Half of all teachers were rated as “average,” 20 percent were “above average,” and another 20 percent were “below average.” The remaining 10 percent were divided evenly between teachers rated as “far above average” and “far below average.”
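The forced distribution described above can be sketched in a few lines of code. This is a hypothetical illustration of the bucketing step only (the city’s actual value-added formula, which produces the underlying scores, is far more complex): ranks are mapped to fixed percentile bands, so a set share of teachers always lands in each category.

```python
# Hypothetical sketch of a forced-curve rating scheme like the one described
# above: category cutoffs are fixed percentiles of the score distribution,
# so a set share of teachers always falls in each band regardless of how
# well students do overall.

def rate_on_curve(scores):
    """Map each score to a label using fixed percentile cutoffs:
    bottom 5% far below, next 20% below, middle 50% average,
    next 20% above, top 5% far above."""
    n = len(scores)
    ranked = sorted(range(n), key=lambda i: scores[i])  # indices, lowest first
    labels = [None] * n
    for rank, i in enumerate(ranked):
        pct = (rank + 0.5) / n  # midpoint percentile of this rank
        if pct < 0.05:
            labels[i] = "far below average"
        elif pct < 0.25:
            labels[i] = "below average"
        elif pct < 0.75:
            labels[i] = "average"
        elif pct < 0.95:
            labels[i] = "above average"
        else:
            labels[i] = "far above average"
    return labels

# Even if every teacher's score rises by the same amount, the labels do not
# change -- the curve guarantees a "far below average" group exists.
scores = [float(i) for i in range(100)]
assert rate_on_curve(scores) == rate_on_curve([s + 10 for s in scores])
```

The final assertion is the article’s point in miniature: because only rank order matters, a uniform improvement across every classroom leaves the ratings untouched.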

IMPACT, the District of Columbia’s teacher-evaluation system, also uses a set distribution for teacher ratings. As sociologist Aaron Pallas wrote in October 2010, “by definition, the value-added component of the D.C. IMPACT evaluation system defines 50 percent of all teachers in grades four through eight as ineffective or minimally effective in influencing their students’ learning.”

At a time when the rhetoric around new teacher-evaluation systems has focused on removing ineffective teachers from the classroom, some question whether the ranking structure makes sense. Even William Sanders, a researcher known as the “grandfather of value-added,” is concerned about the bell-shaped curve of the ratings generated by systems in New York City and elsewhere.

“If you just continue that in the future, you will always have a very high group and a very low group,” said Sanders, a former University of Tennessee professor now at the SAS Institute Inc. who’s spent decades developing and refining value-added formulas. “If your population of teachers [is] improving, you basically will not be capturing [that].”

Other researchers, including Doug Harris, say a bell curve is a necessary feature of value-added models. Harris is a value-added expert at the University of Wisconsin-Madison, where some of his colleagues designed New York City’s formula, though he was not involved in its development.

Harris said the requirement that 5 percent of teachers be rated as “far below average” provided further evidence that value-added scores should not be used alone in making decisions about teacher tenure, dismissal or pay. Instead, he said, being in the bottom 5 percent should trigger things like more classroom observations.

As New York State rolls out its new teacher-evaluation system, value-added models will play an important role. In the new system, 40 percent of a teacher’s rating will be based on student test scores or similar quantitative measures of student performance—and for some teachers, at least half of that 40 percent will use state standardized test results. (Exactly what value-added model the state will use has yet to be decided.)

Even though other measures (such as observations) make up the majority of each teacher’s evaluation, student performance measures could have a big impact on a teacher’s final rating. “Teachers rated ineffective on student performance based on objective assessments must be rated ineffective overall,” a press release from the New York State Education Department says. “Teachers who remain ineffective can be removed from classrooms.”

Sanders disagrees with Harris that a bell-curve distribution of teacher ratings is necessary. In the Tennessee ranking system that he helped design, there’s no limit as to how many teachers can be placed in each category.

“You can allow for—at least theoretically—everybody to be rated very effective. We think that’s appropriate,” Sanders said. “There is so much emphasis and effort to try to get more and more people to get more and more effective. You don’t want your metrics stuck where you always have a high half and a low half.”
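The alternative Sanders describes is criterion-referenced: teachers are rated against fixed value-added thresholds rather than against each other. A minimal sketch, with illustrative cutoff values that are not Tennessee’s real ones:

```python
# Hypothetical sketch of a threshold-based scheme in the spirit Sanders
# describes: scores are compared to absolute cutoffs, not to other teachers,
# so in principle every teacher can earn the top rating.

def rate_against_threshold(scores, effective_cut=0.0, very_effective_cut=1.0):
    """Label each value-added score using absolute cutoffs
    (illustrative values only)."""
    labels = []
    for s in scores:
        if s >= very_effective_cut:
            labels.append("very effective")
        elif s >= effective_cut:
            labels.append("effective")
        else:
            labels.append("ineffective")
    return labels

# Unlike a forced curve, a uniform improvement shows up in the ratings:
cohort = [-0.5, 0.2, 1.4]
improved = [s + 1.0 for s in cohort]
assert rate_against_threshold(cohort) == ["ineffective", "effective", "very effective"]
assert rate_against_threshold(improved) == ["effective", "very effective", "very effective"]
```

The design difference is exactly the dispute in the article: thresholds let the metric register an improving teaching force, while percentile cutoffs always leave "a high half and a low half."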

A version of this story appeared on GothamSchools on March 6, 2012.


Letters to the Editor


  1. Aaah this whole evaluation system is a bunch of BS created by people who have no freakin’ clue what makes a good teacher. Any half-decent administrator can tell a good teacher from a mile away, and a bad one as well. It’s like pornography… you may not be able to define it, but you know it when you see it. So… eventually, after lots of time and effort has been put into this crappy evaluation system, we will see that student achievement has not become any higher… duh… and the whole thing will go away. What should replace it is an entirely different picture of what makes a good teacher, one that has nothing to do with numbers or bell curves… teaching is an art, and treating it like a science will not improve student achievement. You heard it here first… let’s see how long it takes to dump this whole plan.

  2. Here’s my take:

    The reformers claim their goal is to reform only the poorest-performing schools. If that were true, the D.O.E. would simply reform schools with children performing below grade level in math and reading.

    I believe the reformers’ actual goal is to CONVERT the ENTIRE PUBLIC EDUCATION SYSTEM into the for-profit charter-school model (or the “non-profit” charters with for-profit management model). To achieve that goal, they would need a rating system under which even the best teachers in the best schools could not get a good rating 2 years in a row.

    To illustrate, here is an example rating from an article in The New York Times (“In New York Teacher Ratings, Good Test Scores Aren’t Always Good Enough,” by S. Otterman & R. Gebeloff):

    “In one extreme case, the formula assigned an eighth-grade math teacher at the prestigious Anderson School on the Upper West Side the lowest possible rating, a zero, even though her students posted test scores 1.22 standard deviations above the mean — normally good enough to rank in the 89th percentile. Her problem? The formula expected her high-achieving students to be 1.84 standard deviations higher than the average — roughly the 97th percentile.”

    The reformers say that the ratings give parents information, yet no one can tell what grade level the students achieved this year vs. last year, or expected improvement % over the previous year. WHERE IS THE TRANSPARENCY?
