Want to be rated ‘highly effective’ in New York? Don’t teach these classes

English and math teachers in grades 4-8 in New York are much less effective in promoting student growth on state assessments (or comparable measures) than other teachers in the Empire State.

Does that sound plausible? That teachers of particular subjects in particular grades are just not as good at promoting student learning?

Perhaps not. But it’s the inevitable conclusion to be drawn from the scores awarded to New York teachers as part of the 2012-13 Annual Professional Performance Review (APPR). New York’s state education law, passed in 2010 and amended in 2012, provides for teachers to be classified as “highly effective,” “effective,” “developing” or “ineffective” based on their score on a 100-point scale. Twenty points are based on student growth on state assessments (or comparable measures); an additional 20 points on locally selected measures approved by the New York State Education Department; and 60 points on multiple measures of teacher effectiveness, most commonly classroom observation ratings.
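To make the arithmetic of the 100-point scale concrete, here is a minimal sketch of how the three components described above add up. The class name, field names, and example values are hypothetical and chosen only for illustration; they are not drawn from any New York State Education Department system.

```python
from dataclasses import dataclass

# Component caps taken from the text: 20 + 20 + 60 = 100 points.
COMPONENT_CAPS = {
    "state_growth": 20,    # student growth on state assessments (or comparable measures)
    "local_measures": 20,  # locally selected measures approved by the state
    "other_measures": 60,  # multiple measures, most commonly classroom observation
}


@dataclass(frozen=True)
class ApprScore:
    """Hypothetical container for one teacher's three component scores."""
    state_growth: int
    local_measures: int
    other_measures: int

    def __post_init__(self):
        # Reject any component outside its allowed range.
        for name, cap in COMPONENT_CAPS.items():
            value = getattr(self, name)
            if not 0 <= value <= cap:
                raise ValueError(f"{name} must be between 0 and {cap}, got {value}")

    def composite(self) -> int:
        """Sum of the three components on the 100-point scale."""
        return self.state_growth + self.local_measures + self.other_measures


# Made-up example values, not real teacher data.
example = ApprScore(state_growth=9, local_measures=18, other_measures=58)
print(example.composite())  # 85
```

The cut scores that map a composite onto the four categories are not given in this piece, so the sketch stops at the total.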

Nearly a year after New York’s 700 or so school districts provided the 2012-13 APPR ratings to their teachers in September 2013, the State Education Department released summary information about the ratings required by the law. (New York City was not included because its plan was approved for the start of the 2013-14 school year.) For the more than 100,000 teachers whose ratings were released—ratings that might compromise a teacher’s privacy were excluded from the data—the results were overwhelmingly positive. Nearly 55 percent of teachers statewide were classified as “highly effective,” and an additional 41 percent were classified as “effective.” Fewer than four percent were rated “developing,” and not even one in a hundred teachers was classified as “ineffective.”

Because the ratings are based on 700 different APPR plans, however, these distributions differed substantially from one district to the next. In Rochester and Syracuse, for example, only two percent of teachers were rated “highly effective,” and 39 percent were classified as “developing” or “ineffective.” Little wonder that there is litigation regarding these ratings, which by state law may lead to an expedited teacher dismissal process. (I’m a consultant on two such lawsuits.)

But there is another source of inequality in ratings that hits teachers within the same district, and even within the same school. By law, teachers of English and math in grades 4-8 receive the first 20 points of their overall evaluation based on their “Mean Growth Percentile,” a complex calculation of how their students performed on the statewide annual assessments in English and/or math compared to the performance of similar students across the state. This calculation, commonly referred to as a value-added measure of teacher performance, results in a bell-shaped curve that ranks teachers in relation to one another. The teachers whose students score much higher, on average, than similar students will be rated “highly effective” on this component; conversely, the teachers whose students score much lower, on average, than similar students will be rated “developing” or “ineffective.” By design, the vast majority of teachers will be classified as “effective.”

The figure below shows the distribution of the ratings for the “state assessments or comparable measures” component for the roughly 40,000 teachers who received Mean Growth Percentile scores because they taught English and/or math in grades 4-8 in 2012-13. Seven percent of these teachers were rated “highly effective,” and 76 percent were rated “effective.” But one in six teachers received a classification of “developing” or “ineffective” (11 percent and 6 percent, respectively).

The figure also shows the distribution of ratings for teachers without a Mean Growth Percentile score. Each district across the state had to propose a way to rate these teachers in its APPR plan, and the New York State Education Department approved every one of these 700 plans. What is immediately obvious is that teachers whose state growth ratings were not based on the growth percentiles received much higher ratings than those whose ratings were based on the growth percentiles. Just seven percent of the teachers with growth percentiles were classified as “highly effective” in 2012-13, whereas 64 percent of those whose ratings were set at the district level were rated “highly effective.” At the other end of the scale, 17 percent of teachers with growth percentiles were classified as “developing” or “ineffective,” compared to about seven percent of the teachers whose ratings on the state growth measures were locally developed.
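The size of the gap is easier to see as simple arithmetic. The sketch below uses only the percentages quoted in this piece. The “effective” share for locally rated teachers is not stated here, so it is inferred as the remainder and should be treated as approximate. The variable and function names are hypothetical.

```python
# Shares (in percent) quoted in the text for the state growth component, 2012-13.
growth_percentile = {
    "highly_effective": 7,
    "effective": 76,
    "developing_or_ineffective": 17,  # 11 percent + 6 percent
}

locally_rated = {
    "highly_effective": 64,
    "developing_or_ineffective": 7,  # "about seven percent" in the text
}
# Assumption: not stated in the text, inferred as the remainder of 100.
locally_rated["effective"] = (
    100
    - locally_rated["highly_effective"]
    - locally_rated["developing_or_ineffective"]
)


def gap_in_points(baseline, comparison):
    """Percentage-point difference (comparison minus baseline) per category."""
    return {category: comparison[category] - baseline[category] for category in baseline}


gaps = gap_in_points(growth_percentile, locally_rated)
ratio = locally_rated["highly_effective"] / growth_percentile["highly_effective"]

for category, points in gaps.items():
    print(f"{category}: {points:+d} percentage points")
print(f"highly effective ratio: about {ratio:.1f} to 1")
```

On these figures, a locally rated teacher was roughly nine times as likely to be classified as “highly effective” on this component as a teacher who received a growth percentile score.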

The point here is not that one or the other of these sets of ratings is “right”; the percentage of teachers who should be classified as “highly effective,” “effective,” “developing” or “ineffective” is purely a matter of judgment, and where one stands is often a function of one’s location in the education policy landscape. Rather, what these data illuminate is the arbitrary nature of the evaluation process. The New York State Education Department, through its APPR plan approval process, has institutionalized an inequity that pits teachers in a school district who receive a growth percentile score against those who do not. Is it fair that a teacher’s rating on the state growth component of his or her annual performance evaluation should depend so heavily on the receipt of a state growth percentile score?

Proponents of complex teacher evaluation systems are fond of saying, “Don’t let perfection be the enemy of good.” No system can eliminate all sources of error or imprecision, they argue, but the newest generation of teacher evaluations provides enough information to serve as a basis for high-stakes employment decisions, such as the award of tenure, or merit pay, or expedited dismissal. What counts as “good enough” is, of course, a matter of judgment. Few would begrudge the occasional teacher who is inappropriately classified (and then terminated) the belief that basing any high-stakes decision on a misclassification, regardless of how rare, is not good enough. The policy tradeoffs are complex.

In New York, it’s not that perfection is the enemy of good. When it comes to teachers’ scores on the state growth component of annual performance evaluations, the New York State Education Department has ensured that the “highly effective” are the enemies of the “effective.”