Evaluating teachers: Precise but irrelevant metrics?

I’ve told this joke before:

Two hot-air balloonists get lost, and they’re floating aimlessly. They spot someone down below and call out, “Hello!”

The person on the ground replies, “Hello!”

“Where are we?” one calls down.

Up comes the reply: “You’re in a balloon!”

They continue to drift, when one of the balloonists says to the other, “Who was that?”

The other responds, “That was obviously an economist.”

“An economist? How can you tell?” the first asks.

“Because what he said was precise, but irrelevant.”

Unfair to economists? Of course! But surely in keeping with the mongoose-cobra relationship that characterizes sociologists and economists. (And some of my best friends, etc., etc.) A case in point:

Earlier this week, FiveThirtyEight, founded by data whiz Nate Silver, posted a feature on the application of value-added models to the evaluation of K-12 teachers. Quantitative editor Andrew Flowers argued that a key part of the debate is over, and that recent studies have converged on the finding that value-added measures accurately predict students’ future test scores. The article cites all of the usual suspects: Raj Chetty, John Friedman, Jonah Rockoff, Jesse Rothstein, Tom Kane, and Doug Staiger. Thoughtful and creative economists one and all, armed with an arsenal of quantitative methods and administrative data to which to apply them.

The debate has hinged on the fact that students are usually not randomly assigned to teachers, and thus one can never be sure that differences among teachers in their students’ test scores are due to the influence of the teacher, rather than to unmeasured differences in the attributes of students or of a classroom.

The key technical issue is the ability of quasi-experimental statistical models to reproduce the results that are observed in the handful of randomized experiments that provide the strongest evidence of the causal effects of high value-added scores.

Evidence is accruing that such models yield similar results, implying that value-added models can identify teachers who are indeed better at raising students’ test scores. I don’t think this precludes an unscrupulous principal from assigning challenging students to a teacher in the hope that the teacher will fail and receive a low value-added score; however, the models are not designed to illuminate specific cases, but rather to reveal trends across many teachers and classrooms.

It’s not controversial to argue that some teachers are more skilled or effective than others, and that some are better at boosting their students’ scores on standardized tests. And in a society that relies so heavily on tests of all kinds to certify and select people, it’s quite possible that exposure to one teacher versus another could have long-lasting effects on students’ lives.

But even if these points are settled, they’re largely irrelevant to the design of teacher evaluation systems. As education researcher Stephen Raudenbush of the University of Chicago (whom I will proudly claim as a sociologist and former colleague) asks in the March 2015 issue of Educational Researcher, “Does the answer to a precisely focused research question, by itself, have implications for practical action?” He goes on to argue, “Research on value added has no implications for action in isolation from other research about effective schooling because, like any research program, the narrow conditions that make value-added research convincing limit its direct applicability in practice.”

This point, inscribed throughout this special issue of Educational Researcher on teacher value-added models and educational practice, emphasizes the political and organizational challenges in designing teacher evaluation systems that yield ratings that are transparent and fair, offer information on which teachers can act to improve their practice, and are devoid of unintended consequences that might disrupt a school’s capacity to promote student learning.

It’s at the level of the school building that most of the action around teacher evaluation and its consequences occurs, and truth be told, most economists are not devoting much attention to the interior of the school or the social relations among school leaders, teachers and students. Sociologists and other education researchers may not have a common vocabulary to describe these social relations, and the technology for modeling and prediction is not as elaborate. But the research agenda is relevant, if less precise.

An arsenal is only useful if directed at the right target.

This story was produced by The Hechinger Report, a nonprofit, independent news website focused on inequality and innovation in education.