The world wasn’t exactly shocked in 2009 when Alex Rodriguez’s name turned up on a list of 104 Major League Baseball players who had tested positive for performance-enhancing drugs. Nor was anyone surprised to learn later that year that David Ortiz—the Red Sox’s “Big Papi,” and a six-time All-Star—was on the same list. Except, that is, David Ortiz himself. For years, his standard line about drug testing was, “All I know is they’re going to find a lot of rice and beans.” And steroids, he forgot to say.
Both the league and the players had planned on keeping the results private. And so they were—until 2009.
Promised confidentiality has, of course, a checkered history. Circumstances change, leaders change, norms change. Technology turns science fiction into reality. And what was once never intended for public consumption ends up as front-page news—or online, for billions of us to download.
The U.S. State Department and Hillary Clinton know this only too well. So do New York City teachers. Under a 2008 agreement between Schools Chancellor Joel Klein and United Federation of Teachers President Randi Weingarten, all New York City teachers of math and English in grades 4 to 8 receive an annual “Teacher Data Report” that tells them how their students perform on standardized state tests.
But the data reports do more than that—they calculate a teacher’s individual “value-added” score, which indicates whether his or her students are doing better or worse than statistical models predict they would, given their background characteristics. The thinking is that teachers whose students make exceptional gains year after year should somehow be rewarded, or at least studied so that other teachers can emulate what they’re doing. The flip side, naturally, is that teachers whose students routinely learn less than the models predict should be helped—or forced out.
Reliable value-added models are notoriously tough to build, as they must take into account numerous factors beyond a teacher’s control, including class size, students’ prior test scores and students’ poverty status. New York City’s model takes into account about three dozen such factors. Still, experts caution that value-added analysis is far from perfect, not least because its only measure of student learning is performance on standardized tests.
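The arithmetic behind a value-added score can be sketched in miniature. The toy model below is purely illustrative—its coefficients and factors are invented, standing in for the roughly three dozen controls New York City’s actual model uses—but it shows the basic logic: predict each student’s score from background factors, then average the gap between actual and predicted scores across a teacher’s students.

```python
# Toy illustration (NOT the NYC model): a value-added score is the
# average gap between students' actual test scores and what a
# statistical model predicts given their background characteristics.

def predicted_score(prior_score, class_size, poverty):
    # Hypothetical linear model with made-up coefficients, standing in
    # for the ~three dozen controls a real model would estimate.
    score = 0.9 * prior_score + 8.0
    score -= 0.1 * max(0, class_size - 25)  # adjustment for large classes
    score -= 3.0 if poverty else 0.0        # adjustment for poverty status
    return score

def value_added(students):
    # Average (actual - predicted) across a teacher's students:
    # positive means students beat the model's prediction, negative
    # means they fell short of it.
    gaps = [s["actual"] - predicted_score(s["prior"], s["class_size"], s["poverty"])
            for s in students]
    return sum(gaps) / len(gaps)

roster = [
    {"prior": 70, "actual": 75, "class_size": 28, "poverty": True},
    {"prior": 85, "actual": 88, "class_size": 28, "poverty": False},
]
print(round(value_added(roster), 2))
```

A real model is estimated from district-wide data rather than written down by hand, and the uncertainty around each score (the “confidence intervals” skeptics point to) is a large part of why experts urge caution.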
The 2008 agreement made clear that the Teacher Data Reports were not to be used in teacher evaluations or tenure decisions. Nor were they intended for the eyes of anyone but the relevant teachers and principals. To this day, the New York City Department of Education website is unambiguous about this: “Teacher Data Reports are designed to be used internally, and should not be shared with parents, students, or the general public” (emphasis in the original).
But original intentions—as A-Rod, Big Papi and the State Department have learned—might not matter much. Race to the Top, the Obama administration’s signature education-reform initiative to date, urged states to factor student achievement into teacher evaluations. Many states, including Colorado and New York, changed relevant laws in hopes of winning part of Race to the Top’s $4.3 billion jackpot.
Last week, the Bloomberg administration squared off against the teachers’ union in court, arguing that the names and value-added ratings of roughly 12,000 New York City teachers should be made public. Bloomberg and Klein contend that the public has a right to know how public employees are performing. The union, in a 156-page filing, has countered that the Teacher Data Reports are often incomplete and inaccurate—and thus not ready for prime time.
Value-added scores were headline news earlier this year when the Los Angeles Times published a series of articles, “Grading the Teachers,” based on district data that the paper acquired through the California Public Records Act. The Times also unveiled a publicly accessible database containing the value-added scores of about 6,000 elementary teachers in L.A. Unified School District. (Disclosure: The Hechinger Report helped fund the work of the economist who crunched the numbers for the Times, but it did not participate in the analysis.)
Other newspapers nationwide soon were clamoring to do the same—get their hands on value-added data and publish pieces about the high-performers and low-performers. At the same time, a spate of new reports on both the promises and perils of value-added analysis went mainstream. Proponents have tended to speak of value-added as if they’ve found the Holy Grail—finally, public education will be fixed! Opponents see instead the demonization of teachers and an obsession with mostly meaningless metrics that don’t capture the subtleties of a teacher’s work.
A ruling is expected in New York City as early as this week. But regardless of what Judge Cynthia Kern decides, it’s safe to say that the current teacher-evaluation system is broken in most school districts nationwide—and that value-added analysis is here to stay.
The reality is we like numbers and we love accountability, at least when it comes to schools. We are addicted to standardized test scores. We hate failure. And we cannot fathom why our urban public schools have been in such a deep funk for such a long time.
We know teachers are important. But are they, by and large, doing a good job? How can we know? What’s the evidence, other than gut feelings?
The answer is we don’t really have good ways to measure teacher performance right now. Our system for evaluating teachers wasn’t built to take into account their performance, so we’re struggling mightily to find ways of doing so.
The uselessness of current teacher-evaluation systems was made apparent in “The Widget Effect,” a 2009 report by The New Teacher Project that found that more than 99 percent of teachers are rated “satisfactory” each year. The report, subtitled “Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness,” also found that teacher performance played no role in districts’ decision-making about professional development, compensation, tenure and layoffs.
The Obama administration, as well as many think tanks and education reformers, is determined to change that. And value-added scores will almost certainly play a prominent role.
To many reformers, value-added analysis is sleek and sexy—it involves lots of numbers and esoteric talk of “controls,” “confidence intervals” and “statistical significance.” Skeptics say it’s all statistical pyrotechnics—flashy but frivolous. Nonetheless, there’s an emerging consensus that a new system will need to take into account multiple measures of a teacher’s performance—value-added scores, but also classroom observations by peers and principals, lesson plans, portfolios and self-evaluations.
A perfect evaluation system is probably a pipe dream. Andrew Rotherham, who writes the Eduwonk blog as well as the weekly “School of Thought” column for Time magazine, says that “You can’t have a system that makes sure nothing unfair happens to someone. … People can be let go for the wrong reasons. That can happen. That is life.”
The real question, then, is whether value-added analysis is an improvement over tools we’ve relied on in the past, and for Rotherham and many other experts in the field, the answer is an unambiguous “yes.”
Not everyone agrees, including some observers abroad. Pasi Sahlberg, Director General of the Centre for International Mobility and Cooperation in Finland’s Ministry of Education and Culture, says: “It’s very difficult to use this data to say anything about the effectiveness of teachers. If you tried to do this in my country, Finnish teachers would probably go on strike and wouldn’t return until this crazy idea went away. Finns don’t believe you can reliably measure the essence of learning.”