The envelope, please...

They weren’t hermetically sealed.

Ten weeks ago, I made some predictions about New York City’s 2013 proficiency rates on the New York State English Language Arts and mathematics assessments—the first New York tests to be aligned with the challenging Common Core State Standards adopted (more or less) by about 45 states across the country. I relied on just two bits of information: (1) New York City’s 2012 proficiency rates; and (2) what happened in Kentucky when it shifted from its previous assessments to Common Core-aligned assessments. The predictions were not based on any knowledge of the specifics of the new assessments in New York, or what the Big Apple’s teachers were doing in their classrooms.

How did I do? The two figures below tell the tale. The first displays proficiency rates on the English Language Arts assessments in grades 3-8, overall and for particular demographic subgroups. The blue column is the 2012 proficiency rate; the red column is my prediction for 2013; and the yellow column is the actual proficiency rate in 2013. I underestimated overall proficiency by four percentage points, predicting that 22 percent of students in grades 3-8 would be classified as proficient, when in fact 26 percent fell into that category. But I was within one percentage point for Black and Latino students, English Language Learners, and students with disabilities.

In math, shown in the second figure, my prediction for 2013 was almost on the mark: I had predicted a proficiency rate of 31 percent, and the actual rate was 30 percent. I wasn’t quite as accurate with the subgroups in math; I overestimated by five percentage points the share of Black and Latino students who would be classified as proficient, but missed the mark by only two or three percentage points for all of the other groups displayed in the figure.

What do my powers of prognostication mean? (Other than my opening a storefront on 72nd Street, where I will be happy to read your palm for $15…)

What my parlor trick reveals is that the distribution of scores on the new Common Core-aligned tests in New York is not news at all. And yet many people are behaving as though this is some seismic shift in education policy and practice. Of course, they are bringing their distinctive points of view to bear on their interpretations. Former New York City Schools Chancellor Joel Klein, for example, sees good news in the fact that the policies he championed for nearly a decade have resulted in fewer than one in five Black and Latino students being classified as proficient in English Language Arts and mathematics. “While some may use lower test results to score political points and argue that we should abandon higher standards,” he intones, “this would do our kids a grave injustice.” Better, I guess, that we should use lower test results to score political points and argue that we should embrace higher standards.

And irrepressible Mayor Michael Bloomberg has no qualms about declaring victory whether scores increase, stay the same, or go down. The low proportions of city children classified as proficient “actually is some very good news,” he said, blaming the media for “not understanding the numbers.”

Here’s the dirty little secret: no one truly understands the numbers. We are behaving as though the sorting of students into four proficiency categories based on a couple of days of tests tells us something profound about our schools, our teachers and our children. There are many links in the chain of inference that can carry us from those few days in April to claims about the health of our school system or the effectiveness of our teachers. And many of those links have yet to be scrutinized.

Does Mayor Bloomberg understand the numbers? Perhaps he’d care to share with us the percentage of children in each grade who ran out of time and didn’t attempt all of the test items, and the consequences of that for students’ scores. Or how well the pattern of students’ answers fits the complex psychometric models used to estimate a student’s proficiency. Or how precisely a child’s scale score measures his or her performance. Or how many test items had to be discarded because they didn’t work the way they were intended. Or what fraction of the Common Core standards was included on this year’s English and math tests, and what was left out.

These are just some of the factors in the production of the proficiency rates that have been the subject of so much attention. And the properties of the test are just one link in the chain.

Since I’m so good at prognostication, I predict that state test scores, in New York and elsewhere, will continue to be used as a basis for important policy decisions, despite the fact that test scores tell us just a little bit about the things we care about.