When policymakers don't understand basic statistics

Imagine, if you will, that the Stuarts have two children, Kylie and Frederick. Kylie was born in 1997 and Frederick was born in 2001, both in Toledo, Ohio. The Stuarts measure Kylie’s height when she reaches the spring of fourth grade, in 2007, and learn that she is 52” tall, or 4’4”.

Shortly thereafter, the family moves from Toledo to Kansas City, Mo., and Mrs. Stuart decides to change the family’s diet, reducing the red meat content and adding more whole grains and fresh fruit. When Frederick is in the spring of his fourth-grade year, in 2011, the Stuarts measure his height. They find that Frederick is 54” tall, or 4’6”—two inches taller than Kylie was at the same age.

Mrs. Stuart is ecstatic. “The family’s height has grown over time!” she exclaims. “Our fourth-grader’s height in 2007 was 52 inches, and our fourth-grader in 2011 was 54 inches tall. That’s growth over time!” Giddy, she says, “And I’m sure that the growth is because of the change in diet. It really works! If other families did this, their kids would grow just as much as ours did from 2007 to 2011!”

Mrs. Stuart may not know it, but she has a future in politics. The mangled logic on which she relies is no different from what emerged last week from the mouths of well-known policymakers upon the release of the 2013 scores on the National Assessment of Educational Progress (NAEP).

Every two years, the federal government assesses a sample of fourth- and eighth-grade children in reading and mathematics in all 50 states and the District of Columbia, as well as students in a sample of urban school districts. When the scores are released, policymakers and pundits scramble to interpret the results, frequently putting their own spin on the differences in performance between different years.

One common mistake, reflected in Mrs. Stuart’s logic, is to treat the difference between the scores in a given administration and those in the previous administration as an indicator of growth or decline. As Matt Di Carlo has repeatedly warned, the fourth-graders in Arkansas in 2013 are not the same students as the fourth-graders in Arkansas in 2011—just as the fourth-grader in the Stuart family in 2011, Frederick, is not the same child as the fourth-grader in the Stuart family in 2007, Kylie.

Even if NAEP relies on a representative sample of fourth-graders in a state in a given year, the demographic composition of the children in any state will change over time, so that the characteristics of the fourth-graders in Arkansas in 2013 will differ a bit from those of the students who were in fourth grade in 2011. These changes in demography could account for differences in the average performance of Arkansas fourth-graders in 2011 and 2013.
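To see how a shift in composition alone can move an average, consider a minimal sketch in Python. Everything in it is an assumption made for illustration: the two subgroups, their shares of the cohort and their scores are invented numbers, not NAEP data. Each subgroup's average score is held fixed across the two years, so no student group is "learning more," yet the overall average still rises.

```python
# Hypothetical illustration only: the groups, shares and scores are invented,
# not drawn from NAEP or any real assessment.

def overall_mean(groups):
    """Return the share-weighted mean score for a cohort.

    `groups` maps a group name to (share_in_percent, mean_score).
    """
    total_share = sum(share for share, _ in groups.values())
    weighted_sum = sum(share * score for share, score in groups.values())
    return weighted_sum / total_share

# Each subgroup scores exactly the same in both years.
# Only the mix of students changes between cohorts.
cohort_2011 = {"group_a": (50, 230), "group_b": (50, 210)}
cohort_2013 = {"group_a": (60, 230), "group_b": (40, 210)}

mean_2011 = overall_mean(cohort_2011)
mean_2013 = overall_mean(cohort_2013)

print(f"2011 average: {mean_2011:.1f}")
print(f"2013 average: {mean_2013:.1f}")
print(f"Change: {mean_2013 - mean_2011:+.1f}")
```

Under these made-up numbers the average moves from 220.0 to 222.0, a two-point "gain" produced entirely by who is in the fourth grade, not by any change in how any group performs.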

We typically think of “growth” as an attribute of individuals, and my example of a child’s height is intended to dramatize this. Change in the attributes of an individual over time may indicate growth or decline. But NAEP does not measure the same individual at two points in time. Instead, it measures different individuals at each assessment.

And the second big mistake, of course, is Mrs. Stuart’s claim that the change in diet accounts for the fact that the family’s fourth-grader in 2011 is taller than the family’s fourth-grader in 2007. There’s really no way to pin this down, because there are so many differences in the experiences of Kylie and Frederick that might matter. Heck, maybe there’s less smog in Kansas City, or better schools, or more opportunities for organized sports. Diet may be a plausible explanation, but it is scarcely the only explanation. Only someone untutored in the logic of social-science research would firmly say, “The diet is working!”

Which brings us to Arne Duncan, John King, Jr. and Merryl Tisch. Education Secretary Duncan was intent on linking the gains observed in three locales (the District of Columbia, Hawaii and Tennessee) to the funds they received from the Race to the Top competition, and the resulting implementation of the Common Core standards and statewide teacher evaluation systems. Catherine Gewertz of Education Week quotes Duncan, in a pre-release conference call, as saying, “Tennessee, D.C. and Hawaii have done some really tough, hard work and it’s showing some pretty remarkable dividends.” (Apparently, no other states are doing tough, hard work, or at least no other states merit a mention. Especially not other states that adopted the Common Core standards long before the 2013 assessment, or other states receiving Race to the Top funds.)

And in New York, State Education Commissioner John King, Jr. and Chancellor of the Board of Regents Merryl Tisch issued a joint statement reaffirming their commitment to the state’s education reform agenda—because of the NAEP results in Tennessee and the District of Columbia.

“What happened in Tennessee and Washington, D.C., can happen here,” King said. “Tennessee and Washington, D.C., are out in front on meaningful teacher and principal evaluations, and the NAEP results show that those evaluations, along with the shift to the Common Core, are helping students learn more.”

Set aside one obvious issue: 44 percent of public school students in the District of Columbia attend charters, which do not use DCPS’s teacher and principal evaluation systems. Even so, the notion that whatever has been happening in Tennessee and Washington, D.C., can be attributed to educator evaluation systems and the shift to the Common Core is just as ludicrous as Mrs. Stuart’s claim that “The diet is working!” But King and Tisch are unswayed. “It’s just more evidence that New York needs to stay on this road,” said the Commissioner.

If the U.S. Department of Education or the New York State Education Department ever announces an opening for Chief Disinformation Officer, I think Mrs. Stuart should apply.