Perhaps nothing has remade American education so profoundly in my lifetime as the 2001 federal law that required states to assess all students every year. Because those scores were then used to evaluate educators and decide which schools to shut down, testing has changed how math and reading are taught in classrooms and reduced or even eliminated topics that aren’t tested — from civics and science to art and music. The coronavirus pandemic has given American schoolchildren a reprieve from this all-consuming spring ritual, but it has also wreaked havoc with research studies that rely on test scores to measure whether an intervention is working. I thought this would be a good moment to talk to Daniel Koretz, an influential expert on testing and quantitative measurement methods, who retired a year ago from his post at Harvard’s Graduate School of Education. (This interview has been condensed and edited for clarity.)
The Hechinger Report: In your two books, Measuring Up (2009) and The Testing Charade (2017), you explain how testing inevitably led to test score inflation and, instead of improving schools, it harmed children and ruined teaching. Testing pressure has subsided since the 2015 federal education law, which removed some of the high-stakes consequences for low test scores.* Do you think schools are still inflating test scores?
Koretz: Test scores are certainly still inflated, but we often don’t know by how much. In this country, we treat education data as the private sandbox of superintendents and commissioners. This is entirely different from how we treat data in other areas of public policy, such as medicine or airline safety. Imagine you were to go to a superintendent and say, “I’d like access to your data to see how badly inflated your scores are.” They don’t say, “Well, I’ve been waiting for you to come.” They tell you, “No.” And so there aren’t that many studies, but the ones we have are quite consistent.
Experts have been writing about test score inflation since at least 1951. It’s not news, but people have willfully ignored it. The core principle is that there’s a small number of things that you’re going to ask kids to do on a test. And you’re not just picking content; you’re picking how you present the material, what kind of response you demand of them, how it is scored. All those are decisions that narrow things down. But who cares about those 42 test items? What you care about is the big domain of math. The 42 items are only valuable if they let us estimate how much kids know about the big domain. The problem arises when people focus on what will be in the small tested sample, and scores then go up more than mastery of the larger domain.
How would you fix testing?
Many states give teachers test samplers so they know what’s coming. And if the real test doesn’t look a lot like the sampler, they get pissed off. That’s one thing that allows them to focus on the tested sample instead of the bigger domain. Instead, you should give maybe two or three very different ways of testing each standard [concept]. And tell teachers, here are illustrations of some of the ways that we might test the standard, but your students have to be prepared to address the standard however it appears in real life. So when it shows up on the test, it might look different from both of these examples. That’s how testing ought to be done. But the way we do it now, the whole system is set up to inflate test scores because people can predict many of the details of the test.
Why did you decide to become an expert in testing when you’re so critical of it?
I’m not a critic of testing. I’m a critic of the misuse of testing.
I actually trained in developmental psychology, not in psychometrics, which I decided to do after a few years as a special ed teacher when I was really young. And I went from there to the Congressional Budget Office. Alice Rivlin was the director, and she asked for a meeting with me and my immediate boss. She was very distressed by an argument going on then, in the 1980s, about test scores. There were two camps. One claimed there had been a huge decline in test scores that had been reversed by recent conservative changes in education policies, which was nonsense — there hadn’t been time for them to have an impact. The other side argued that scores hadn’t declined. But it was really puzzling because they all were talking about the same two sources of data. So she said, “Well, could you sort this out?” I think I probably spent three years on it, and I produced two monographs. They actually got more attention than anything I did later in my career. One made top of the fold on page one of the New York Times. And I got hooked by the interesting issues testing raises.
If you were a benevolent dictator, how would you fix schools?
The two biggest problems we have are the enormous inequities in the system, and related to that, our unwillingness to make teaching a high-status and desirable job. Until we find a way to start making teaching an esteemed and desirable profession, we’re going to have a hard time attracting the people we want into teaching. That’s the 900-pound gorilla.
We have to start by saying, what do we really want to see when we walk into a classroom? I want to see kids who show a high level of cognitive engagement and who are not turned off. If you had an inspection system, you could tell the inspectors that we want to see teaching that engages kids. And if the teacher is doing that, and the test score increases are not as high as the teacher next door who’s drilling them on test prep materials, the one who should get credit is the first one, the engaging teacher.
We can’t escape the need to use human judgment in evaluating classrooms. This is controversial. But I want people to argue about how best to do this.
What are you doing now?
I teach English to adult immigrants, which is fun but extraordinarily difficult. When I taught measurement [at Harvard], I just knew this stuff cold. There are a lot of things that people don’t recognize about the grammar of their own language until they have to teach it. There’s a category in English called stative verbs. Have you heard of it? I hadn’t. If you want to say you miss somebody who’s gone, you always say “I miss you,” and you wouldn’t say, “I am missing you.” I had never given this a moment’s thought, but I have to make it clear to my students.
*Correction: An earlier version of this story incorrectly stated the year of passage of the federal law. The Every Student Succeeds Act (ESSA) was passed in 2015, not 2019.
This story about Daniel Koretz was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for our Proof Points newsletter.