Better tests don't lead to better teaching, study finds

Bodie Manly helps Jarmon James, 12, with fractions in advance of a PARCC-like practice test. A New Orleans teacher is helping a 12-year-old student prepare for a Louisiana state test in this 2015 photo. A research study finds that test preparation, even for rigorous tests, isn’t improving the quality of instruction. Credit: Peggy Barmore

Ever since the federal government mandated annual testing for U.S. public school children in 2001, educators (and parents) have fretted over whether too much class time has been allocated to drilling and preparing students for standardized tests. Unfortunately, there’s very little research on test prep and its effect on teaching quality. Teaching quality is a very hard thing to study. Researchers usually don’t know exactly what teachers are teaching behind closed doors. And, even if you could be a fly on the wall in every classroom in America, one person’s view of a good lesson might differ from another’s.

This story also appeared in U.S. News & World Report

Two previous studies of math instruction combined classroom observations and teacher interviews. One 2012 study found that instructional quality declined with the rise of high-stakes testing, especially in the weeks before the exam. Teachers didn’t prompt students to understand solutions conceptually as frequently or present challenging problems as often. And a 2004 New Jersey study found a wide variation in test-preparation lessons. Some teachers repeated procedures for solving problems. Other teachers asked students probing questions to help them understand.

A pair of researchers recently took another stab at the question of how test prep affects the classroom and wondered if better tests might shake things up. If a test isn’t just multiple-choice questions and asks students challenging math problems that require them to think, would the test preparation time be productive instructional time? That’s a particularly relevant question now that more than 20 states have adopted new, more rigorous exams alongside Common Core standards.

In this new study, researchers analyzed videotaped lessons in five different school districts, some of which had low-end tests and some high-end tests, and found that a more demanding test didn’t help improve the quality of the teacher’s instruction. A teacher’s test-prep lessons were generally of lower instructional quality than when the same teacher wasn’t prepping students for the test. More surprising, the researchers found that the quality gap between a teacher’s regular lessons and her test-prep lessons was largest in a school district where the teaching quality was the highest. In other words, instructional quality sank a lot when these excellent teachers were delivering test-prep lessons. In districts with lower teaching quality to begin with, the test-prep lessons weren’t much worse. But they didn’t raise instructional quality, either.

“How can we improve classroom instruction in the midst of high-stakes testing? One of the conclusions from this study is that you can’t expect the test itself to be the sole driver of change,” said David Blazar, a co-author of the study and a professor at the University of Maryland’s College of Education. “Some people think if you get rid of the test, then instruction improves. But the findings of this paper would lead you to be skeptical of that. Others say we could change what is tested and improve instruction. The findings here suggest that tests on their own are unlikely to improve instruction or to change what happens in the classroom.”

Blazar argues that policymakers need to consider other ways to support teachers, such as one-on-one coaching and on-the-job professional training.

The study, “Does Test Preparation Mean Low Quality Instruction?” was published in October 2017 in Educational Researcher, a peer-reviewed journal. Blazar and his co-author Cynthia Pollard were able to take advantage of earlier unrelated studies that had videotaped thousands of hours of fourth- and fifth-grade math lessons in Massachusetts, Georgia and Washington, D.C. The videotaped lessons had already been coded for teaching quality, using a scale developed by Heather Hill of Harvard University. Rote instruction, such as repeating times tables, earned a lower score than using multiple methods to solve a problem, or offering an explanation that pinpointed the root cause of a student’s misunderstanding. Opponents of testing typically argue that test-prep lessons crowd out the kind of sophisticated instruction valued by Hill’s Mathematical Quality of Instruction (MQI) scale.

At the time of the videotaping, from 2010 to 2013, Massachusetts had a notably more challenging annual test than Georgia or Washington, D.C., which gave the researchers a chance to see if the teachers they studied in Massachusetts had higher-quality test-prep lessons. The Georgia and Washington tests were entirely, or mostly, multiple-choice questions. By contrast, 40 percent of the Massachusetts test asked students to write their answers in a box or an open-response field. Many questions were non-routine problems that asked students to look for patterns or explain their reasoning.

In the next step, Blazar and Pollard categorized all the videos as either test-prep lessons or not. They began with keyword searches for 70 testing terms, such as “proficient” or “open response,” and then checked those lessons to make sure they were engaging in test prep, at least some of the time. In the end, they found lessons that engaged in some amount of explicit test preparation for 60 teachers, then compared those teachers’ test-prep lessons with their non-test-prep lessons. On average, the test-prep lessons were worse. But not by a lot.

Driving the results was one Massachusetts district, where both teaching quality and test quality were notably high. Here the teachers’ test-prep lessons earned much lower quality scores than their regular lessons. In districts with lower-quality tests, there wasn’t as big a decline. But the teaching quality wasn’t as high to start with. Blazar saw the same pattern in the other Massachusetts district, despite the better test. Teaching quality wasn’t as high to start with, and test-prep lessons weren’t much worse than non-test-prep lessons. (None of the districts in the paper was identified.)

Disappointing test-prep lessons had different types of shortcomings. In an earlier, unpublished draft of the paper, which Blazar shared with me, he and Pollard described how students spent an entire hour doing multiple-choice practice questions and received no feedback other than whether they were right or wrong. In another low-quality lesson, the teacher spent a lot of time on a mnemonic for rounding decimals, without making sense of the procedure. Other times teachers spent too much time discussing the format of the test, letting 15 or 20 minutes of class time elapse without any math instruction whatsoever.

Not every test-prep lesson was poor. In Blazar’s review of the videotaped lessons, he found a fantastic test-prep lesson that used short-response practice questions to review percentages. When a student was stumped on how to calculate the percent of novels in a book collection because there were fewer than 100 books in total, the teacher made analogies to having only 15 kids in a classroom and had the student calculate 33 1/3 percent of that. The teacher explained that the whole doesn’t always have to be 100, as it is with yards on a football field. Then the teacher addressed the whole class, spontaneously uttering a series of even more challenging problems, such as “If there are 20 kids in the classroom, what is 150 percent of that?”

Lessons like these were far superior to most teachers’ ordinary lessons in the study. However, they were a rarity among the 73 test-prep lessons Blazar reviewed. Teaching to the test can be done well, but it’s not easy.