Critics have attacked Big Pharma for widespread biases in studies of new and potentially profitable drugs. Now, scholars are detecting the same types of bias in the education product industry, even in a federally curated collection of research that’s supposed to be of the highest quality. And that may be leaving teachers and school administrators in the dark about the full story of classroom programs and interventions they are considering buying.
An analysis of 30 years of educational research by scholars at Johns Hopkins University found that when a maker of an educational intervention conducted its own research or paid someone to do the research, the results commonly showed greater benefits for students than when the research was independent. On average, the developer research showed benefits — usually improvements in test scores — that were 70 percent greater than what independent studies found.
“I think there are some cases of fraud, but I wouldn’t say it’s fraud across the board,” said Rebecca Wolf, an assistant professor in the Center for Research and Reform in Education at Johns Hopkins University and lead author of the draft study. “Developers are proud of their products. They believe in them. They’ve worked hard in developing these products. They want a study that puts the best face forward.”
Biased research matters because current federal law encourages schools to buy products that are backed by science. In order to tap into federal school improvement funds, for example, low-achieving schools with disadvantaged children are required to select programs that have been rigorously tested and show positive effects.
The study, “Do Developer-Commissioned Evaluations Inflate Effect Sizes?” was presented at a March 2019 conference session of the Society for Research on Educational Effectiveness (SREE) in Washington, D.C. The paper is a working paper, meaning it has not yet been published in a peer-reviewed journal and may still be revised.
Wolf and three of her colleagues analyzed roughly 170 studies in reading and math dating as far back as 1984 that are part of the What Works Clearinghouse. That’s an archive of research that the U.S. Department of Education launched in 2002 to help educators decide which educational products to buy. It is by no means a complete or exhaustive collection of educational research, but rather a group of high-quality studies curated by experts. The studies track test score gains and compare students who got the intervention with those who didn’t.
More than half of the studies, 96, were conducted by independent researchers, while 73 had some sort of insider connection to creating or selling the product. Wolf classified a study as a “developer” study if the inventors, distributors or an employee of the developer or distributor were involved in the research. Studies were considered developer studies even if the developer didn’t directly conduct the research but commissioned an outside researcher to carry it out.
Wolf took many aspects of the studies that can lead to bigger student gains into consideration. For example, a personal tutor tends to produce larger student gains than a curriculum used by an entire classroom. Kids in younger grades tend to see bigger improvements than older kids. Smaller studies on fewer students are more likely to show a bigger bang than larger ones. But even within a host of subcategories, Wolf found that the developer studies still pointed to larger benefits than the independent studies.
Replication studies are relatively rare in education research, but both developer and independent studies were available for 18 of the reading and math interventions. When Wolf compared the independent and developer studies of the same product side by side, the developer studies tended to post 80 percent higher gains for students.
There are several reasons why developer studies tend to show stronger results, according to Wolf, whose full-time work is evaluating educational programs. The first is that a company is unlikely to publish unfavorable results. Wolf speculates that developers are more likely to “brand a failed trial a ‘pilot’ and file it away.”
A second common issue is how students are kept out of experiments. Timothy Shanahan, a reading specialist and a professor emeritus at the University of Illinois at Chicago, shared an anecdote with attendees of the March 2019 SREE conference. He recalled a reading study where struggling students who didn’t complete the program were excluded from the treatment group. The comparison control group, of course, kept its low-achieving readers and their low scores, making the intervention look more successful. Wolf also found these sorts of “sample selection” differences when she compared developer and independent studies side by side. One developer study excluded some students from the treatment group after they had been randomly selected for it. These details are often in the study’s fine print, but educators would have to look for them.
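To see why dropping students from only one group matters, here is a minimal sketch in Python. The test scores are invented purely for illustration and do not come from Wolf’s analysis or from any study in the What Works Clearinghouse. The two groups are identical by construction, so the program in this example has no effect at all.

```python
from statistics import mean

# Hypothetical test scores, invented purely for illustration.
# Both groups are identical by construction, so the true effect is zero.
control = [40, 45, 50, 70, 75, 80]
treatment = [40, 45, 50, 70, 75, 80]


def score_gap(treated, comparison):
    """Difference in mean scores: treated group minus comparison group."""
    return mean(treated) - mean(comparison)


# Fair comparison: everyone originally assigned stays in the analysis.
full_gap = score_gap(treatment, control)

# Skewed comparison: the three lowest scorers in the treatment group are
# dropped (say, because they did not finish the program), while the
# control group keeps its low scorers.
completers = sorted(treatment)[3:]
biased_gap = score_gap(completers, control)

print("Gap with all students kept:", full_gap)
print("Gap after dropping strugglers from treatment only:", biased_gap)
```

With these made-up numbers, the gap is 0 points when every student is kept and 15 points once the strugglers are removed from the treatment group alone, even though nothing about the program changed.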
Developers also often create their own yardsticks for measuring student success, devising their own assessments to go along with their programs. That might allow an education product company to measure what it’s teaching more precisely. But those same gains are often not evident in a reading or math assessment given to all students each spring.
These research choices that lead to bias seem to be an open secret in education research circles. Wolf said she asked researchers who heard her presentation if they were surprised by her conclusions. “Every single person said ‘no.’ If you’re in the work of program evaluation, you can see why these things might happen,” said Wolf.
This isn’t the first study to detect bias in education research. The problem of hiding unfavorable results from publication was documented as far back as 1995. In 2016, one of Wolf’s co-authors, Robert Slavin, wrote about the positive results that researchers get when they devise their own measures to prove that their inventions work. In that same year, another group of researchers also detected a developer bias in a smaller group of studies about math programs that are part of the What Works Clearinghouse collection. This new Hopkins study addresses some questions about that analysis and confirms the conclusion that when people study their own inventions, the results are stronger.
Solving this bias problem won’t be easy. Some advocate for pre-registration, something that the field of medicine uses, in which study authors describe the design and measures to be used ahead of time. SREE launched such a registry in 2018. Pre-registration makes it harder for developers to tweak their study design on the fly when the students aren’t faring as well as they had hoped. However, schools are complex places and it’s often necessary to make adjustments to an experiment when something isn’t working with teachers or school-day schedules.
Wolf argues that educators should pay more attention to whether the research is independent. In her research for this study, developer funding wasn’t always disclosed and she often had to contact researchers to learn these details. Wolf said these conflicts of interest should be highlighted and disclosed up front.
Sunlight is a remedy, just as in the pharmaceutical industry.
This story about education research was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.