At best, students find testing an annoying waste of time; often, they find it frustrating. Many teachers don’t like assessments either, bemoaning the resources that test prep has drawn away from other aspects of learning, like critical thinking.
Assessment is costly not only in terms of time, but also in terms of money. It can now cost tens of thousands of dollars to generate a single K-12 test question. The National Assessment of Educational Progress (NAEP) is funded at over $160 million per year. States spend even more on their own state tests.
Some educators have proposed scrapping standardized testing altogether. But that would be going backward, not innovating forward — because tests do have value.
Tests can show teachers how individual students are doing and who needs more help. Tests can also help researchers and policymakers understand what is working. And large-scale tests such as the NAEP and the OECD’s Programme for International Student Assessment (PISA) provide benchmarks against which cities, states and nations can judge the success of their systems.
Importantly, tests can also provide critical information to help educators close learning gaps and ensure that students from underserved backgrounds receive equal opportunities.
Yet the current testing system is unsustainable. What can we do?
Improvements are available now. For example, much more can be done to use technology to improve the quality of exams. Many exams lack essay questions and rely solely on multiple-choice items because grading essays is expensive.
But new computational methods such as natural language processing can help automate the process of scoring essays and open-response questions. These methods have been used to a limited extent with the NAEP to good effect — but need to be made ubiquitous.
Technological innovations can also make testing work better for teachers. A new project being developed by researchers John Gabrieli and Yaacov Petscher could help accelerate the development of automated early reading assessment, which will make it easier for teachers to diagnose reading challenges and discern when and where additional support will be most effective.
These and other innovations can also drive down cost. Using principles of artificial intelligence to automate the process of writing test questions can help ensure that students never run out of high-quality questions to answer and we don’t have to worry about cheating, because every assessment item will be unique. The ability of artificial intelligence to generate new test items is still nascent, but innovations are showing promise at far lower costs.
Education leaders can play an important role in supporting this next generation of assessments.
First, we should strive to set ambitious goals for where assessment innovation can go. Let’s ask the field: What would you need to take a large-scale assessment and reduce its cost by half while increasing its quality?
Second, government agencies and research funders should invest in advanced computational methods in operational assessments — not just relegate them to one-off “special studies.”
This investment could include the creation of benchmark datasets that serve as a target for algorithmic innovation. For instance, in the next year, the NAEP is planning a “bake off” between companies that use automated scoring to see how their results compare to the hundreds of thousands of hand-scored tests in the NAEP’s files.
We expect that the scoring accuracy of both approaches will be similar — but that automated scoring can save millions of dollars.
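For readers curious about what such a comparison might involve, here is a minimal sketch in Python. It assumes agreement is measured with quadratic weighted kappa, a statistic commonly reported in automated-scoring research; which metric the NAEP comparison will actually use is not stated here. The function name and the sample scores are invented for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Agreement between two lists of integer scores.

    1.0 means perfect agreement; 0.0 means no better than chance.
    Illustrative helper only, not any testing program's actual method.
    """
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and the same length")
    levels = max_score - min_score + 1
    if levels < 2:
        raise ValueError("the score scale needs at least two levels")

    n = len(human)
    observed = Counter(zip(human, machine))  # how often each score pair occurs
    human_counts = Counter(human)
    machine_counts = Counter(machine)

    observed_penalty = 0.0
    chance_penalty = 0.0
    for a in range(min_score, max_score + 1):
        for b in range(min_score, max_score + 1):
            # Larger disagreements are penalized quadratically.
            weight = (a - b) ** 2 / (levels - 1) ** 2
            observed_penalty += weight * observed[(a, b)] / n
            chance_penalty += weight * human_counts[a] * machine_counts[b] / (n * n)

    if chance_penalty == 0:
        # Both scorers gave every response the same single score.
        return 1.0
    return 1.0 - observed_penalty / chance_penalty


# Hypothetical scores for four responses on a 0-3 rubric.
hand_scores = [0, 1, 2, 3]
auto_scores = [0, 1, 2, 2]
print(quadratic_weighted_kappa(hand_scores, auto_scores, 0, 3))  # 0.875
```

In this made-up example the two scorers disagree on one response by a single point, which still leaves agreement high. A real comparison would run over many thousands of responses and would likely report more than one statistic.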
Third, fostering talent is critical. New testing designs will require new test researchers, developers, statisticians and AI experts who think outside the box. Efforts are already underway to support new talent. For example, the Duolingo English Test recently launched a fellowship program to support young researchers in the assessment field.
But, most importantly, we must recognize that the status quo is broken. We need new thinking, new methods and new talent.
Here’s an example. In 2011, the renowned — but aging — U.S. space shuttle program came to an end. The primary reasons for its demise were ballooning bureaucratic costs and two tragic accidents. It was almost a decade before American astronauts were again launched into space from U.S. soil — but this time on a commercially built SpaceX spacecraft.
What did SpaceX do that the government didn’t? SpaceX and the commercial space sector relentlessly focused on lowering the cost of sending something into orbit, dropping that cost from $54,500 per kilogram to just $2,720.
That cost reduction has been revolutionary. Suddenly, space travel can enable expanded broadband access, real-time tracking for methane leaks, the spotting of wildfires before they spread and much more.
A similar sea change is needed in education around testing. Like space exploration, testing is a complex domain, with multiple objectives, including reliability and effectiveness.
Testing needs an injection of innovation, and the field needs a new generation of SpaceX-style disruptors who can take advantage of new technologies to retool testing and make it work better for policymakers, educators, parents and students.
We need a SpaceX for assessment.
Mark Schneider is director of the Institute of Education Sciences at the United States Department of Education. Kumar Garg is managing director and head of partnerships at Schmidt Futures.
This story about improving testing was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.