Blended Learning

New Common Core exams will test whether a robo-grader is as accurate as a human

Millions of elementary, middle and high school students in 14 states and Washington, D.C., may have their essays graded by computers next year if initial tests of robo-grading prove accurate.

A multi-state consortium known as PARCC that is developing the tests said it hoped to use essay-grading software as soon as spring 2015, when its new computerized tests are scheduled to roll out. The new tests are primarily aimed at assessing whether students are learning new Common Core education standards, but test administrators are also experimenting with new testing technologies.

Photo by Helran

“One benefit of computerized scoring is you can get scores back sooner and it drives down costs,” explained Laura McGiffert Slover, chief executive officer of PARCC Inc., the non-profit coordinating the development of new assessments for 14 states* plus the District of Columbia. (PARCC stands for Partnership for Assessment of Readiness for College and Careers.)

McGiffert Slover talked about computerized grading in a telephone briefing with reporters on March 20. Beginning Monday, PARCC will administer trial versions of the new tests to 1 million students, who will be asked not only to answer multiple-choice questions but also to write open-response answers and essays. These sentences and essays will be graded by humans. But the human grades will be used to “teach” the computers how to mark essays.

Jeff Nellhaus, director of policy, research and design at PARCC, explained that papers scored by humans are needed to calibrate the machines. Afterwards, the calibrated machines will grade additional essays to make sure the computer marks approximate the human marks. If the computers pass that test, Nellhaus said, PARCC “could use the robo-graders as a second score with the roll out.” That is, instead of using two human graders for each essay, there would be one human and one computer.
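The calibration workflow Nellhaus describes can be sketched in a few lines of Python. Everything below is invented for illustration: the scores, the single word-count feature (real scoring engines use many features) and the agreement check; this is not PARCC's or Pearson's actual method.

```python
# Toy sketch of the calibration workflow: fit a model to human-scored
# essays, then check how often its scores match humans on a held-out set.
# All numbers are invented; using essay length as the only feature also
# echoes Perelman's critique that robo-graders reward longer essays.

def fit_linear(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Step 1: "teach" the model with (word count, human score on a 1-6 scale) pairs.
train = [(120, 2), (200, 3), (310, 4), (420, 5), (150, 2), (380, 5)]
slope, intercept = fit_linear([w for w, _ in train], [s for _, s in train])

# Step 2: the calibrated model grades held-out essays; measure agreement
# with human scores before trusting it as a "second grader."
holdout = [(250, 3), (400, 5), (180, 2)]
machine = [round(slope * w + intercept) for w, _ in holdout]
agreement = sum(m == h for m, (_, h) in zip(machine, holdout)) / len(holdout)
print(f"machine scores: {machine}  exact agreement: {agreement:.0%}")
```

In practice, a consortium would only promote the machine to "second grader" status if its agreement rate with humans on the validation set met a preset threshold.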

Essay-grading software has been used in testing since at least 2004, when Indiana became the first state to use computer-scored essays in its statewide assessments. But critics have pointed to many problems in the way computers have graded essays, and the technology is far from perfect. For example, The New York Times reported in 2012 that MIT professor Les Perelman tested an ETS (Educational Testing Service) computer grader and found that it gave lower marks to a well-argued essay than to a longer essay with nonsensical sentences. Pearson, which will be administering the PARCC tests, refused to allow Perelman to test its software-grading system.

*The 14 states are Arizona, Arkansas, Colorado, Illinois, Louisiana, Maryland, Massachusetts, Mississippi, New Jersey, New Mexico, New York, Ohio, Rhode Island and Tennessee.


Jill Barshay

Jill Barshay, a contributing editor, writes a weekly column, Education By The Numbers, about education data and research. She taught algebra to ninth graders for…