In April of 2012, Mark D. Shermis, then the dean of the College of Education at the University of Akron, made a striking claim: “Automated essay scoring engines” were capable of evaluating student writing just as well as human readers. Shermis’s research, presented at a meeting of the National Council on Measurement in Education, created a sensation in the world of education, both among those who see such “robo-graders” as the future of assessment and among those who believe they are worse than useless.
The most outspoken member of the second camp is undoubtedly Les Perelman, a former director of writing and a current research affiliate at the Massachusetts Institute of Technology. “Robo-graders do not score by understanding meaning but almost solely by use of gross measures, especially length and the presence of pretentious language,” Perelman charged in an op-ed published in the Boston Globe earlier this year. Test-takers who game the programs’ algorithms by filling pages with lots of text and using big words, Perelman contended, can inflate their scores without actually producing good writing.
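Perelman’s point is easy to see in miniature. The sketch below is a deliberately crude, hypothetical scorer (not the algorithm of any actual scoring engine) that rewards only the “gross measures” he describes: sheer length and a supply of long words. The function name toy_essay_score, its thresholds and its weights are all invented for illustration.

```python
def toy_essay_score(essay: str) -> float:
    """Score an essay on 'gross measures' only: length and big words."""
    words = essay.split()
    if not words:
        return 0.0
    # Length: score rises with word count, capped at 500 words.
    length_score = min(len(words) / 500, 1.0)
    # "Pretentious language": fraction of words with 9+ characters.
    big_words = sum(1 for w in words if len(w) >= 9)
    vocab_score = min(10 * big_words / len(words), 1.0)
    # Weighted blend on a 0-100 scale; the weights are arbitrary.
    return round(100 * (0.6 * length_score + 0.4 * vocab_score), 1)

# Padding with repeated long words beats a short, coherent passage.
padded = "Notwithstanding multitudinous considerations, " * 120
concise = "Dogs are loyal. They guard homes and comfort people."
print(toy_essay_score(padded))   # high score for verbose nonsense
print(toy_essay_score(concise))  # low score for clear, correct prose
```

Run on the two sample texts, the padded nonsense handily outscores the short, coherent passage, which is exactly the gaming behavior Perelman warns about.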
Perelman makes a strong case against using robo-graders to assign grades and test scores. But there’s another role for robo-graders, one in which, evidence suggests, they may be not only as good as humans but better. In this role, the computer functions not as a grader but as a proofreader and basic writing tutor, providing feedback on drafts, which students then use to revise their papers before handing them in to a human.
Instructors at the New Jersey Institute of Technology have been using a program called E-Rater in this fashion since 2009, and they’ve observed a striking change in student behavior as a result. Andrew Klobucar, associate professor of humanities at NJIT, notes that students almost universally resist going back over material they’ve written. But, Klobucar told Inside Higher Ed reporter Scott Jaschik, his students are willing to revise their essays, even multiple times, when their work is being reviewed by a computer and not by a human teacher. They end up writing nearly three times as many words in the course of revising as students who are not offered the services of E-Rater, and the quality of their writing improves as a result. Crucially, says Klobucar, students who feel that handing in successive drafts to an instructor wielding a red pen is “corrective, even punitive” do not seem to feel rebuked by similar feedback from a computer.
A close look at one of the growing number of independent studies of automated writing feedback provides some clues as to what might be going on among NJIT students. Khaled El Ebyary of Alexandria University in Egypt and Scott Windeatt of Newcastle University in Britain published the study in the International Journal of English Studies; it looks at the effects of a robo-reader program called Criterion on the writing of education students learning to teach English as a foreign language. The students in the study received Criterion’s feedback on two drafts of essays submitted on each of four topics.
The computer program appeared to transform the students’ approach to the process of receiving and acting on feedback, El Ebyary and Windeatt report. Comments and criticism from a human instructor actually had a negative effect on students’ attitudes about revision and on their willingness to write, the researchers note. By contrast, interactions with the computer produced overwhelmingly positive feelings, as well as an actual change in behavior — from “virtually never” revising, to revising and resubmitting at a rate of 100 percent. As a result of engaging in this process, the students’ writing improved; they repeated words less often, used shorter, simpler sentences, and corrected their grammar and spelling. These changes weren’t simply mechanical. Follow-up interviews with the study’s participants suggested that the computer feedback actually stimulated reflectiveness in the students — which, notably, feedback from instructors had not done.
Why would this be? First, the feedback from a computer program like Criterion is immediate and highly individualized — something not usually possible in big classes like those at Alexandria University, the site of the study by El Ebyary and Windeatt. Second, the researchers observed that for many students in the study, the process of improving their writing appeared to take on a game-like quality, boosting their motivation to get better. Third, and most interesting, the students’ reactions to feedback seemed to be influenced by the impersonal, automated nature of the software.
This may seem paradoxical. When critics like Les Perelman of MIT claim that robo-graders can’t be as good as human graders, it’s because robo-graders lack human insight, human nuance, human judgment. But it’s the very non-humanness of a computer that may encourage students to experiment, to explore, to share a messy rough draft without self-consciousness or embarrassment. In return, they get feedback that is individualized, but not personal — not “punitive,” to use the term employed by Andrew Klobucar of NJIT.
Evidence of this peculiar advantage of technology can be found in a field outside education. Public health professionals have long known that people will more readily disclose sensitive information to a computer than to a person. When typing their answers on a keyboard, rather than looking a questioner in the eye, respondents reveal more about their health problems, acknowledge that they’re suffering from more symptoms (especially psychiatric symptoms), admit more HIV risk behaviors and confess more drug use. (Fun fact: Women confess a greater number of sexual partners when asked by a computer, while men drop the macho act and admit fewer.)
The “disinhibition effect” produced by technology, writes Adam Joinson, a professor at the University of the West of England, emerges whenever an individual has reason to feel anxiety, self-consciousness or worries about being evaluated. And anxiety, self-consciousness and worries about evaluation are just the emotions that, sad but true, many people feel around learning. Research has repeatedly shown that many students experience these uncomfortable emotions in relation to writing, as well as to math, science and foreign languages.
Precisely because learning can be so emotionally fraught, a non-judgmental computer may motivate students to try, to fail and to improve more than almost any human. Just don’t let the robo-reader give out grades.
This is interesting, but it leaves me with doubts. How good is the robo-readers’ criticism?
Can the robo-readers spot a fallacy or a misused word? Can they recognize an unusual phrase or interesting idea? Do they notice irony and allusions?
Try giving a robo-reader an essay by Swift, Emerson, or Chesterton. Revise as directed, as many times as instructed, and see what comes out of it. I bet that the final draft will be far inferior to the original.
It is also worth examining whether lots of revision is necessarily good revision, and whether “disinhibition” is always helpful.
Hurrah! One step past Spell Check (well, maybe two). A dispassionate analysis of sentence structure and vocabulary use could only soothe the frightened writer’s soul. The fact that a student could have such an excellent resource for rough drafts (or second or third drafts) is marvelous.
I’m most interested in the rules and criteria that support robo-readers. One of the most difficult aspects of developing instruction in abstract topics (e.g., “good writing”) is bringing various experts to consensus on an operational definition of “good.” Sure, evaluating something like sentence structure is fairly easy, but what rules does a robo-reader use to evaluate more abstract elements like relevance, alignment, or effective use of imagery and metaphor?
I remember well when, in April of 2012, the NY Times published an article on similar software. They were kind enough to include my editorial at that point. I said that research shows that students benefit much more from positive feedback than from feedback that merely points out mistakes (or merely offers suggestions for improvement). If students are reacting negatively to our feedback, then certainly teachers need to change their approach and manner of speaking. This is no small matter. Also, what compels a student to trust a machine more than a teacher? Despite her tongue-in-cheek comment, I agree with Melanie that such devices can be an aid to better writing.