School leaders and education researchers often rely on test scores to judge how well students are learning. But that ignores many important aspects of learning, such as the liveliness of classroom discussion or how engaged and motivated the students are. Expert observers in a classroom can immediately pick up on these unquantifiable moments of great teaching. But human observations are time-consuming and expensive. You can videotape classrooms, but it can be just as tedious and costly to code and analyze the recordings afterward.
Thanks to advances in artificial intelligence, teams of education researchers and computer scientists are experimenting with systems that can “see” into and “listen” in on classrooms and instantly analyze the quality of classroom instruction. The technology is improving rapidly, driven by gains in natural language processing and automatic speech recognition. At the same time, it takes an enormous amount of work just to teach the robots to observe one micro aspect of a classroom at a time.
One team of scientists is focused on a type of question that middle school English teachers ask students. There are all kinds of questions that an English teacher might pose, from the names of characters in a story to what happens next in a plot. Teachers might ask what the students think the author’s main themes are. But Sean Kelly, a sociologist at the University of Pittsburgh School of Education, is most interested in something called “authentic” questions, for which there are no pre-ordained answers. For example, why do you think that Nathaniel Hawthorne chose to give Hester Prynne a female child instead of a male child? Would the sex of the child make a difference in “The Scarlet Letter”? Kelly’s earlier low-tech research found that teachers who asked lots of authentic questions tended to create exciting classrooms full of lively discussions, in which students ultimately learned more.
Kelly and his team of three computer scientists and one English professor succeeded in building a computer model that came close to an expert educator’s ability to discern when a teacher was asking students an “authentic” question. The results of this AI experiment were published in June 2018 in Educational Researcher.
“I’m sure if you just let the engineers loose, they could come up with a program that would give teachers all sorts of feedback,” said Kelly. “But would that be the same feedback that a content expert would give? That’s what our research is all about. We were very happy to see reliabilities as high as we did.”
Before you would use this in the real world, you’d want not only more accurate algorithms, but also the ability to discern many more elements of classroom discussions. Kelly dreams of a future when algorithmic observer robots might be able to give student teachers instant feedback to improve their instruction. “You could potentially do it right there at the end of the lesson,” said Kelly. “The beauty of an automated system is that you could potentially collect large amounts of data very efficiently.” Policymakers could use this tool to get a sense of how different curricula are playing out in classrooms across an entire state.
These AI systems don’t involve robots in the sci-fi or cartoon sense. The researchers began with human observers who were specially trained to watch English classes. These observers sat in on hundreds of hours of middle school classes and then spent countless more hours reviewing audio tapes and classifying teacher talk into categories, such as authentic questions. Then the researchers applied a machine-learning algorithm to the audio recordings to figure out which speech patterns were typically associated with authentic questions and which weren’t. Finally, they used these speech patterns to build a computer model that keeps track of the proportion of questions that are authentic.
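The pipeline described above, learn from human-labeled examples, then classify new questions, can be sketched in miniature. This is a hypothetical illustration, not the researchers’ model: the study used far richer speech features, and the training examples and word-overlap scoring here are invented for demonstration.

```python
# A toy version of the pipeline: human coders label transcribed questions
# as "authentic" or not, and a bag-of-words model learns which words are
# associated with each label. All data and scoring here are invented.
from collections import Counter

def train(labeled_questions):
    """Tally per-word counts for each label from (text, label) pairs."""
    counts = {"authentic": Counter(), "other": Counter()}
    for text, label in labeled_questions:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, question):
    """Label a question by which class's vocabulary it overlaps more,
    normalizing by each class's total word count."""
    words = question.lower().split()
    score = {
        label: sum(c[w] for w in words) / (sum(c.values()) or 1)
        for label, c in counts.items()
    }
    return max(score, key=score.get)

# Toy training data standing in for the hand-coded transcripts.
training = [
    ("why do you think the author chose this ending", "authentic"),
    ("what might the character be feeling here", "authentic"),
    ("what is the name of the main character", "other"),
    ("what happens next in the chapter", "other"),
]
model = train(training)
print(classify(model, "why do you think hester keeps the letter"))
```

In this toy setup, the query shares cue words like “why do you think” with the authentic examples, so the model labels it authentic; a factual recall question overlaps more with the “other” class.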
To check the accuracy of this computer model, the authors compared it to a traditional approach in which actual people listen to a recording of a teacher and identify the authentic questions by hand. The computer model and the hand-coded approach don’t produce identical results. Sometimes the computer labels something an authentic question that a human doesn’t, and vice versa.
On average, the computer estimated a 3.6 percent rate of authentic questions. Humans found slightly more, a 3.9 percent rate of authentic questions. Statistically speaking, the difference wasn’t significant. Deciding whether a question has a pre-ordained answer is a somewhat subjective endeavor. Even expert human observers disagree on how to categorize a question 20 percent of the time.
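The comparison above, each coder’s authentic-question rate plus how often two coders agree, is simple to compute. The sketch below uses invented numbers, not the study’s data, purely to show the arithmetic.

```python
# Hypothetical numbers, not the study's data: two coders label the same
# 100 questions; compare each coder's authentic-question rate and their
# raw percent agreement.
def authentic_rate(labels):
    """Fraction of questions labeled 'authentic'."""
    return labels.count("authentic") / len(labels)

def percent_agreement(a, b):
    """Fraction of questions on which two coders chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

computer = ["authentic"] * 4 + ["other"] * 96   # computer: 4% authentic
human    = ["authentic"] * 5 + ["other"] * 95   # human: 5% authentic

print(authentic_rate(computer), authentic_rate(human))
print(percent_agreement(computer, human))
```

Note that raw percent agreement is inflated when one label dominates, as it does here; researchers typically also report chance-corrected statistics such as Cohen’s kappa.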
But even in this relatively successful experiment, the potential pitfalls of relying on AI for teacher observations are in evidence. Is a teacher with a higher authentic question count a better teacher? Kelly says he’s encountered fantastic teachers who never ask authentic questions or hold whole-group discussions. “They were more concerned with giving students close feedback on their writing,” he said.
Kelly said he wouldn’t want models like this used to penalize such a teacher with a low performance rating; a human observer would obviously give this writing teacher high marks. But Kelly argues that automated observation could give school leaders insight into whether students are getting enough lively, thoughtful discussions in their classes. “If a student never gets any authentic questions and discussions, that’s probably not a good thing,” said Kelly. The average 50-minute middle school English class in America typically has less than two minutes of genuine discussion among several students and a teacher, Kelly said.
Kelly predicts that automated classroom observers will be a reality in the not-too-distant future. “The capability of the computer science is quite good,” he said. “With enough of an effort, we can get there pretty quickly. But if you proceed too fast, you’ll be out there with a tool that you don’t know what it’s doing. You need lots and lots of research to show that it’s valid and leading to useful teacher feedback.” Hopefully, decision makers will use automated observer robots to improve instruction and not as a weapon against teachers, repeating the mistakes made when student test scores were used to pay and punish them.
This story about automated teacher observation was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.