Carnegie Mellon project revives failed inBloom dream for student data

This story also appeared in U.S. News & World Report

Ken Koedinger, project leader of LearnSphere and a professor of human-computer interaction and psychology at Carnegie Mellon University (Photo: Carnegie Mellon University website)

LearnSphere, a new $5 million federally funded project at Carnegie Mellon University, aims to become “the biggest open repository of education data” in the world, according to the project leader, Ken Koedinger.

If you think that sounds ambitious and a lot like inBloom, the Gates Foundation-funded non-profit that shut its doors in 2014 after student privacy fears escalated, you’re right.

“There certainly are some similarities,” said Koedinger, a professor of human-computer interaction and psychology at Carnegie Mellon University.

An internationally renowned leader in the field of education technology, Koedinger is known for developing the mathematical models that drive “cognitive tutors,” which tailor instruction to individual students. He is also a co-founder of Carnegie Learning, an educational software business that was spun off from Carnegie Mellon in 1998 and whose software is currently used by 400,000 students.

Koedinger launched LearnSphere earlier this year with the hope of making it easier and faster for researchers to analyze big datasets — mostly student keyboard clicks — in order to test educational theories and boost learning outcomes from elementary school to college. Just as inBloom had hoped software makers and researchers would use its vast database to improve education technology, Koedinger also wants to create a forum for sharing and analyzing data on how students learn. But, he says, there are important differences between LearnSphere and inBloom.

For one, he says he’s not going to allow any personal information from school records in LearnSphere.

“In some ways, it’s a deep philosophical difference,” Koedinger said. “We are not looking that much at collecting demographic data and certainly not any kind of record information. Those are the things that tend to be particularly sensitive.”

No student names, no addresses, no zip codes, no social security numbers, he says. No race, family income or special education designations. “The student identifier column, even if yours is already anonymized, we re-anonymize it automatically,” he added.
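The re-anonymizing step Koedinger describes could look something like the sketch below. It is an illustration only, not LearnSphere's actual code: the column name `student_id`, the `stu_` prefix and the function name are assumptions made for the example. Each incoming identifier is swapped for a fresh random pseudonym, and the mapping is thrown away afterward, so a student's clicks stay grouped together but cannot be traced back to the uploader's original ID.

```python
import secrets


def reanonymize(rows, id_column="student_id"):
    """Replace each value in id_column with a fresh random pseudonym.

    Hypothetical sketch: the same incoming ID maps to the same pseudonym
    within one call, so per-student sequences stay intact, but the mapping
    itself is discarded when the function returns.
    """
    mapping = {}
    result = []
    for row in rows:
        original = row[id_column]
        if original not in mapping:
            # 8 random bytes rendered as 16 hex characters
            mapping[original] = "stu_" + secrets.token_hex(8)
        new_row = dict(row)  # copy so the caller's data is left untouched
        new_row[id_column] = mapping[original]
        result.append(new_row)
    return result


clicks = [
    {"student_id": "a1", "action": "answer"},
    {"student_id": "b2", "action": "backspace"},
    {"student_id": "a1", "action": "hint"},
]
shared = reanonymize(clicks)
```

In this sketch the first and third rows of `shared` carry the same new identifier, while the second row gets a different one.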

There may be demographic information on a school — for example, the percentage of students who qualify for free or reduced-price lunches. But Koedinger says that even the school name is anonymized in most cases.

Unlike inBloom, which wanted public school districts to use its servers to store student information, Koedinger has no plans to store school records and doesn’t anticipate that school officials will upload anything to his virtual warehouse of data. Instead, he wants education researchers and software developers to upload their data. This is the data of keyboard clicks as students are using educational software, the millions of keystrokes they make as they answer questions, hit backspace or sit idly daydreaming and uninterested.

This new university-driven data repository builds on earlier data projects at Carnegie Mellon, Stanford and the Massachusetts Institute of Technology, all of which are partners in LearnSphere. The University of Memphis has joined as a team member, too.

Koedinger’s team isn’t building a physical warehouse in one single location. Those who want to share data can upload it to one of the sites that LearnSphere is managing, or they can keep it on their own server and control who gets access to it. The goal is to build something called a “distributed infrastructure,” which allows researchers access to data on someone else’s computer. The hard work for Koedinger’s team is in cleaning up the data so that outside researchers can analyze it easily.

Regardless of where the data is stored, Koedinger says his research manager will go through the data with a checklist to make sure no information that could identify a student is attached to data that is being shared. And, he says, this manager will continue to monitor data for improper additions.
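The article does not say what is on the research manager's checklist. As a rough illustration, a first automated pass might simply flag column names that match the kinds of information Koedinger says he will not accept. The field list and function name below are assumptions for the sake of the example, and a name check like this would only supplement a human review, not replace it.

```python
# Hypothetical list, drawn from the categories named in the article.
PROHIBITED_FIELDS = {
    "name",
    "address",
    "zip_code",
    "social_security_number",
    "race",
    "family_income",
    "special_education",
}


def find_prohibited_columns(column_names):
    """Return, in sorted order, the columns whose normalized name is prohibited."""
    flagged = []
    for column in column_names:
        # Normalize "Zip Code" -> "zip_code" before comparing
        normalized = column.strip().lower().replace(" ", "_")
        if normalized in PROHIBITED_FIELDS:
            flagged.append(column)
    return sorted(flagged)


flagged = find_prohibited_columns(
    ["student_id", "Zip Code", "response_time_ms", "Race"]
)
# flagged == ["Race", "Zip Code"]
```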

The ultimate goal is to translate research questions into computer commands that can be run on any dataset. For example, how many times does a student need to repeat or practice something before it becomes knowledge? Or when is the optimal time to give feedback, right away or after a bit?

At the moment, Koedinger is working on creating an example of the kind of research project he would like to see housed by LearnSphere. He recently studied how much students learned when they were taking a free online course, a MOOC, in introductory psychology. He asked what increased student learning the most: videos, reading assignments or online interactive tasks?

“Most instructors are spending their time on videos. But our model suggests, for every activity you do, you get six times the bump than for every video you watch,” said Koedinger. “Maybe someone will say, ‘I don’t believe it for my course, I think the videos are more valuable.’ Let’s see for yourself with your own data and see what you get.”

Koedinger hopes that with a simple press of a button, researchers can rerun that same question on a different course without spending months collecting and cleaning the data. He supposes it’ll take a “year or so” before that’s a reality.
