Gordon Commission calls for 'radically different' tests

This story also appeared in Education Week.

Emerging technology and research on learning have the potential to dramatically improve assessments, if educators and policymakers take a more balanced approach to using them.

That’s the conclusion of two years of analysis by the Gordon Commission on the Future of Assessment in Education, a panel of top education research and policy experts that was launched in 2011 with initial funding from the Educational Testing Service.

In a report that was set for release this week, the commission lays out a 10-year plan for states to develop systems of assessment that go beyond identifying student achievement for accountability purposes and toward improving classroom instruction and giving greater insight into how children learn.

Joanne Weiss, the chief of staff to U.S. Secretary of Education Arne Duncan but not part of the commission, said the report “shines a needed spotlight on the future of assessment, pushing us to make the next stages of this vital work coherent, coordinated, and sustainable.”

“When we get assessment right, it helps families, teachers, schools, and systems tailor learning to students’ needs and make wise decisions,” Ms. Weiss said in a statement. “Today, we stand on the cusp of the biggest advances in assessment in a generation, with assessments that are more useful and less intrusive, thanks in part to advances in education technology.”

At a time when student performance on state tests is used to judge everything from teacher effectiveness to school improvement to a high school senior’s right to a diploma, many in the education world have been pushing hard for better assessments.

Interest in the so-called “next generation” assessments being developed for the Common Core State Standards is so high that visitors crashed the Internet servers of the Partnership for Assessment of Readiness for College and Careers, or PARCC, one of the consortia developing the tests, when it posted sample test items last summer.

Not ‘Revolutionary’

Both PARCC and the Smarter Balanced Assessment Consortium are building computer-based testing systems accompanied by benchmarking tools to help guide instruction. However, the Gordon Commission says the common-core tests planned for rollout in the 2014-15 academic year, “while significant, will be far from what is ultimately needed for either accountability or classroom instructional-improvement purposes.”

The common-assessment consortia “are trying hard to reform what we currently do, and the commission has been thinking about revolutionary change,” said Edmund W. Gordon, the commission’s chairman and a professor emeritus of psychology at Yale University and Teachers College, Columbia University.

“Assessment has been almost hung up on a commitment to help account for status and to use those assessments of prior achievements to hold individuals and systems accountable,” Mr. Gordon said in an interview.

By contrast, the commission argues that future educators should use systems of aligned assessments, which would inform instruction through a balance of fine-grained classroom diagnostic tests, challenging tasks and projects, and even analytic tools to sift through background data produced by students in the classroom or online.

Such tools would be used in conjunction with larger-grained accountability tests, which are administered less frequently and tend to have too long a turnaround time to be used to help teachers.

For example, middle school students learning to subtract mixed numbers might use several different methods and substeps to solve different types of problems within that unit, and a teacher might give multiple formative tests on the subject. Formative tests are diagnostic tools that measure a student’s growth in an academic area over time. In contrast, summative tests provide a snapshot of student achievement at a specific point and are more commonly used for accountability.

“It makes a lot of sense to check along the way to see where your kids are doing well and getting hung up,” said Robert J. Mislevy, a member of the commission and the chairman in measurement and statistics at the Princeton, N.J.-based ETS, which has helped design the National Assessment of Educational Progress, the SAT, Advanced Placement tests, and other well-known exams.

But in an accountability test, he said, a state education chief may need only a representative sample of students to be given a handful of mixed-number-subtraction problems to get a picture of how well the state’s students understand that area.

“To have 20 or 30 problems for every 5th grader to take—that’s a waste of time,” Mr. Mislevy said.

Assessment Council

Roy Pea, a professor of education and learning sciences at Stanford University, who was not part of the commission, agreed that tests developed for accountability purposes “largely ignore” the need for formative diagnostic tests used to improve instruction.

“There are boundless benefits to endorsing [the commission’s] proposal of transforming assessment to render it for education so as to inform and guide daily progress in learning and development, supporting education’s primary learning and teaching processes with richer pedagogies informed by the learning sciences,” he said in a statement.

The commission calls for states to create a permanent “council on educational assessments,” modeled on the Education Commission of the States and supported with a small tax on sales of tests.

The council would, among other tasks, evaluate the effectiveness of the common-core assessments; help set performance-level benchmarks for cross-state tests; provide professional development for teachers and the public on how to use different tests; and develop and study policies and protocols to protect students’ privacy while allowing the use of assessment data for research.

The Gordon Commission also urges that the next iteration of the Elementary and Secondary Education Act (the federal government’s centerpiece education law, currently called the No Child Left Behind Act) encourage states and districts to experiment with new, even “radically different” forms of assessments.

For example, Mr. Mislevy pointed to diagnostic systems now used in computer-based programs such as Carnegie Learning and Khan Academy, in which students work through individual topics at their own pace, taking brief tests of their mastery along the way, with feedback delivered to the student and teacher on individual processes or misconceptions that cause the student problems.

The panel members also advocate developing more tools to collect information as students work through a task in the classroom, in the same way that some programs are beginning to analyze background data generated by students working online.

“It’s assessment, not testing per se,” said Jim Pellegrino, a co-chairman of the commission and a co-director of the Learning Sciences Research Institute at the University of Illinois at Chicago. Rather than trying to build a single test that will cover content and other cognitive competencies, Mr. Pellegrino envisioned, for example, giving teams of students a series of challenging mathematics problems to tackle as a group, and then observing both their ultimate answers and how they collaborate to reach them.

“That’s how you get these other dimensions of competence into the picture, but it’s very difficult to create a single test,” he said. “It’s why a dropped-in-from-the-sky accountability test, no matter how well designed, can’t give you everything you want to know about the competencies of students.”

At the Margins

Mr. Mislevy of the ETS said he believes the biggest assessment breakthroughs will come at the margins, through individual groups like Carnegie and Khan, rather than the “big machine” of the federal and state testing industries.

“And maybe that’s OK,” he said. “Making things happen in the big machine is hard. You need to be more quick, nimble, easy to fail. The big machine doing it all at once is a bad place to try new things and fail at scale.”

The commission acknowledges that its paper does not grapple with several big hurdles in developing more-comprehensive assessment systems, among them the cost of developing complex test items and the widely disparate digital infrastructures of the schools that would use the tests.

This story appears courtesy Education Week. Reproduction is not permitted.