What Common Core tests promised, and what they will actually deliver

This story also appeared in McClatchy.

New Common Core tests are debuting on time this spring, but after years of bruising attacks from both left and right, the groups tapped by the federal government to build them are struggling to live up to the hype.

Back in 2010, the plans for the new exams were introduced with much fanfare and many promises: The exams would end the era of dumbed-down multiple-choice tests and the weeks of mindless prepping that precede them. They would force teachers to introduce more critical thinking and in-depth study. They would bring coherency to a mishmash of state tests and for the first time allow states to compare local students to their peers elsewhere in the U.S. And with their online format, the new tests would make testing more efficient, more accurate and more relevant to the digital age.

But a lot has changed since U.S. Secretary of Education Arne Duncan, heralding the arrival of “Assessments 2.0,” promised teachers the tests that many of them had “longed for.”

“One-shot, year-end bubble tests administered on a single day, too often lead to a dummying down of curriculum,” Duncan said of the old state tests. The new exams would test “critical thinking skills and complex student learning.”

The federal government invested $360 million in a grant competition to spur development of the new tests. Two coalitions of states, calling themselves the Partnership for Assessment of Readiness for College and Careers (PARCC) and the Smarter Balanced Assessment Consortium, won grants after agreeing to create tests aligned to the Common Core, a set of grade-level expectations in math and English adopted by over 40 states. States hurried to sign up after the U.S. Department of Education made college- and career-ready exams a condition for some federal funding.

Since then political battles over the Common Core have dampened enthusiasm for the tests. Some have cried foul over how the federal government incentivized the program, calling it federal overreach. Others have complained about how long these tests will take — Smarter Balanced will take eight and a half hours, while some PARCC tests will take over ten hours. Yet more critics have panned the tests because they will be used in some states to evaluate teachers.

This spring, of the original 26 states that signed up for PARCC, just 11 plus Washington, D.C., are giving the test. Of the original 31 signed up for Smarter Balanced, only 18 are still on board. (In the early years, some states were members of both coalitions.) Several of the states will give the PARCC or Smarter Balanced test for one year only, before switching to their own state-based exams next year. Another Common Core exam, known as Aspire and produced by ACT, has lured some states away from the federally sponsored groups; this spring students in South Carolina and Alabama will take that test.

At the same time, many schools that failed to obtain the (fairly basic) technology necessary for the new exams have been forced to opt for less optimal paper-and-pencil versions in the inaugural year, which has raised concerns that their test scores won’t be comparable with scores from the digital versions.

And despite assurances that drill-and-kill test prep would end, schools are scrambling to get their students ready for tests that still rely heavily on multiple-choice problems. PARCC, in an effort to keep costs down, ended up with more multiple-choice questions than originally planned.

Frederick Hess, director of Education Policy Studies at the American Enterprise Institute, said the Common Core went “off the rails” after it was “foisted upon the states by federal bribes.” He thinks the public may be disappointed as students sit down to take the tests for the first time.

“None of us have really seen PARCC and Smarter Balanced or the ACT tests for that matter,” said Hess. “We don’t know whether they will be better or worse.”

Based on the practice questions released by Smarter Balanced and PARCC, others in education policy say they’re confident that these tests will be a marked improvement over the old tests.

A RAND study published in 2012 looked at how well 17 of the old state tests gauged “higher-order skills,” such as abstract thinking skills and the ability to draw inferences from multiple sources. RAND concluded that only 2 percent of math questions and 21 percent of English questions were higher-order. Multiple-choice questions were the worst offenders, according to the researchers, who didn’t find a single higher-order multiple-choice math question.

Experts say the multiple-choice questions on the PARCC and Smarter Balanced tests are better than their counterparts on the old tests. They say these multiple-choice questions are designed to test critical thinking and problem-solving skills.

“In the old tests a student would just get a vocabulary word by itself and would be asked to find a synonym,” said Andrew Latham, director of Assessment & Standards Development Services at WestEd, a nonprofit that worked with Smarter Balanced and PARCC on the new tests. “Now you will get that word in a sentence. Students will have to read the sentence and be able to find the right answers through context clues. And yes, I said answers plural, there might be more than one correct answer.”

In 2013, researchers at the National Center for Research on Evaluation, Standards, and Student Testing looked at the test plans put out by both groups. Based on those, the center estimated that 68 percent of consortia English questions and 70 percent of math questions were higher-order.

Hess questions whether the categories used by researchers to judge the tests are accurate reflections of the tests’ quality.

“The tricky thing with that kind of framing is that it hides as much as it reveals,” said Hess. “It rushes past the really thorny stuff. Not all higher-order questions are made alike. I want really good questions that are higher-order.”

Others say that regardless of how you look at these tests, you can’t deny that they are more difficult than their predecessors.

“As an adult reader you can look at items on the fifth- and sixth-grade tests and be challenged yourself,” said Robert Pondiscio, senior fellow and vice president for External Affairs at the Thomas B. Fordham Institute, a conservative think tank. “That didn’t happen often in the past.”

The U.S. Department of Education agrees.

“We’re pleased that the new tests usher in the next generation of assessments — they’re better than the old bubble tests they replace and focus on assessing the critical thinking and problem-solving skills that students need for success in college, careers and life,” said Dorie Nolt, the department’s press secretary.

The biggest difference, many say, is the “performance tasks.” On the English tests, these sections ask students to write using evidence from the texts. The math performance tasks consist of multistep problems that are designed to require strategic thinking.

“They are going to be more time-consuming and focus much more sharply on reasoning in math and things like finding evidence in English,” said Derek Briggs, professor and program chair at the Research & Evaluation Methodology (REM) program at the University of Colorado Boulder.

In 2010, 44 states had signed up for PARCC or Smarter Balanced tests, but only 29 states will be giving them this spring.

Students in 21 states got a preview of the performance tasks during the Smarter Balanced field test, a practice run for the new tests held last spring.

“From the field test, we know we are not going to see a lot of students doing really well on these performance tasks,” said Briggs, who advised both Smarter Balanced and PARCC on the design of the new tests. “The scores for the multiple-choice questions were on the low side of what you saw on the old state tests, but on the performance tasks we mostly saw 1s and 2s on the four-point rubric.”

One of the perks of performance tasks, some say, is that they are less vulnerable to test prepping.

“Anything you do will provoke test prep, but in order to answer these questions correctly you need to know the standards deeper, you can’t get this through drill-and-kill test prep,” said Linda Darling-Hammond, a professor at Stanford’s Graduate School of Education and senior research advisor to Smarter Balanced. “It will require teaching that is more analytical. Schools that were able to succeed with drill-and-kill in the past won’t be able to anymore.”

Performance tasks aren’t necessarily a panacea, however. These sections rely heavily on open-ended questions — meaning that they will take students more time at a moment when a growing anti-testing movement is calling for reducing the amount of time students spend testing. And they’re more complicated — and more expensive — to grade.

“They take longer and we get less test information per unit time spent,” said Scott Marion, associate director of the National Center for the Improvement of Educational Assessment and an advisor to PARCC. “But there is just no way not to have them. You have to see the evidence that students can do the things we care about. Students can fake it on multiple-choice questions. That’s harder to do on the performance tasks.”

The fact that grading these questions will take more time is in conflict with another Duncan promise — that these tests will provide more timely feedback for teachers than other tests provide.

Duncan said of the old tests that “they generally provide time-sensitive data and results months later, when their instructional usefulness has expired.”

Briggs said the consortia will have to do groundbreaking work in the area of computer scoring to change this.

“For feedback to be more immediate you either have to have tremendous resources, which is probably not likely to happen,” said Briggs, “or [computer] scoring will have to get much better.”

While the technology may not have advanced far enough for computer scoring, both consortia have had some success in getting more schools to trade in their No. 2 pencils for electronic devices.

With the new computer-based tests, students will have access to highlighters and calculators. They will watch videos and be asked to write a response to them and other sources.

Smarter Balanced in particular made a big bet on technology. The exam is computer-adaptive, meaning that when a student answers a question correctly, the next problem is more difficult. If they answer the question incorrectly, the next question is easier. The idea is that this system more precisely gauges students’ true level of knowledge.
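The adaptive idea described above can be illustrated with a toy sketch. This is not Smarter Balanced's actual algorithm, which the article does not detail; it is a simple "staircase" rule under assumed conditions: five hypothetical difficulty levels, a starting level of 3, and a hypothetical function `answers_correctly` standing in for the student.

```python
def run_adaptive_test(answers_correctly, num_questions=10, levels=5, start_level=3):
    """Toy staircase rule: harder question after a correct answer, easier after a miss.

    answers_correctly: hypothetical callable taking a difficulty level (1..levels)
    and returning True if the student gets a question at that level right.
    Returns a list of (level_asked, was_correct) pairs.
    """
    level = start_level
    history = []
    for _ in range(num_questions):
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            level = min(levels, level + 1)  # step up, capped at the hardest level
        else:
            level = max(1, level - 1)       # step down, floored at the easiest level
    return history


# Example: an imagined student who can handle questions up to level 3.
history = run_adaptive_test(lambda level: level <= 3, num_questions=6)
print([level for level, _ in history])  # [3, 4, 3, 4, 3, 4]
```

In this sketch the questions quickly settle around the boundary between what the imagined student can and cannot answer, which is the sense in which an adaptive test homes in on a student's level. It also shows why the rule depends on knowing in advance which questions are harder than others.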

But problems with technology in districts across the country have limited the rollout of these features. Smarter Balanced estimates that between 10 and 20 percent of its seven million students will have to use paper tests. PARCC told the Associated Press that it expects about 25 percent of its five million students will take paper tests. Although several states that are not using either test, Florida and Utah for example, will be giving online exams this year, it’s clear that millions of students will still be taking paper-and-pencil tests.

Nolt, the press secretary for the U.S. Department of Education, expects this to change as time goes by.

“This year, millions of students are taking technology-based assessments in nearly every state,” said Nolt. “Next year, millions more will. This year is a key first step toward the ultimate goal of better tests that more accurately capture what students are learning.”

In February, three states — Wisconsin, Michigan and Missouri — announced they wouldn’t be able to use the computer-adaptive feature of Smarter Balanced due to technical issues, raising questions about how comparable scores in these states will be with those in other Smarter Balanced states.

“If you can pull it off, sure, it’s better to have a computer-adaptive test, but a lot of things have to go right for that to work,” said Briggs.

The biggest challenge is that it isn’t clear which questions are most difficult. “In many parts of the country, the curriculum isn’t in alignment with the tests yet, so the relative difficulty of each question might change,” said Briggs. “You need to know what is a harder question and what is an easy question. That is unclear right now, which could make the algorithm shaky.”

Tony Alpert, executive director of Smarter Balanced, said that they are ready to go live, because they established the difficulty of the questions with last spring’s field tests.

With PARCC’s online tests, as with standard pencil-and-paper tests, the questions a student receives are preset, a model that some critics say can’t ascertain the abilities of low- and high-achieving students as well as it measures students who are near their appropriate grade level.

Laura Slover, CEO of PARCC, said that its tests will “effectively” gauge all students’ knowledge of the standards.

“Students deserve to be tested on the same content,” added Slover. “If all students are given the same test, there is incentive for all students to be taught the full range and complexity of the standards. The test sends a signal.”

In 2010, Duncan also called PARCC and Smarter Balanced “comprehensive assessment systems,” because in addition to the end-of-year exams, the consortia were to develop mid-year tests designed to give teachers a sense of how well their students were mastering the standards along the way.

“For the first time, teachers will consistently have timely, high-quality assessments that are instructionally useful and document student growth,” said Duncan. “Rather than just relying on after-the-fact, year-end tests used for accountability purposes.”

But with anti-testing protests mounting, four Smarter Balanced states decided not to administer the additional tests. PARCC is introducing its extra exams in the fall.

While Duncan celebrated the differences in design between the consortia, he also championed elements that would unify states — every state in each consortium would adopt the same “cut scores,” the number of questions a student has to correctly answer to be deemed proficient, making it possible for the first time to compare students to their peers in other states.

Using data from last spring’s field test, Smarter Balanced voted on cut scores in November. New Hampshire and Vermont abstained from the vote due to concerns over the validity of the process. PARCC is waiting until this summer to approve cut scores based on this spring’s tests.

After being approved by the consortia, cut scores then have to be adopted by each state individually — usually by the state Board of Education. Not all Smarter Balanced states have adopted the cut scores yet, though none have voted them down. PARCC states will have to approve cut scores this summer.

Luci Willits, Smarter Balanced’s deputy executive director, doesn’t expect states to adopt lower cut scores than the ones the consortia recommend because doing so could put federal funding in jeopardy.

The consortia have not fully delivered on earlier promises of digital tests for all students and more timely results. And, as with the national benchmarks (through agreed-upon cut scores), other promises made by the Department of Education back in 2010 are still up in the air.

For example, it is still unclear how many colleges and universities will sign on to use PARCC and Smarter Balanced tests as a means for deciding whether a student should be placed in remedial courses. So far, university systems in West Virginia, California and Washington have said that they will.

With so much still in flux, experts say time is what the tests need to get better.

“Time has always been our biggest resource constraint,” acknowledged Alpert of Smarter Balanced.

But Alpert said they are proud of what they have achieved. “We’ve field-tested and calibrated over 20,000 questions and a good many of those questions will appear this year. And those questions have more supports for English Language Learners and students with disabilities than any state has ever had.”

While it is happy with the product, the Department of Education hopes to see the tests get even better.

“Even states not using PARCC or Smarter Balanced are building new, better tests and will continue to improve those tests over time,” said Nolt. “We always knew this would be version 1.0 of the next-generation assessments — not the final product.”

Marion of the National Center for the Improvement of Educational Assessment says that a lot has been accomplished in little time.

“I think the timeline was really short,” said Marion. “The [new tests] will be better than almost any state [test], and by a long shot, but they are not going to be exactly what was promised. If the consortia survive until 2018, I think we are going to have some really great tests.”

This story was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.