ARLINGTON, Mass.—As rain poured outside on the chilly evening of February 24, a group of Arlington elementary school parents was imagining a sunnier place — Dorothy’s trip down the yellow brick road.
“Today you will read and think about the passages ‘Rescue the Tin Woodman’ and ‘Arriving at Emerald City’ from The Wonderful Wizard of Oz,” read a PowerPoint slide projected in the Thompson School gym, where the group had gathered. Linda Hanson, an Arlington School District literacy coach, was taking parents behind the curtain of a new standardized test their children would face April through June.
The exam, known as PARCC — which stands for Partnership for Assessment of Readiness for College and Careers — was aligned to the Common Core, a set of national educational standards for what students should be able to do in each grade in English and math. Hanson warned the parents that their children should expect different — and probably more difficult — questions and writing prompts than they had seen on the Massachusetts Comprehensive Assessment System test, or MCAS.
Take the sample writing prompt on The Wonderful Wizard of Oz, designed for fourth-graders. For any parents tempted into pleasant reminiscing about Munchkins and the Wicked Witch, the reverie didn’t last long. The prompt was complex: “Based on her words and actions in both passages, describe two of Dorothy’s qualities. Think about the person that Dorothy is. How do those qualities affect her adventures? Support your response with details from both passages.”
A woman in the back row whispered to her neighbor: “That seems hard.”
In the decade and a half since the national No Child Left Behind Act made annual, mandatory testing the new normal, protests have grown. Teachers and parents around the country have questioned whether any “cookie cutter” test can capture how much an individual student knows. The MCAS has long been considered one of the nation’s best tests at assessing student performance. But the shift to the Common Core State Standards meant it would have to go.
The PARCC tests, used in states such as Illinois and New Jersey since 2015, were supposed to be even better. Not the joy-killing machines ruining childhood, as so many critics have portrayed standardized tests, but true measures of whether children were learning the key skills they would need as grown-ups: how to think critically, solve problems, make a convincing argument, and write a coherent paragraph.
Instead, the uproar over testing has only gotten louder. The increased difficulty of PARCC and other Common Core-aligned exams sent pass rates plummeting, while teacher evaluations linked to scores have fueled union-led fights, including those now unfolding in Massachusetts. And the continued use of multiple-choice questions has parents, teachers, and kids questioning whether the new tests could be much better than what they were replacing.
Amid the controversy, the Massachusetts Board of Education decided last fall to create an MCAS/PARCC hybrid unique to this state. Officials and educators are optimistic that by retaining control over the test, they will help preserve Massachusetts’s spot at the top of the US educational pack.
Meanwhile, many parents and educators are hoping the state will take into consideration another important question: What is a good test? The state surely can’t devise a test students enjoy taking. But can it design one that, rather than dictating what students learn, captures what they know in a fair way? “I really want to see us get away from teaching for a test and letting the test support the educators’ goals in teaching our children,” says Angelina Camacho, a parent of a second-grader at a Boston public school. “The test should be illuminating the actual capacity of our students.”
Experts talk about testing in scientific terms: A test needs to be “valid and reliable” and “discriminate” among different levels of proficiency. While these words have precise definitions in the field of psychometrics, the experts essentially want the same thing parents and teachers want: a math test that doesn’t measure students’ reading comprehension but whether they can add fractions; an English test that doesn’t measure what students know about the Revolutionary War but how well they can use sample text to support an argument.
It’s a simple goal to talk about but far harder to pull off.
On the surface, MCAS looks a lot like your typical state exam: a pencil-and-paper test, made up mostly of multiple-choice questions and some open-ended ones. But because Massachusetts had some of the most highly regarded standards in the country and the test was closely aligned to them, it earned a reputation as a bright spot in the testing world soon after it debuted in 1998. Massachusetts became a leader in national assessments in math, reading, and science.
But the national accolades for the MCAS didn’t mean everyone embraced it. As with other standardized tests, MCAS critics said it both pressured teachers to teach to the test and caused students undue stress — particularly the high schoolers who must pass the 10th-grade exam to graduate. A few teachers I spoke to said it wasn’t uncommon for them to disagree with the test results for individual students, although others said they trusted the scores. I heard similar things from students. “It’s a good representation of what you see in the classroom,” says Jodalis Gonzalez, a senior at Boston Prep, a charter school in Hyde Park. “That’s one thing I really appreciated.”
Mitchell Chester, commissioner of elementary and secondary education, says that MCAS, for the most part, has served the state well but that the time to make a change has come, even if Common Core hadn’t hastened the decision. The test has been given 19 times, Chester says. “Like anything that is going to be first-class, you need to upgrade from time to time.” PARCC was supposed to represent that upgrade in Massachusetts.
The first way PARCC differs from MCAS is that it’s designed to be given on a computer, although schools are allowed to use a paper-based version while they improve their technology. Computers can potentially help assess knowledge and skills in a variety of ways that would be more difficult to score on a paper test. A PARCC math question, for instance, may require students to first create an equation to prove they understand how to solve the problem, then type in the correct answer. A multiple-choice question might have more than one answer, to see if students can identify various synonyms of a word or equivalent fractions. For a deeper check of reading comprehension, students might be required to drag and drop events in a story in the correct order.
Common Core demands students go beyond rote memorization and demonstrate critical-thinking skills. Underscoring this goal, PARCC uses performance tasks, open-ended questions that require students to work through multi-step, realistic problems. One performance task, for instance, asks third-graders to read two articles about the Arctic and then write a letter using “ideas and facts from both articles” to persuade a friend that people and animals can live there.
(Performance tasks are more expensive to score than multiple-choice questions. But a 2015 state analysis found that, on the whole, PARCC costs $32 per student on average — $10 less than MCAS. And though the cost for PARCC was expected to rise, the report found “there is no clear conclusion that either assessment program is more or less expensive than the other.”)
Many educators in Massachusetts and elsewhere, however, have said that while the content of the new Common Core tests may demand more of their students, the technology enhancements are often just window dressing on items that could as easily follow a simpler multiple-choice format. Others worry the content itself is too challenging. For example, one Arlington third-grade teacher says she could imagine her students needing to reread a passage six times to find the two snippets of text that lead them to the correct definition of “teeming.” And there are still problems with taking the exam on a computer: Students can only see a few lines of text at a time while writing an essay, for instance, making it hard to effectively edit their work.
Communities across the nation have struggled with PARCC. Seventeen out of 26 states that initially committed to using the test in 2010 have since dropped out — some without ever trying it out — and an 18th state, Louisiana, is only using part of PARCC. In the spring of 2015, tens of thousands of students in New Jersey and other states opted out of taking the tests altogether. Technological glitches also meant some schools had to halt testing in the middle of exams.
In Massachusetts, the reception has so far been mixed. The state’s plan had long been for a slow transition to the PARCC, and in 2015, each district was able to choose whether to give the new test or stick to MCAS. In Boston, Worcester, and Springfield, individual schools within each district were given the choice. (The decision to offer both came at a cost, with the state needing to raise its annual assessment budget from $32 million to $37 million.)
Some parents and educators felt strongly that the new test would push students to think more deeply, a view shared by Chester, who also served as chairman of the PARCC consortium’s governing board and was a driving force in bringing the test to Massachusetts. But a vocal contingent from the roughly half of school districts that had elected to take the PARCC reported poor experiences in their first run, either with technological snafus or the content itself. Education experts around the country began to wonder how big a blow Massachusetts would deal to the already beleaguered consortium if it left.
Still, the state Board of Education would need to make a final choice between MCAS and PARCC in November. As the deadline approached, the consortium made a game-changing announcement: Rather than forcing member states to use the whole test, it would let them use individual questions a la carte. Faced with answering a multiple-choice question by (A) keeping MCAS or (B) fully committing to PARCC, the board went with the new option (C). In what some describe as a political decision meant to appease PARCC critics, the board decided the new Massachusetts test would be an amalgam of the MCAS, PARCC, and yet-to-be-written items. Chester, who was the first to suggest this compromise, calls the new test MCAS 2.0.
The 2.0 version won’t begin until 2017, so Massachusetts school districts again took either the PARCC or MCAS in grades 3 through 8 this year (the assessment budget was again bumped up by $5 million).
Boston-area educators have heard rumors that the hybrid exam will closely resemble PARCC in the end — but Chester says that largely depends on the feedback the state gets from teachers and principals. “If they tell us that very little of what’s been developed on PARCC is helpful and relevant, then we’ll use very little,” he says. “If they tell us a lot is aligned, we’ll use a lot.”
As MCAS 2.0 evolves, the state has created 13 committees and work groups to provide insight from teachers and others. One committee is taking another look at how the Common Core is incorporated into Massachusetts standards, to see if that needs any refinement. A test administration team will debate how many test sessions there should be and whether the tests should be timed (as PARCC is) or untimed (as MCAS has been). The Digital Learning Advisory Council, an existing group, will discuss how districts are progressing toward the state goal of total online testing by 2019.
Linda Hanson, the Arlington literacy coach, applied to be on a committee but wasn’t selected. She was given a survey to fill out with her thoughts, however. “Teachers are the ones who really can see how students react in the moment to the task that’s put in front of them,” she says. “Why would you want to produce something that doesn’t have strong teacher input?”
If Barbara Madeloni had her way, the state would abandon standardized testing altogether. She’s the president of the Massachusetts Teachers Association, a union that officially acknowledges a limited role for testing. But speaking from personal experience as a veteran teacher, Madeloni rejects the premise that such a thing as a “good” one-size-fits-all test even exists.
“A question about what does quality assessment look like for teaching and learning is really a different question,” she says. “To enter into a conversation about what’s the better version of something bad. … Why don’t we allow ourselves to imagine the really good thing that we want?”
She would like our teachers to be able to develop what she calls “authentic” assessments — tasks that present students with problems they might see in the real world. Unlike the performance tasks of PARCC and other exams that take place over a relatively short time and in isolation, Madeloni says an authentic assessment would give students time and resources to solve problems, which might mean working in a group or doing outside research. “Standardization is not an effective way to do that,” she says. “It doesn’t tell us anything really meaningful.”
Arguments like that drive Ronald Hambleton crazy. He’s the executive director of the Center for Educational Assessment at the University of Massachusetts Amherst and has served on committees for both PARCC and MCAS. He says that for the relatively small amount of time and money that is spent on testing, the returns are hugely valuable. “You need the assessment piece to provide feedback about strengths and weaknesses,” he says. “School reports, district reports, state reports — a tremendous amount of information is available.”
While Madeloni argues that everyone already knew that achievement gaps exist between Massachusetts’s wealthy suburbs and its poor cities, others maintain that having concrete proof of those and other gaps is essential. That’s part of the reason No Child Left Behind mandated annual testing reported by subgroups such as race and socioeconomic status in the first place. And it’s why the Every Student Succeeds Act — the new version of No Child Left Behind passed in December — pointedly lowered the stakes placed on testing but didn’t repeal testing altogether.
“Without having a common metric with a high bar . . . we don’t have the information to highlight, to spotlight those places that are excelling and those places that are lagging,” Chester says. “And that’s absolutely essential to an equity agenda.”
Hambleton agrees with Madeloni that student assessment scores shouldn’t be used to evaluate teachers but says they offer an important perspective on how students are doing. If teachers feel pressure to teach to the test, he says, administration policies are to blame and not the exam. His message to them? “Don’t do that garbage.”
“MCAS is simply assessing the curriculum,” he says. “[Students] should be learning the curriculum.”
Most educators interviewed for this article said they saw value, albeit in varying degrees, in the state assessment. But they — as well as the education commissioner and test designers — all agree: A state test is just a piece of the puzzle.
That’s what Richard Bransfield believes, too. When he became principal of Malden’s Linden S.T.E.A.M. (science, technology, engineering, arts, and math) Academy in 2012, one of the first things he did was to start laying out plans for a room devoted to student data. Over his nearly three decades in the district, he had watched as issues like poverty and mental health grew in importance, and he wanted to create a space where teachers could meet regularly to talk about student progress.
Visualizing that progress was key. In the data room, each of around 900 students in the school has a small card that tracks his or her grades and scores on MCAS and other assessments. Student cards, broken down by grade level, are sorted into green, yellow, and red plastic holders on the wall by whether they are meeting standards, are just shy of proficiency, or are falling well off target. Each card has a picture of that student to remind staff that they’re talking about real kids, not data points.
Bransfield recognizes state testing data are important, but, in his school, so are a lot of other measures, such as grades or scores on a reading fluency assessment. “If any school is waiting for MCAS or PARCC to see how they’re doing, they’re in trouble,” he says. No matter what MCAS 2.0 looks like, Bransfield says, it won’t be able to measure everything.
Take a seventh-grade English class at Linden. Over the winter, small groups of students each picked a different novel to read and discuss. They were then asked to identify a problem from the book and engineer something that would solve it.
One group designed a splint for a character with a twisted leg. Another made a “stylish” and waterproof headband to help hold in a character’s hearing aid.
Another group read Story Time, a satirical novel about a school obsessed with standardized testing (in one passage, an English teacher is fired for hanging up a Shakespeare poster because Shakespeare isn’t on the test). For a protagonist being sexually harassed, the students made a self-defense device: a glove with batteries and wires that created a spark when touched together. On class presentation day, one group member walked around the room showing off their working prototype. Classmates leaned in close to get a better look.
Tests can’t capture that kind of engagement, but getting students excited about learning is a critical element of a successful school. “If people are looking for the perfect test, there’s never going to be a perfect test,” Bransfield says. “If we can get kids to be thinkers, engaged, literate — the test will take care of itself.”