DENVER – Gathering in cavernous meeting rooms in the basement of a Denver hotel, over 100 educators from 10 states and the District of Columbia met in late July to figure out what determines a passing grade on tough, new Common Core-aligned tests.
Millions of students nationwide, many trying computer-based exams for the first time, took new tests this spring, and their hundreds of millions of answers have been scored and recorded. But now, educators – representing the states that administered new Partnership for Assessment of Readiness for College and Careers (PARCC) tests in the spring – have to decide how many questions a student needs to get right to reach each of the five performance levels, with Level 3 and above signifying that students are on track to be college and career ready.
The federal government tapped two groups of states – PARCC and Smarter Balanced – to develop tests aligned to the Common Core, a set of educational standards for math and English, initially adopted by all but four states.
These tests promised to be a reality check for states that had watered down exams, and were intended to hold students in Mississippi to the same standards as students in Manhattan. They would provide a clear and honest picture of whether students were ready for college or the workforce. The teachers sitting in that hotel basement in Denver have been entrusted with the power to determine just how shocking that reality check is, and what they ultimately decide could have consequences across the country. That’s not a responsibility they take lightly.
“How my students are going to do isn’t important to me in this process,” said Marti Shirley, a panelist and a high school math teacher in Mattoon, Illinois. “It might be a tough test, but it’s going to be a measure of what they should be able to do under the standards.”
As the new, tougher tests collide with high-stakes testing policies, policymakers have to balance promised rigor with politics. The educators in Denver – each nominated by one of the states that took PARCC tests this spring – are tasked with setting a high bar regardless of the consequences for students, teachers and schools who may be held accountable for the results. After that work is done, officials in the states have to ensure that the higher bar doesn’t unfairly penalize schools that are still in the process of adapting to the Common Core.
This difficult balancing act is due in part to the nature of these tests. The new Common Core-aligned tests are different from, say, the SAT or ACT. SAT scores are based on where a student falls on the distribution of all students; no matter how tough the test is, some students will get a perfect score. With annual state tests like PARCC, there is no such guarantee. These tests are concerned only with how well students have learned the standards.
The Denver meetings start with two days of training on the Common Core standards and on detailed outlines of what students at each of the five performance levels should be able to do in a given grade or subject.
After the training, each panelist goes through the test questions and decides how many points a student at each level should have scored on each question. For a three-point question, a panelist may decide that Level 3 students would have gotten two of those points while students at a Level 4 or 5 would have gotten the entire question correct. The cut score comes from adding up the points from all of the questions.
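The arithmetic behind that tally can be sketched in a few lines of Python. This is only an illustration of the summing step described above, not PARCC's actual scoring procedure: the list `PANELIST_JUDGMENTS`, the function names and every number except the three-point example are hypothetical.

```python
from typing import Dict, List

# Hypothetical judgments from a single panelist. Each entry is one question and
# maps a performance level to the points a student at that level is expected
# to earn on it. Level 1 is omitted because it sits below the lowest cut.
PANELIST_JUDGMENTS: List[Dict[int, int]] = [
    {2: 1, 3: 2, 4: 3, 5: 3},  # three-point question, as in the example above
    {2: 0, 3: 1, 4: 1, 5: 2},  # made-up two-point question
    {2: 1, 3: 1, 4: 2, 5: 4},  # made-up four-point question
]


def cut_score(judgments: List[Dict[int, int]], level: int) -> int:
    """Add up the expected points for one level across all questions."""
    return sum(question[level] for question in judgments)


def all_cut_scores(judgments: List[Dict[int, int]]) -> Dict[int, int]:
    """Return the cut score for every level the panelist rated."""
    levels = sorted(judgments[0])
    return {level: cut_score(judgments, level) for level in levels}


if __name__ == "__main__":
    print(all_cut_scores(PANELIST_JUDGMENTS))
```

On this toy three-question test, the panelist's Level 3 cut score would be 2 + 1 + 1 = 4 points.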
The meeting in Denver was the first in a series of events where educators work in panels to set cut scores. The panels working on the high school tests met in late July; similar groups are meeting in August to discuss the elementary and middle school tests. Each panel is tasked with coming up with the cut score for a single test, say the Algebra I test.
After the panelists have come up with a cut score for the test they are working on, they get a reality check. They see how their peers have tallied up the points and they get impact data, that is, what the results would look like if the cut scores were set where they suggested.
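As a rough sketch of what impact data amounts to, the Python below takes a set of student scores and a proposed set of cut scores and reports the share of students who would land at each level. The scores, the cut scores and the function names are invented for illustration, and the sketch assumes a cut score is the minimum score needed to reach a level; the real analysis runs on actual student results.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List

# Hypothetical data: total scores for ten students and a proposed set of cut
# scores, listed in ascending order as the minimums for Levels 2, 3, 4 and 5.
EXAMPLE_SCORES: List[int] = [1, 2, 3, 4, 5, 6, 7, 9, 10, 3]
EXAMPLE_CUTS: List[int] = [2, 4, 6, 9]


def performance_level(score: int, cuts: List[int]) -> int:
    """Return the level for a score: 1 plus the number of cuts it meets."""
    return 1 + bisect_right(cuts, score)


def impact_data(scores: List[int], cuts: List[int]) -> Dict[int, float]:
    """Return the fraction of students who would fall at each level."""
    counts = Counter(performance_level(score, cuts) for score in scores)
    total = len(scores)
    return {level: counts.get(level, 0) / total
            for level in range(1, len(cuts) + 2)}


if __name__ == "__main__":
    for level, share in impact_data(EXAMPLE_SCORES, EXAMPLE_CUTS).items():
        print(f"Level {level}: {share:.0%}")
```

Raising or lowering any value in `EXAMPLE_CUTS` and rerunning the function shows how the distribution of students across levels shifts, which is the kind of feedback the panelists see between rounds.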
Research has shown that in the past, states lowered cut scores in an effort to show that their students were improving. Some experts worry that giving panelists impact data might interfere with their responsibility to consider only what students ought to know when setting the cut scores. This, some acknowledge, could open the new tests up to the same political pressures that drove states to create easier tests.
“If you want to say this is about college and career readiness, it’s okay to say that no one has it,” said Gregory Cizek, professor of educational measurement and evaluation at the school of education at the University of North Carolina. “But on the other hand failing to tell the panelists about the consequences of their actions is irresponsible. I realize both contradict each other. I want them to make their decisions based only on content, but also I want them to know the consequences.”
With the theoretical results in hand, the panelists discuss why they made the decisions that they did. They then go through the same process of individually looking at the test two more times – with an additional debriefing session between the second and third rounds. At the end of the process, the cut scores are aggregated.
While PARCC can control the influence of politics during the cut score setting process, decisions about how to use the scores are entirely up to policymakers in each state. Smarter Balanced, the other consortium of states that developed tests, has already set its cut scores. As those states begin releasing results, officials are finding ways to cope with the consequences of setting a higher bar.
Take the recent debate at the Washington State Board of Education, which set the score required to graduate from high school between Level 2 and Level 3 out of four performance levels. While Smarter Balanced says a score of 3 or 4 corresponds to college and career readiness, the score Washington picked for students to graduate, which officials say is a temporary fix, ensures that the same number of students pass the new tests as passed the old ones.
Luci Willits, deputy executive director of Smarter Balanced, says that this doesn’t go against the idea that under the new tests, all states would be forced to meet the same higher standards.
“It’s Washington’s decision and theirs alone and doesn’t have an effect on the work of the consortia,” she said.
Willits added that it is important to note that in the past many states have phased in graduation requirements over time and that Washington’s requirements could indeed eventually come closer to the level that Smarter Balanced says translates into a student being ready for college or career.
Deven Carlson, a political science professor at the University of Oklahoma, says that this kind of tension isn’t unique to Washington’s graduation requirements.
“There’s been difficulties in implementing the Common Core over a whole host of existing educational policies for holding teachers, students and schools accountable,” said Carlson. “How do you integrate an overhaul of standards and assessments in systems that depend so much on standards and assessments? That’s an open question that wasn’t given enough forethought in the planning stages.”
Cizek says that the stakes seem particularly high for many policymakers because of the more grandiose claims about what new Common Core tests are measuring.
“The previous generations of tests made more modest claims about things that we could easily validate,” said Cizek. “It was just about measuring the curriculum, something those tests more or less did pretty well. We didn’t make the claim that how a student did on those tests would predict the likelihood of them being ready for college or a career.”