The search for a new way to test kids

By Greg Toppo, USA TODAY

Investigators found “clear, statistical evidence” of cheating on state tests at George Washington Elementary, a Blue Ribbon School, in Baltimore. (Photo by Greg Toppo, USA TODAY)

By all accounts, George Washington Elementary School is the very model of a modern urban public school.

Tucked into an up-and-coming neighborhood west of downtown, the school has produced impressive results on annual Maryland School Assessment (MSA) math and reading tests over the past several years. By 2007, scores had improved so steadily that the U.S. Department of Education made it a National Blue Ribbon School of Excellence. First lady Laura Bush came to town to hand out the award.

But in October 2008, a parent came forward with a troubling complaint: Someone was tampering with answer bubble sheets at Washington Elementary.

Soon, Baltimore Schools CEO Andres A. Alonso showed up at a PTA meeting at Washington and found “very poor” parent turnout and “an absence of student or staff enthusiasm,” according to city records.

He ordered his staff to monitor the 2009 tests and brought in state investigators, who dug out 2008 test booklets from a California warehouse. After a year-long probe, investigators found “clear, statistical evidence” of cheating on Maryland’s tests at the school, according to the official report. Someone with access to the tests “changed multiple answers” from incorrect to correct for third-, fourth- and fifth-graders, in both reading and math, Alonso announced at a May 2010 press conference here.

Suspicious test scores series

The Hechinger Report, USA TODAY and several other news outlets partnered to investigate the standardized test scores of millions of students in six states and the District of Columbia. The investigation identified 1,610 examples of statistically rare, perhaps suspect, gains on state tests.

Alonso did not say who was at fault but, at his request, Maryland revoked the teaching certificate of Susan Burgess, the school’s principal. USA TODAY could not reach her for comment, but she told The Baltimore Sun that she hadn’t done anything wrong. Alonso said that, given the extent of the cheating, Burgess should have known about it.

When Alonso announced the results of the probe, he didn’t hold back. “I am disappointed, I am sad, and I am angry about what happened at George Washington Elementary School,” he said in a statement. “I am disappointed because adults know better. I am sad because, in this case, the adult(s) involved clearly did not believe in our kids. And I am angry because too many people are working too hard to move our students forward, and this is an affront to that incredibly hard work.”

Washington Elementary is still listed in federal documents as a Blue Ribbon School.

This week, as students throughout Maryland take the 2011 tests, more than 150 monitors, mostly retired teachers, are watching test-takers and school officials in Baltimore. That will cost about $320,000, enough to pay the annual salaries of nearly eight first-year teachers.

Parents and educators worry

Although the kinds of troubles that snared Washington Elementary are rare, these and other glitches happen often enough that they are prompting educators nationwide to reconsider the basic principles of the USA’s massive school testing infrastructure, a $1.1 billion system that increasingly steers the ship of public schools.

Parents and educators worry not only about cheating incidents but also about expanding test prep regimens, anemic and narrowed school curriculums and, most recently, about what they consider unfair teacher evaluations based on student test scores. Here as elsewhere, student scores on standardized tests are closely tied to teacher pay and retention.

Nearly 10 years after Congress approved the watershed No Child Left Behind (NCLB) law, which enshrined annual reading and math tests for millions of U.S. public school students, practically no one is happy with the state of educational testing in America. Just about everybody, from Secretary of Education Arne Duncan on down, is searching for something new, something more secure or simply something that makes more sense.

Duncan said last September that he had visited 42 states and that nearly everywhere he went, teachers, parents, principals and lawmakers complained that what’s taught in school is narrowing as more teachers focus on improving scores on standardized tests, especially in schools with large numbers of disadvantaged students.

“Schools may give lots of tests — often too many,” Duncan said, “but the assessments aren’t always testing important knowledge and skills.”

In response to cheating, many states and school districts are tightening test security. In Texas, where math and reading tests became standard fare a decade before President George W. Bush took the idea nationwide with NCLB, there’s been a crackdown. In 2006, state officials responded to allegations of widespread test tampering, first reported by The Dallas Morning News, by hiring Utah-based Caveon Test Security to investigate test data. Caveon flagged about 700 schools for irregularities.

All but a handful of cases turned out to be minor, but the state stepped up training and since 2007 has required schools to distribute a 14-point test security plan to staff. It emphatically lays out what steps schools must follow during test administration and warns that state investigators will ferret out cheaters. The anti-cheating arsenal includes computerized bubble-sheet and erasure analyses, surprise visits to schools and signed security oaths.

The 14 points “really cause educators to stop and think two, three, four times before they’re willing to risk their certification,” says Criss Cloudt, the state’s associate commissioner for accountability. The plan has put a spotlight on cheating and led to more reports from suspicious parents and teachers — up from 3,954 in 2008 to 4,462 in 2010, according to state records — but Cloudt can’t say whether Texas has eliminated “educator security violations” statewide. The number of Texas educators whose state certification is on the line for test-tampering has stayed about the same from year to year — about 25 annually. “It’s very difficult to quantify how successful we have been,” she says.

Experimental tests that would be ‘harder to game’

In other places, educators are experimenting with different ways to test what kids learn. Bill Tucker, a managing director at Education Sector, a Washington, D.C., think tank, says states like Oregon have led the way with so-called adaptive tests, computerized assessments that actually change as students answer questions right or wrong. Such tests satisfy the requirements of the No Child Left Behind law. Students sit for these tests any time they’re ready, from October on, and the tests allow schools to find out more about how much kids have learned. And since each test is essentially different from the last, they’re “harder to game,” Tucker says.

In a bid to look beyond bedrock skills such as reading and math, a few states are also looking at other measures, such as how many of their high school graduates had to take remedial classes in college, Tucker says. Federal Race to the Top funding, part of the Obama administration’s education stimulus plan, is pushing states to develop databases that would allow them to track graduates.

The federal government has also invested in two separate efforts by the states to overhaul tests; 45 states are participating. One project is aimed at developing so-called “through testing,” which would sample every few months how much students learn, then combine those scores with the score on an end-of-year test. The other project focuses on computer-adaptive tests, like those used in Oregon, to be given at year’s end.

Changes can’t come soon enough for many educators, who see the entire NCLB testing enterprise as a bit of a folly — it does a poor job measuring what kids are learning, they say, and because of the high stakes inherent for teachers, NCLB encourages intensive test preparation, narrowed curriculums and cheating.

Schools’ efforts to move all kids to “proficient” status in reading and math have a congressionally mandated deadline — they must get the job done by 2014. To meet that goal, more than half the states have already lowered standards to redefine “proficiency” and boost proficiency rates, according to a 2007 federal study.

Even so, Education Secretary Duncan told Congress that unless No Child Left Behind is changed, 82% of public schools could soon be labeled “failing” under the law because their students haven’t shown the “adequate yearly progress” required by the law. He has proposed replacing the 2014 goal with standards that would make all students “college- and career-ready” by 2020.

Everyone scoring 100% would signal ‘a very bad test’

Scarsdale, N.Y., Superintendent Michael McGill says cheating is just the beginning of trouble for a system designed around rigid annual measures. “You can tighten and tighten and tighten the system and people will still find ways to game it,” he says. “People are always going to be smarter than the system — they’re always going to find ways to play it.” McGill has asked state officials to allow him to essentially write his own internationally benchmarked standards and figure out how to get his 4,800 students to meet them. In exchange, he wants freedom from New York state rules that dictate how to rate schools and teachers. So far, there has been no response from the state.

“I don’t have a problem using some form of state or national assessment for accountability purposes,” McGill says. “My problem is that it is inefficient, ineffective and unnecessary for a state to control down to the level of the individual classroom or individual teacher.”

Ann Cook agrees. She’s co-director of Urban Academy Laboratory High School, a small New York City high school that is one of 30 statewide using alternative ways to test students. Cook complains about the “mythology” of a connection between high academic standards and high-stakes tests like those used in all 50 states.

Of course most teachers have high standards, she says, but when they talk about high standards, they’re usually talking about helping kids understand complex ideas, weigh evidence, evaluate sources and — in subjects like writing, history and science — develop a point of view. Multiple-choice tests “tend to wash out a lot of that,” she says.

Former Austin Superintendent Pat Forgione calls the testing reports he used to get back each spring “autopsies” — the test results came too late to do anything for kids. “What I needed was some leading indicators,” he says. “We’ve got to find ways to make the tests more useful and to inform instruction” throughout the year.

John Tanner, the head of Test Sense, a consulting firm in San Antonio, says the problems of standardized testing go deeper: The achievement tests most schools give each spring were never designed to hold teachers’ feet to the fire — they were meant simply to provide a quick snapshot of how schools and classrooms were performing. Pre-NCLB, the tests were often given to a representative class — never to the whole school, and certainly not to each and every student. By design, they produced a wide distribution of scores, because they included questions with vastly different difficulty levels.

NCLB turned that model on its head, demanding that each test work like the chapter quiz in a spelling book — if teachers were skilled enough, it seemed to say, all students could earn 100%. Tanner says that’s preposterous. “The moment when 100% of kids end up at the top of the distribution is the moment you have a very bad test,” he says.

Does everyone need to be tested?

Cook, of Urban Academy, believes that schools should get rid of the high stakes tied to the tests and return to sampling. That’s the approach the federal government takes every two years or so with the well-regarded National Assessment of Educational Progress, or NAEP.

“When Gallup does polls, they don’t ask everybody in the country what they think,” she says. NAEP, also known as “the nation’s report card,” randomly samples thousands of students in reading, math and other topics in every state. Schools can’t prepare for NAEP because they never know whether they’ll be chosen for a given test.

Tom Watkins, a former Michigan state superintendent, says all the focus on testing hasn’t helped teachers teach and children learn. On the contrary, he says, it has turned millions of kids off to learning. “A large percentage of the kids who leave school are leaving because they’re bored out of their minds,” he says. “And testing them more isn’t going to prevent that — it’s the old ‘weighing the cows more, and more doesn’t make them heavier.’ ”

Many educators are pinning their hopes on the two projects, buoyed by $350 million in federal grants, that are working to develop next-generation tests. “This is really a chance to do some R&D,” says Forgione, who now runs the Educational Testing Service’s Center for K-12 Assessment and Performance Management. “It’s our only chance to … get something more useful and meaningful.” Although he doesn’t hold out hope that schools will soon be able to return to sampling student performance — Congress would have to drastically amend NCLB — he says lawmakers could strike a deal that would limit universal testing to just a few grades and allow others to use sampling.

Watkins, who now runs his own consulting business, travels to China four or five times a year. He says Chinese educators, historically trained to deliver a top-down education that relied heavily on standardized testing and rote memorization, now focus almost obsessively on two things: creativity and innovation.

In China, “the biggest question is, ‘How do we create Bill Gates?’ ” he says. “Everywhere I go, from meeting with a minister of education to being out in the countryside, that’s what they’re striving for.” Oddly enough, he says, China’s transformation has taken place over the past nine years, exactly as long as U.S. schools have been grappling with NCLB. “While we’re moving closer to their historical model, they’re looking at ways to pull away,” he says.