In Washington, D.C., one of the first places in the country to use value-added teacher ratings to fire teachers, teacher-union president Nathan Saunders likes to point to the following statistic as proof that the ratings are flawed: Ward 8, one of the poorest areas of the city, has only 5 percent of the teachers defined as effective under the new evaluation system known as IMPACT, but more than a quarter of the ineffective ones. Ward 3, encompassing some of the city’s more affluent neighborhoods, has nearly a quarter of the best teachers, but only 8 percent of the worst.
The discrepancy highlights an ongoing debate about the value-added test scores that an increasing number of states—soon to include Florida—are using to evaluate teachers. Are the best, most experienced D.C. teachers concentrated in the wealthiest schools, while the worst are concentrated in the poorest schools? Or does the statistical model ignore the possibility that it’s more difficult to teach a room full of impoverished children?
Saunders thinks it’s harder for teachers in high-poverty schools. “The fact that kids show up to school hungry and distracted and they have no eyeglasses and can’t see the board, it doesn’t even acknowledge that,” he said.
But many researchers argue that value-added models don’t need to control for demographic factors like poverty, race, English-learner or special-education status at the individual student level, as long as enough test score data (at least three years) are included in the formula. They say states and districts choose to include demographic characteristics in the models to satisfy unions and other constituents—not because it’s statistically necessary.
William Sanders, a former University of Tennessee researcher now at the SAS Institute Inc., has spent nearly three decades working on a complex statistical formula that’s been adopted in districts serving a total of 12 million students around the country. With at least three years of test-score data from different academic subjects, he says he is able to home in on a good prediction of what a particular student’s progress should look like in a given year—and thus, how much a teacher should be expected to teach the student. Adding demographic factors only muddies the picture, he argues.
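The general idea can be sketched in a few lines of code. This is a toy illustration only, not Sanders' actual EVAAS formula (which is far more complex); the data, the number of teachers, and the "true" teacher effects are all invented so the estimate has something to recover. The model predicts each student's current score from three prior years of scores alone, with no demographic variables, and treats a teacher's rating as the average gap between the actual and predicted scores of that teacher's students.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: 200 students, 3 prior years of scores, 10 teachers.
# A hypothetical "true" teacher effect is baked in so the estimate has a target.
n = 200
prior = rng.normal(50, 10, size=(n, 3))        # scores from years t-3, t-2, t-1
teacher = rng.integers(0, 10, size=n)          # each student assigned one of 10 teachers
true_effect = np.linspace(-3, 3, 10)           # hypothetical teacher contributions
current = prior.mean(axis=1) + true_effect[teacher] + rng.normal(0, 4, size=n)

# Step 1: predict the current score from prior scores alone (no demographics),
# using ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
predicted = X @ beta

# Step 2: a teacher's value-added estimate is the mean gap between the
# actual and predicted scores of that teacher's students.
residual = current - predicted
value_added = np.array([residual[teacher == t].mean() for t in range(10)])
```

In this simplified setup the estimated ratings track the invented teacher effects closely, which is the intuition behind Sanders' claim: with enough prior test data, the prediction already reflects each student's starting point, so demographic variables add little.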
“If you’ve got a poor black kid and a rich white kid that have exactly the same academic achievement levels, do you want the same expectations for both of them the next year? If the answer is yes, then you don’t want to be sticking things in the model that will be giving the black kid a boost,” he said.
But Eric Isenberg, a Mathematica researcher and one of the designers of the IMPACT value-added model for Washington, D.C., says he’s “never been really compelled by the lower-the-expectations-for-students argument.” The D.C. model only uses one year of data, and incorporates the poverty status of individual students, among other factors, to protect against biasing the ratings.
“Nobody ever makes the argument that you’re holding the kids that started at a lower [achievement level] to lower standards,” he said.
There is also debate among researchers about whether the concentration of disadvantaged students in a classroom should be taken into account. Only a handful of value-added models do so.
A large body of research has found that student achievement is affected not only by a student’s individual circumstances at home, but also by the circumstances of other children in the same school and classroom. Studies have found that students surrounded by more advantaged peers tend to score higher on tests than similarly performing students surrounded by less advantaged peers.
To some experts, this research suggests that a teacher with a large number of low-achieving minority children in a classroom, for example, might have a more difficult job than another teacher with few such students.
D.C.’s model doesn’t account for classroom characteristics, but Florida’s model accounts for the percentage of students scoring at similar levels in a class, a variable that may partly address the issue.
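The classroom-level question can be made concrete with a small simulation. Everything here is hypothetical (the poverty effect size, class sizes, and score distributions are invented, and real models like IMPACT are far more elaborate); the point is only to show how a classroom-level control changes a teacher's rating when classroom poverty concentration genuinely depresses scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented setup: 10 teachers, 20 students each. Classes differ in their share
# of students in poverty, and (by assumption, for this sketch) a higher
# concentration depresses current-year scores.
n_teachers, per_class = 10, 20
n = n_teachers * per_class
teacher = np.repeat(np.arange(n_teachers), per_class)
class_poverty = np.linspace(0.1, 0.9, n_teachers)   # classroom poverty shares
poverty_share = class_poverty[teacher]
prior = rng.normal(50, 10, size=n)
current = 0.8 * prior - 8.0 * poverty_share + rng.normal(0, 4, size=n)

def mean_residual_by_teacher(X):
    """OLS of current scores on X; return each teacher's mean residual."""
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    resid = current - X @ beta
    return np.array([resid[teacher == t].mean() for t in range(n_teachers)])

ones = np.ones(n)
# Rating without the classroom-level control: prior scores only.
va_without = mean_residual_by_teacher(np.column_stack([ones, prior]))
# Rating with the classroom poverty share added as a covariate.
va_with = mean_residual_by_teacher(np.column_stack([ones, prior, poverty_share]))
```

Under these assumptions, the uncontrolled ratings fall almost in lockstep with classroom poverty: teachers in high-poverty classrooms absorb the classroom-level penalty into their own scores, while adding the control shrinks those gaps toward zero.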
Controlling for the demographics of a whole class can be messy, says Douglas Harris, a University of Wisconsin-Madison professor who has studied both value-added modeling and how a student’s peers affect his or her own achievement.
“It’s very hard in a statistical sense to separate for those things,” Harris said. “Accounting for the student level and the classroom and school level is not going to make that much difference.”
Isenberg agrees: “I haven’t seen anything to date that suggests peer effects make a large difference” in the context of value-added teacher evaluations. Nevertheless, he is currently leading research in D.C. and 30 other cities to see if factoring in the concentration of disadvantaged students in a class will make a difference in teachers’ scores.
Daniel McCaffrey, a senior statistician at the RAND Corporation, a nonprofit research group, argues that peer effects can make a difference, however. If there are enough years of test-score data, “including individual-level race and income … in the model doesn’t matter very much,” he said. On the other hand, including classroom-level data “tends to matter more and can make meaningful changes” to a teacher’s rating.
Sanders says that in his years of research, he has found no correlation between the concentration of disadvantaged students and school performance on value-added measures. “It becomes a question of where do you want to put your risk,” he said. Should school districts risk hiding the fact that high-poverty schools tend to get more ineffective teachers, he asked, or risk rating teachers with high numbers of disadvantaged students incorrectly?
I taught for four years in the Baltimore City Public Schools, and while I do think the arguments about peer impact are valid, there's a bigger point being left out of the arguments described above.
Children who live with deep poverty, home or community violence, food insecurity, or unstable parental relationships experience toxic stress in early childhood. When cortisol, the stress hormone, is at high levels in a young child's brain, the hippocampus doesn't grow properly, leading to weaker short-term and long-term memory, learning, and thinking skills. Stress has myriad other long-term effects, none of them good (see the ACE Study from Anda et al., 2006, for some horrifying data).
Teachers in high-poverty areas not only have to overcome the additional challenges they may face because of students' behaviors and lack of parent support, but also face the harder task of teaching children whose brains are not wired for learning. Until all children have identical early experiences (and that's said with all due irony), it will be impossible to have blanket expectations of all teachers and students.
You accurately say that a large body of research has found student performance is affected by peer effects. Then you could add in the professional judgment of teachers about the effects of concentrated poverty. And then there is common sense.
And yet you cite several researchers who haven't found evidence to support that. It's only fair to note that their failure to find such evidence cuts both ways. It is just as likely (or much more likely) that the failure reflects inherent flaws in value-added models, or the researchers' lack of understanding of the realities of inner-city schools, as it does the cumulative body of knowledge of researchers and educators. (For instance, I wonder how well D.C. officials briefed their RAND researcher on the logistics of their schools when the model was being constructed.)
Firstly, there are whole ranges of poverty. The key issue is concentrations of generational poverty, which usually coexist with high levels of trauma and low levels of social trust. And yet I've never heard of a researcher controlling for high concentrations of seriously emotionally disturbed special-education students, as opposed to straightforward learning disabilities.
Secondly, high-poverty magnet schools ought to produce high rates of growth, probably even greater than low-poverty schools. The effects of poverty wash out at the macro level, and VAM researchers work largely with data at that level; I wonder if that influences their perspectives.
And what do economists mean when they find that controlling for poverty makes only a small difference? What do they define as small? How much of their definition is a mathematical construct? If a teacher has a 15 percent chance per year of having their career destroyed because they were misidentified, is that small?
Here's what I'd call a significant difference: when the difficulty of meeting growth targets produces an exodus of teaching talent from schools with high concentrations of poverty. By then, it will be too late to say "oops!"
I hope everyone reads your piece in the context of Hechinger’s previous piece on the growing concentrations of poverty.
Isn’t there also a body of research that shows that high-poverty schools also have a higher percentage of less-experienced teachers? Wouldn’t that also contribute to the scores we are seeing in Ward 8?
It's not all perfectly cut and dried.
This was very interesting! Good work.
I find Emmalie's comments full of scientific facts, facts that ARE completely ignored by the hordes of non-teachers, all ready and willing to present their own personal methods of quantifying the quality of a teacher, though most or all of them would fail to pass the very systems they have devised to measure the performance of others.
I find it curious that the same people who take a science course and understand the concept of bias have no problem informing a teacher that they will be observed on such-and-such a date, with such-and-such lesson plans, without realizing that NONE of the material gained from that observation has even one scintilla of authenticity. Some teachers do well under the "newest member of the Screen Actors Guild" methodology, and many of the BEST teachers DO NOT: those who teach well each and every day and period, using best practices, years of training and experience, and the best people skills out there, but who find the "dog and pony show" of the planned observation very unrepresentative of what goes on in classrooms throughout the ebb and flow of energy that takes place in any classroom in a year.
Measuring teacher performance is, in the humble opinion of an educator with 19 years of experience, about as valid an activity as arguing which artist did the best job on a self-portrait, Whistler or Picasso, and with about as much relevance. Obviously success is where preparation meets opportunity; on that we can agree. But how one educator follows best practices and gets lackluster learning, while another uses very avant-garde methods with fantastic results, is the part of the equation that the tea party (those who want so badly to quantify and roboticize classrooms into a video-virtual profit center) can never escape or prove valid.
We can spend 90 cents of every school dollar actually teaching and 10 cents on materials, or we can spend 20 cents on assessment to determine whether teaching really occurred. Seems very silly, but it is all about accountability.
Odd, ISN'T IT, that the virtual schools are having trouble with accountability right from the get-go, with teachers signed up to teach classes they never teach, just so the profit center can turn its almighty buck?
MONEY MONEY MONEY!