New York teachers hate the idea of outsiders evaluating them

This story was co-produced by The Teacher Project, an education reporting initiative at Columbia Graduate School of Journalism dedicated to covering teachers, and The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education.

This story also appeared in Capital.

When New York Governor Andrew Cuomo proposed a new teacher evaluation system in January that would rely heavily on outside consultants’ opinions, rank-and-file teachers and principals across the city exploded in surprised outrage.

But similar consultants have already evaluated teachers in a handful of other places across the country, including Toledo, Ohio; Montgomery County, Maryland; and, perhaps most notably, Washington, D.C.

Experience elsewhere suggests that having an outside educator observe teachers can be successful in the short term. But whether the use of outside evaluators — or any particular approach to evaluating teachers — improves teaching in the long run remains an open question.

Research supported by the Bill & Melinda Gates Foundation, which has advocated for more rigorous teacher evaluations, suggests that two evaluators tend to be more accurate than one. Using a second evaluator is costly, however, and pulling it off well requires careful planning and buy-in from teachers themselves.

Supporters of public education rally on the Great Western Staircase at the Capitol on Thursday, March 26, 2015, in Albany, N.Y. Lawmakers and Gov. Andrew Cuomo were at odds over school spending, with Cuomo saying a big increase in school spending should be tied to revised teacher evaluations. Credit: Mike Groll/AP

The discussion in New York about using outside evaluators to help grade teachers has been fractious. Neither the teachers’ union nor the principals’ union was consulted before the plan was announced, and Gov. Cuomo initially tied increases in school funding to the plan’s passage. Now, the legislature will punt part of the responsibility for designing the new evaluation system to the Board of Regents, with a deadline of June 30.

Washington, D.C.’s system, with its reliance on outside observers to evaluate teachers, is probably the most similar to Gov. Cuomo’s proposal, but differs in some key ways — most notably in the weight placed on the outsiders’ opinions.

“This type of evaluation system can help drive and sustain improvements in performance, but they have to be well communicated to teachers,” said Thomas Dee, an education researcher at Stanford University and co-author of a study that found the D.C. system to be an effective means of encouraging low-performing teachers to improve their practice.

The D.C. experience offers lessons for New York, both in what is working there and in what isn’t.

Michelle Rhee, the former chancellor of the District of Columbia Public Schools, began implementing the city’s new teacher evaluation system, called IMPACT, more than five years ago. The old system consisted of annual teacher observations done by principals. By contrast, IMPACT relies on observational scores both from principals and from “master educators” (highly rated former teachers who work full-time for the district) as well as on student test-score growth, which increasingly is being used to evaluate teachers nationwide.

In 2008, the D.C. school district invited hundreds of teachers to participate in focus groups designed to engage teachers in the new system’s design. Teachers wanted evaluators who had specific content expertise, something principals couldn’t always provide. The master educators’ program was born of this request, according to district officials.

To become master educators, teachers must survive a six-part application process. Summer training includes four rounds of video testing: Master educators complete mock evaluations for already-scored lessons, and are graded for their accuracy. They have to pass three of four tests before they perform real evaluations, and are given three additional follow-up tests throughout the year.

Forty master educators work full time to evaluate nearly 4,000 teachers, managing caseloads of about 100 teachers each semester. The district spends $6.2 million per year to fund the program, according to Maggie Thomas, assistant director of the master educator program for the D.C. Public Schools, or DCPS.

Teachers are judged by broad categories that cover things such as how well they explain content to students, how organized and time-efficient their lessons are and how they reach students with different abilities.

The IMPACT system rates teachers on a scale from “ineffective” to “highly effective.” “Ineffective” teachers are dismissed immediately, and teachers rated “minimally effective” are given one year to improve before being considered for dismissal. As teachers move up the district’s career ladder, called LIFT, from “teacher” to “expert teacher,” they are observed less frequently. Annual IMPACT scores determine whether a teacher moves up the ladder.

But when the IMPACT plan was rolled out, some teachers, and the local teachers union, saw it as overly punitive — more focused on firing teachers than helping them improve. In 2010, IMPACT’s first year, nearly 2 percent of teachers were fired. Then, in 2011, another 5 percent of teachers, 206 men and women, were fired for poor performance. Union officials say hundreds more have since lost their jobs.

The results of IMPACT have been contentious. The study by Stanford’s Dee, with co-author James Wyckoff from the University of Virginia, found that the system has had a positive effect on teachers with both low and high ratings. The potential for significant bonuses and permanent raises has encouraged good teachers to get better. And the real threat of dismissal has pushed struggling teachers to leave voluntarily or to seek to improve, often with help from the master educators assigned to them. Either outcome is good for students, said DCPS’s Thomas.

Another study of the IMPACT system, published by Education Sector, an education think tank run by the American Institutes for Research, reported that the teachers interviewed “almost universally liked the people who evaluated them, finding them for the most part helpful, empathetic and smart.”

But other education experts are more skeptical. Linda Darling-Hammond of Stanford University has criticized IMPACT’s heavy reliance on test-score growth, which can be an unreliable way to measure teacher effectiveness. And while test scores in the district have improved since IMPACT began, a recent study by the National Urban League found that Washington, D.C. has the nation’s largest reading-proficiency gaps between black, Hispanic and white fourth-graders.

In fact, most of the complaints about IMPACT have focused on the use of test scores, not the master educators. (In 2013, calculation errors resulted in erroneous evaluation scores for 44 teachers, including one who was mistakenly fired.)

But the outside evaluators have also been a source of frustration for some teachers in D.C.

Laura Fuchs, a social studies teacher, said in a 2011 statement to Chancellor Kaya Henderson that short visits from master educators don’t show the full picture of a teacher’s efforts in the classroom. She complained that post-observation conferences are treated as little more than a chance for evaluators to tell teachers everything they did wrong.

“If IMPACT had its way, each lesson would mirror a specific pattern each day,” she said, “and if you do that, you get yourself bonuses, accolades and the respect of the DCPS Central Office’s administration.”

DCPS’s Thomas countered that, before being observed, teachers are able to submit 500-word narratives to help their evaluators understand the dynamics of their classrooms. Officials have made several other changes to the system, including giving teachers the opportunity to have their lowest observation score dropped (if it’s less than the average of the others). And brand-new teachers aren’t observed until several months into the school year, to give them time to adjust.

But Washington Teachers Union president Elizabeth Davis insists teachers haven’t been given enough information about what evaluators are looking for, and that it was a mistake not to involve the union in designing IMPACT.

Thomas admits that the district didn’t do enough to inform teachers about the role of the outside evaluators, a concern echoed in criticisms of the New York plan.

A major difference between IMPACT and Gov. Cuomo’s proposal is the weight given to the principals’ versus the outsiders’ evaluations. In Washington, D.C., principals’ scores are given more priority, according to Thomas.

For teachers in subjects that are tested, principals’ observations count 24 percent and master educators’ 16 percent of the total IMPACT score (student test-score growth and professionalism scores make up the rest); when testing is not applicable (due to subject matter, for example), principals’ observations count 45 percent and master educators’ 30 percent of the total IMPACT score. (For this year only, DCPS halted the use of test scores in teacher evaluations to account for new Common Core state tests.)

Under Gov. Cuomo’s original proposal for evaluating New York state teachers, observations would count for 50 percent of a teacher’s total score: principals’ observations would account for just 15 percent of the total, and outside evaluators’ observations for 35 percent. Principals were frustrated that they would no longer have the right to decide which teachers remain at their schools and which are fired; test scores and an outside evaluator’s opinion would matter significantly more.

But in the version of the state budget that passed on March 31, no mention is made of how principals’ and outside evaluators’ observations will be weighted, though both are mandatory. Districts can also opt to include a third observation performed by a peer who is rated “effective” or “highly effective.” The state education department will determine how much each score will be worth.

The United Federation of Teachers, the teachers union in New York City, has come out strongly against the proposed outside evaluator piece.

UFT President Michael Mulgrew complained that there are few details about the plan, including what kind of training outside evaluators will receive. And administrators are concerned that the setup will create an unnecessary power struggle.

“If an outside evaluator comes in and says, ‘This is the problem,’ and the principal says, ‘No, this is the problem,’ who does the teacher pay attention to?” said Mark Cannizzaro, executive vice president of the Council of School Supervisors and Administrators, the city’s principals union.

A Cuomo administration official told reporters early this month that the “independent” evaluators could be principals or “highly effective” teachers from other schools and districts; college professors or retired educators could also qualify to do the evaluations.

Teachers have protested alongside parents across the state, asserting that the funding needed to support evaluators’ travel to observe upwards of 300,000 teachers could be better spent elsewhere.

Education advocates note that the goal of classroom observations has traditionally been to improve teachers’ ability to teach, which requires ongoing classroom visits connected to professional development plans. But in New York, it is unclear whether outside evaluators will offer teachers any additional training.

There is an alternative to outside evaluators that tends to win more support from teachers. In what is known as “peer assistance and review,” professional development, not punishment, is explicitly the priority of teacher evaluations. Teachers are enlisted to mentor, coach and in some cases evaluate their peers. The first such program was started in Toledo, Ohio, in the early 1980s, and similar programs exist in places like Montgomery County, Maryland, and Rochester, New York. In these programs, novice teachers and veterans identified as struggling are coached and observed by a peer teacher. Often, teachers set their own academic goals with their mentors and are evaluated on those goals throughout the school year.

In a similar vein, Educators 4 Excellence, a teacher group that is sometimes at odds with the teachers union in New York, proposed an alternative to Cuomo’s plan that uses peers instead of outside consultants. It would base 35 percent of each evaluation on state test scores, 45 percent on principal observations and 20 percent on peer evaluations.

UFT’s Mulgrew has indicated openness to the idea of statewide peer evaluators, which are already in use for struggling teachers in New York City. “If we’re able to train outside evaluators that can point out strengths and weaknesses and help teachers move forward, I’d be willing to talk about that,” he said.

Dee, the Stanford researcher, offered one piece of advice as the state moves forward: No matter who is completing evaluations, excluding teachers from their design makes any system seem to have an “ominous veil.”

Washington, D.C.’s advice to New York? Keep teachers, and unions, involved in outlining how educators will be evaluated, and by whom, to help fight the “perpetual feeling of disenfranchisement” felt by teachers, said DCPS’s Thomas.