Last week, the New York City Department of Education issued its first-ever Teacher Preparation Program Reports. The Department was judicious in not describing the reports as an evaluation of the quality or effectiveness of the dozen teacher-preparation programs in the NYC area that collectively produce more than 50 percent of the 10,000 traditional-pathway teachers hired by the city over the past five years.
Others were not so careful. Writing in The New York Times, Javier Hernandez described the PowerPoint slides comparing the 12 programs as “scorecards,” and stated that these ed schools were being “evaluated,” a term repeated in his article’s headline. Politico also used the term “scorecard.” The Wall Street Journal described the data as “rankings,” although teacher-preparation programs were not ranked. The Associated Press described the data as “grading” the colleges and universities, and looked for “winners or losers.” The New York Post and the New York Daily News both referred to “grading” the programs. Even my own institution, Teachers College, which appears in the data, fell into this trap: the headline on the College’s webpage reads, “TC Rated in City Evaluation of Teacher Prep Programs.”
What’s the big deal? Report, description, analysis, comparison, ratings, rankings, evaluation—aren’t these all pretty much the same thing?
No, they are not, for several reasons.
First of all, we cannot view the descriptive information about the New York City teachers produced by each program as an evaluation of that program, because we have no idea whether the teachers who start their careers in the Big Apple are typical or representative of all of the new teachers each program produces. Do NYC schools attract the best or the worst of each program's graduates? We have no idea.
If you will forgive a sports analogy (drawn from basketball, in honor of our Hoopster-in-Chief, Arne Duncan), consider the players from the University of North Carolina at Chapel Hill who've entered the National Basketball Association over the past three decades. Would it be fair to evaluate UNC's performance as a training ground for the NBA based only on how its players perform for the Los Angeles Lakers? What about that Michael Jordan fellow, who played only for the Chicago Bulls and Washington Wizards? Should his performance be ignored? When a preparation program sends its graduates to many different destinations, we cannot evaluate its quality based on how those graduates perform in just a single destination.
Now, when Michael Jordan entered the NBA, he was drafted by the Chicago Bulls, after the Houston Rockets picked Hakeem Olajuwon and the Portland Trail Blazers chose Sam Bowie. The nature of the NBA draft is that these teams had exclusive rights to these players, who couldn't choose to sign with just any NBA team, even if they thought that other teams had more talented players, had a better coach, or would pay them more money. Does playing on a stable team with experienced teammates and an excellent coach improve a player's performance? It's hard to know, because most of the time we see a player perform only with the team that drafted him.
Public education doesn't use a "draft" to match new teachers with schools, but in both teaching and basketball, there's a labor market with a supply of, and demand for, new talent. Where teachers wind up and how they perform on the job aren't entirely up to them; a teacher with specialized training and credentials may be interviewed and hired only by a school seeking a teacher with such specialized expertise. Conversely, one can scarcely fault a teacher who, holding multiple job offers, chooses the one that pays the best, has the best facilities, or is in a desirable location. As Kata Mihaly and her colleagues, as well as Bruce Baker of Rutgers, have demonstrated, when labor markets result in a non-random distribution of teachers across schools and districts, it's very difficult to disentangle the effects of the teacher-preparation program on teaching outcomes from the effects of school context.
For this reason, the descriptions of how the graduates of the dozen metro-area teacher-preparation programs are distributed throughout the system are hard to interpret. It’s interesting to see that the graduates of a particular program are more likely to teach in what the Department of Education refers to as highest-need schools, or that the teachers from a particular program are more likely to leave the district than those from other programs, but what do such things mean?
In fact, the comparisons across programs revealed far more similarities than differences, a finding that is very likely reassuring to the Education Department, which must inevitably rely on diverse providers to supply the teachers it hires each year.
The data receiving the most attention were the ratings that graduates of the 12 programs received via the New York Student Growth Percentiles methodology developed by the State Education Department for the Annual Professional Performance Reviews. In my professional opinion, the 2011-12 methodology unfairly penalized some teachers and rewarded others, and the ratings were assigned only to the 15 percent of educators teaching English Language Arts or mathematics in grades four through eight, scarcely a representative subset of the teachers prepared in any of the dozen programs. (And then there's the pesky question of whether the state's tests in 2011 and 2012 were good indicators of the most important things we want students to learn.) But some observers continue to view these ratings as the most "objective" source of information about teacher performance. Secretary Duncan, for example, said that the project "puts the record of preparation programs—including their impact on student learning—into sharp focus."
The distribution of performance among teachers in New York City looks a lot like that across the state: seven percent of teachers rated highly effective and six percent rated ineffective, with the vast majority rated effective, based on the Student Growth Percentiles. And, although the Department of Education didn't come out and say this, the distributions look very similar across the 12 teacher-preparation programs as well. A simple test of association known as the chi-square test indicates that we cannot rule out the possibility that the distribution of teacher ratings is the same from one program to the next.
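For readers curious about the mechanics, here is a minimal sketch of Pearson's chi-square test of independence applied to a programs-by-ratings table. It is an illustration under assumptions, not the Department's or my exact calculation: the function name `chi_square_independence` is hypothetical, and the counts in the usage example are invented placeholders, not figures from the Teacher Preparation Program Reports. The idea is that if every program had the same distribution of ratings, each cell's expected count would be its row total times its column total divided by the grand total; the statistic adds up how far the observed counts stray from those expectations.

```python
from typing import Sequence


def chi_square_independence(table: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Return the Pearson chi-square statistic and degrees of freedom.

    Rows are groups (here, hypothetically, preparation programs) and
    columns are outcome categories (rating levels). Under the null
    hypothesis that every row shares the same column distribution, the
    expected count in cell (i, j) is row_total[i] * col_total[j] / grand_total.
    """
    n_rows = len(table)
    if n_rows < 2:
        raise ValueError("need at least two rows")
    n_cols = len(table[0])
    if n_cols < 2:
        raise ValueError("need at least two columns")
    if any(len(row) != n_cols for row in table):
        raise ValueError("all rows must have the same number of columns")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("a row or column total is zero")
            # Squared gap between observed and expected, scaled by expected.
            statistic += (table[i][j] - expected) ** 2 / expected

    # Degrees of freedom for an r x c table of independence.
    dof = (n_rows - 1) * (n_cols - 1)
    return statistic, dof


if __name__ == "__main__":
    # HYPOTHETICAL counts for three programs across the four rating levels
    # (highly effective, effective, developing, ineffective).
    hypothetical = [
        [7, 80, 7, 6],     # Program A (invented)
        [14, 160, 14, 12], # Program B (invented)
        [3, 41, 3, 3],     # Program C (invented)
    ]
    stat, dof = chi_square_independence(hypothetical)
    print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

A small statistic relative to its degrees of freedom means the observed table is close to what we would expect if ratings were distributed identically across programs. With SciPy installed, `scipy.stats.chi2.sf(stat, dof)` converts the statistic into a p-value, and `scipy.stats.chi2_contingency` performs the whole test in one call. The usual caveat applies: the chi-square approximation becomes unreliable when expected cell counts are small, which matters when only a few teachers from a program receive ratings.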
If the numbers in these Teacher Preparation Program Reports lead to deeper inquiries into what the data mean, and to constructive conversations between the Department of Education and the leadership of the teacher-preparation programs, I'll be pleased.
But let’s not mistake this for an evaluation. Or sharp focus.