In Washington, D.C., officials shortened a new teacher evaluation checklist after complaints from teachers and principals that it was too long and time-consuming.
In Memphis, Tenn., after a year of piloting new evaluations and a summer of training, some principals and teachers remained confused and overwhelmed.
In Louisiana, one expert warned of lawsuits as the state began to roll out a truncated observation system without first testing it.
But in New Haven, Conn., union officials and reformers alike have praised a collaborative effort to help teachers improve under the city’s new rating system.
As New York City officials and union leaders wrangle over the design of new teacher evaluations due to roll out citywide next year, the experiences of other states and districts offer both inspiration and lessons about what not to do.
“We have learned a lot over the last four years about how to do this effectively and well, and the changes we’ve made are reflective of that,” said Scott Thompson, deputy chief of teacher effectiveness in the D.C. Public Schools, which launched a new evaluation system in 2009.
More frequent and rigorous evaluations are part of a new national push to improve the quality of the teaching force. Two-thirds of states are in the process of adopting new evaluations, and many will include student achievement — usually as measured by standardized tests — along with intensive classroom observations. It’s unclear whether the new evaluations will have the desired effect. Even in places with a few years of experience using new systems, there is not enough data to tell for certain if student achievement is improving as a result of the evaluations.
But early adopters say they have at least begun to pinpoint what hasn’t worked, and what teachers and principals find most useful. Washington, D.C.’s experience may be particularly instructive to districts still in the process of designing systems. The city’s evaluation system has been overhauled twice in response to feedback — and problems.
The number of standards on which teachers are measured during a classroom observation was reduced to 18 because teachers found a checklist of 22 indicators too long and confusing. (New York has piloted a checklist that has 22 indicators but has asked schools to focus on just six at first.) The number of rating categories for teachers, which range from “ineffective” to “highly effective,” was increased from four to five in an effort to prevent inflation in the ratings. And to save time and lessen anxiety, teachers who have consistently scored well will no longer be observed as frequently as lower performers.
Tennessee also reduced the observation workload because principals felt overwhelmed. “It may seem pretty obvious, but I think anybody started down this road will tell you this is a huge shift in the role of the principal,” said Sara Heyburn, an assistant commissioner in the Tennessee Department of Education. “We had to move quickly to train more people, and we allowed people to combine observations.”
One of the biggest shifts in D.C. was the decision this year to reduce the reliance on test scores in favor of other measures of student achievement that teachers will determine with their principals. Previously, value-added measures, which calculate expected student growth on standardized tests, counted for 50 percent of a D.C. teacher’s rating. But value-added measures have been widely criticized as unreliable. Going forward, they will count for only 35 percent of a teacher’s overall evaluation.
“Student performance will continue to be the largest piece of the pie,” said Kaya Henderson, the D.C. Schools Chancellor, in a statement when the change was announced in August. But, she said, “We are evolving that approach to now include multiple measures.”
Most systems combine two main factors in measuring a teacher’s performance: a rating based on at least one formal classroom observation, and a rating meant to capture how much students learn during the year. Previously, most states called for evaluations that relied on a single observation, and tenured teachers were not observed every year.
In New York, value-added measures will make up only 25 percent of the rating for teachers whose students take standardized tests. Another 15 percent will be based on locally selected measures of student achievement, while the remaining 60 percent will depend on more qualitative measures such as classroom observations.
One of the most vexing problems that many education systems have faced is how to measure student growth, or learning, for the vast majority of teachers who don’t teach in tested subjects or grades.
In Florida, the state is simply developing more standardized tests. Last year in Tennessee, teachers without individual value-added scores were rated on their school’s overall performance on standardized tests. Many teachers said this was unfair, however, according to a report by the state education department. So this summer state officials recommended adding more tests, as long as they “benefit student performance.”
Other states have left it to districts or schools to create their own “student learning objectives” or SLOs, such as portfolios of artwork or improvement in skills like playing scales on a trumpet. New York will join them when its system takes effect next year.
But a pilot in Rhode Island demonstrated that it’s difficult to ensure that the learning objectives are rigorous. “The quality of our student learning objectives was not where we ultimately want them to be,” said Rhode Island education commissioner Deborah Gist in an interview with The Hechinger Report last year. “There’s no way to make it be entirely objective ever.”
Although hundreds of teachers have lost their jobs due to low ratings as new evaluations have gone into effect, the evaluations haven’t been the shock to the system that many educators expected. In Florida, for example, the percentage of teachers rated poorly rose by only one percentage point in comparison to the old system, which had been criticized as too lenient. In Tennessee, only 2.5 percent of teachers received one of the lowest two ratings (out of five) based on new classroom observations. Three-quarters of teachers fell into the top two categories. And one of the reasons D.C. changed its rating system this year is that the vast majority of teachers continued to be rated as either “effective” or “highly effective.”
“In the end, the anxiety about these systems is largely about the consequences they might carry,” said Timothy Daly, president of TNTP, a nonprofit advocacy group, which in 2009 published a report on teacher effectiveness that helped spur many of the new reforms. “And the truth is that very few teachers are in the position of facing any consequences, which raises the larger question of, ‘Are these ratings accurate?’”
At the same time, a nearly universal piece of advice from education officials in other districts and states is to work closely with teachers when designing the new evaluations. Dozens of teachers in New Haven, Conn., have left because they were rated poorly under the new evaluation system there. But the union was a partner in developing it, and criticism has been muted compared to elsewhere.
“If you create a system that doesn’t have maximum teacher input, it doesn’t matter how technically sound it is,” said Dan Cruce, a former official in the Delaware Department of Education who now works for the nonprofit policy organization Hope Street Group. “It has to be raised and informed by teacher voices, because that’s who it’s designed for.”
The experiences so far with new evaluations suggest that districts should also expect to make changes as they go along. “The idea is that this is going to continuously improve, just like we expect our educators” to do, said Heyburn, of Tennessee. “You can plan for the hypotheticals, but it’s not till feet hit the ground that you learn the real lessons.”
This story was produced by The Hechinger Report, a nonprofit, nonpartisan education-news outlet based at Teachers College, Columbia University.