The Gates Foundation's latest report from its teacher-effectiveness study concludes that many evaluation models can be useful as long as they include multiple measures.

Now that the city and the teachers union are back at the negotiating table to work on teacher evaluations, the Gates Foundation has some tips.

The foundation today released the third and final report from the Measures of Effective Teaching project, an ambitious three-year study of 3,000 teachers in seven districts, among them New York City. The study concludes that teacher effectiveness can indeed be measured, and it identifies strategies for grading teachers.

Having multiple people observe the same teacher is more effective than having one person observe the teacher multiple times, the study found. Student surveys predict teachers' ability to raise test scores better than observations do. And counting state test scores for a third to half of a teacher's rating is better than weighting the scores less or more.

With the report, the foundation takes a bold stance on a policy issue that remains hotly contested, even as states and school districts across the country have adopted new evaluation systems. But foundation officials are confident because the latest report reflects a change in the study’s design that they say proves that teacher evaluation systems really do measure teachers.

Last year, the foundation released a report concluding that evaluation systems that combine classroom observations and other measures, including student surveys, reliably predicted students' test score gains for the teachers in the study.

But the researchers couldn’t conclude that they were measuring the teacher’s effect with their evaluation tools. Those tools could have been picking up other characteristics of students, such as the stability of their home life, instead. The possible interference of unmeasurable influences has been one of the many critiques of “value-added” models, which aim to rate teachers based on a comparison of students’ predicted scores to their actual ones.

So the researchers asked the districts to let them take students from the study's first year and randomly assign them to teachers in their school for the following year. They concluded that four different evaluation models all showed, to varying degrees, that a teacher's value-added score with one set of students is likely to be similar with a different set of students, at least within the same school. The study could not test whether teachers were just as likely to be effective in different settings or with very different student populations.

The researchers also concluded that teachers who elevated their students’ scores on notoriously easy state tests also improved the students’ scores on tougher tests. The tougher tests were aligned to the Common Core standards, on which New York’s state tests will be based starting this spring.

City and union officials are working to hammer out an evaluation deal before next week's state deadline, when districts that have not adopted a new system will lose funding.

So far, New York has gotten some things right, according to the MET study. The state requires districts to use multiple measures — at least test scores and observations — to rate teachers. Using a rubric to assess observations is also mandatory, as are multiple observations.

But not everything the city and state are doing is what the Gates Foundation would want. State law requires that only 20 to 25 percent of each teacher's rating be based on state test scores, less than the study recommends. (Districts can opt to weight test scores more heavily, but few have done so.) The UFT has vowed not to allow student surveys to influence city teachers' ratings. And the city Department of Education has pushed to make it optional for administrators to speak with teachers about the classes they observed, union officials have said.

Tequilla Banks, a leading educator in Memphis, Tenn., said there's no point in observing teachers if the teachers and the people who observed them don't discuss the experience together afterward.

“That’s the most critical piece, honestly,” Banks said on a conference call today organized by the Gates Foundation. She added, “The observation itself is okay. The feedback and the post-conference, that’s the piece that [teachers] want.”

Still, Chancellor Dennis Walcott said in a statement that the study bore out the city’s approach to new evaluations.

“This report outlines exactly what the city has sought for our teachers and students: a fair evaluation system that looks at many factors, like classroom observations and student achievement. The study shows that evaluation systems can help teachers grow and learn — which in turn helps our students succeed,” he said.

Walcott's statement, which came as the city and union scheduled their first talks in weeks, omitted a second goal the city has often cited for evaluations beyond helping teachers improve: enabling the city to remove weak teachers.

That was appropriate, because Gates Foundation officials said the MET study has led them to conclude that meaningful teacher evaluations can improve classroom instruction over the long term — something the city’s own research has also borne out.

“Teaching is really complex and great practice takes time [and] tailored feedback,” said Vicki Phillips, the foundation’s education director, on the conference call. “Districts are better served by trying to improve practice rather than trying to make too-fine distinctions among teachers.”

A forthcoming teacher effectiveness project for the foundation will identify strategies for producing value-added scores for the vast majority of teachers whose students do not take state tests, according to Steve Cantrell, the foundation’s chief research officer.

Those findings will not come in time for the city and union to make use of them: They have to agree on how to rate teachers in non-tested grades and subjects before the state will accept the city’s evaluation system.

The Gates Foundation’s complete report is below: