The system the city uses to award letter grades to schools is complicated and in some ways flawed — but it’s the best system we have.

That’s the conclusion of a report by the Independent Budget Office, the city’s budget watchdog, which has been charged since 2009 with scrutinizing Department of Education data. The office examined the city’s progress reports, released annually since 2007, to see whether their underlying metrics produce meaningful results.

The progress reports were meant to radically reorient the way that New Yorkers thought about school performance. Instead of assessing schools simply by the proportion of students passing state tests, the progress reports focus on students’ improvement from year to year. In a precursor to the “value-added” measurements now being used to assess teachers, the reports use a complex and evolving algorithm that controls for student demographics to calculate just how much students have progressed.

The city then assigns each school a letter grade based on its score. The letter grades inform both the city’s decisions about which principals should receive bonuses and which schools should be considered for closure, and families’ choices about where to enroll.

The IBO concludes that the progress reports offer a more sophisticated analysis of school performance than ever before — but that there is room for improvement. “The methodology used by the education department is a significant improvement over simply basing measures on comparisons of standardized test scores,” the report concludes. “Still, the School Progress Reports have to be interpreted with caution.”

The IBO looked at three issues: whether the city’s algorithm has successfully controlled for factors outside of schools’ control; whether the reports have reflected long-term shifts as well as short-term changes; and whether minor methodology changes produced outsized score swings.

On the first question, the budget office concluded that, overall, progress report scores in one small set of schools (those serving both elementary and middle school students) can be considered “demographically neutral” — that is, unaffected by student characteristics. But in most cases that was not true, according to the IBO’s analysis.

“All other things equal, elementary, middle, and high schools with a higher percentage of black and Hispanic students were consistently likely to have lower overall scores than other schools,” the report notes. Progress report scores were also lower in high schools with more poor students and more students with disabilities, the IBO concluded.

Confirming previous findings, the IBO also concludes that elementary and middle school scores have been highly volatile, with the majority of those schools receiving three or more different progress report grades since 2007. (High school progress report grades are based on a wider range of variables and have always been more stable.) But the IBO says that changes to the reports’ methodology, particularly around how students’ year-to-year growth is assessed, have made them more stable.

Finally, the IBO’s analysis found that most of the changes made to the progress report methodology in 2010 and 2011 did not affect schools’ overall grades. In general, the office concludes, the city’s reports successfully identified very high- and very low-performing schools under multiple methodologies, but they were less successful at distinguishing among middle performers.

“The distinction between a C and D rating for a school may be the result of the particular methodology that the DOE has chosen, among the many that are possible, rather than the result of school practices or effectiveness,” the report concludes. “Unfortunately, this weakness occurs at precisely the point where high stakes decisions about schools are made.”

Department of Education officials say that while the peer group comparisons do not “completely control” for student characteristics, they do reduce the impact of race and other demographics compared to other measures of school performance. They also point out that the IBO found strong correlations between demographics and school grades in only a handful of the relationships it analyzed. And they note that the IBO’s analysis shows that between half and three quarters of C and D grades issued in 2011 would have been the same using the IBO’s methodology for analyzing score stability.

“As the IBO has recognized, New York City’s progress reports are a huge improvement over other state and district systems for measuring student learning in schools — and for this reason they have become a national model,” said spokesman Matthew Mittenthal in a statement. “Closing the achievement gap in New York City is a core goal of our reform strategy, but as long as it exists we should expect it to show up in school progress reports, which are designed to be an accurate reflection of our schools’ strengths and challenges.”

The complete IBO report is below.