Improvement in progress report grades: real or random?

Last year, the first round of progress reports attracted anger and ridicule. Perhaps because far fewer schools received low grades, the response this year has been more muted, making room for measured, evidence-based discussion of the DOE’s methodology in constructing the reports.

Over at Eduwonkette, Harvard education professor Daniel Koretz offers a lengthy critique of the progress report methodology. He notes that test scores alone are not a legitimate basis for evaluating schools; that New York State’s tests were not designed for the kind of “value-added” analysis behind the progress reports; and that the progress reports, like all accountability systems, put pressure on school administrators that likely leads to score inflation. In addition, he writes that the DOE’s formula does not account for “interval scaling,” the reality that different amounts of “value” are required to move students from one proficiency level to the next at different points on the proficiency spectrum. (In June, I wrote about how interval scaling might contribute to the finding that No Child Left Behind has helped high-performing students less than their low-performing peers.)
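To see the interval-scaling point concretely, here is a toy illustration using made-up cut scores (not New York State’s actual ones): if proficiency levels sit at uneven points on the underlying score scale, a formula that credits every one-level jump equally misstates how much students actually learned.

```python
# Toy illustration of the interval-scaling problem, using made-up cut scores
# (these are NOT New York State's actual cut scores).
HYPOTHETICAL_CUTS = {1: 600, 2: 650, 3: 700, 4: 780}  # scale score needed to reach each level

for lower in (1, 2, 3):
    gain_needed = HYPOTHETICAL_CUTS[lower + 1] - HYPOTHETICAL_CUTS[lower]
    print(f"Moving from Level {lower} to Level {lower + 1} takes {gain_needed} scale-score points")

# Here the jump from Level 3 to 4 takes 80 points while Level 1 to 2 takes only
# 50, yet a level-counting formula would credit both moves as the same "value."
```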

But those problems exist in many test-based, value-added accountability systems; Koretz writes that New York’s progress report system layers its own errors on top of them. The tremendous variation in schools’ grades from last year to this year, he argues, probably has less to do with school improvement than with sampling and measurement error.

Here’s an illustration of the effect of error. I first calculated the variation in schools’ grades between last year and this year and then graphed it against their enrollments. It’s obvious that larger schools were less likely to see sizable changes in their grades than smaller ones. No school with more than 1,500 students went up or down more than one grade, while all schools whose grades changed the maximum amount possible had fewer than 1,000 students; most of those that increased by that amount had 500 students or fewer.
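For anyone who wants to run the same check, here is a minimal sketch of the calculation. It assumes a hypothetical CSV of progress report data; the file name and column names are placeholders for however the grades and enrollments are stored, not the DOE’s actual format.

```python
# Minimal sketch of the check described above. Assumes a hypothetical file,
# "progress_reports.csv", with one row per school and columns: dbn,
# grade_2007, grade_2008, enrollment (names are placeholders, not the DOE's).
import pandas as pd
import matplotlib.pyplot as plt

# Map letter grades to numbers so the size of a change is measurable.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

schools = pd.read_csv("progress_reports.csv")
schools["change"] = (
    schools["grade_2008"].map(GRADE_POINTS)
    - schools["grade_2007"].map(GRADE_POINTS)
)

# Scatter of grade change (e.g., +4 means F to A) against school size.
plt.scatter(schools["enrollment"], schools["change"], alpha=0.4)
plt.xlabel("Enrollment")
plt.ylabel("Change in progress report grade (letter steps)")
plt.show()
```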

A substantive explanation for this distribution might be that large schools don’t do a good job moving their students forward, while smaller schools can give more attention to each student’s individual needs. But I’m with Koretz that the correct explanation is more likely methodological, and rooted in error. A school with 400 students sounds as if it would produce stable results. But consider that elementary progress reports only look at two grades’ worth of students: those with two years of test scores. The progress report grade for a school with 400 students could therefore depend on just 100 students’ test scores, hardly a sample large enough for chance differences among students, and in each student’s year-to-year testing experience, to wash out.
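A quick simulation makes the point. In this toy model (my assumption, not the DOE’s formula), every school has identical “true” performance, and the only thing that varies is how many tested students feed into its average; the smaller cohorts swing much more on chance alone.

```python
# Toy simulation of sampling error (an assumption of this sketch, not the
# DOE's actual model): every school has the same true average student growth;
# only the number of tested students behind the school average differs.
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEAN, STUDENT_SD, N_SCHOOLS = 0.0, 1.0, 10_000

for cohort_size in (100, 500):
    # Each simulated school's measured progress is the mean of its students'
    # noisy year-to-year growth.
    growth = rng.normal(TRUE_MEAN, STUDENT_SD, size=(N_SCHOOLS, cohort_size))
    school_means = growth.mean(axis=1)
    print(f"Cohort of {cohort_size:3d} tested students: "
          f"school-to-school spread (SD) = {school_means.std():.3f}")

# The spread for 100-student cohorts comes out about sqrt(5) times the spread
# for 500-student cohorts, so smaller schools bounce further on chance alone.
```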