Some English Language Arts teachers received high "value-added" scores in 2007 but much lower scores in 2008.

The value-added reports meant to measure city teachers’ effectiveness have wide margins of error and give judgments that fluctuate — sometimes wildly — from one year to the next, a new analysis finds.

Schools Chancellor Joel Klein has instructed principals to use the Teacher Data Reports as one way to decide which teachers should receive tenure. Teachers who teach English or math to students in grades three through eight receive the reports.

The NYU economist Sean Corcoran found that 31 percent of English teachers who ranked in the bottom quintile in 2007 had jumped to one of the top two quintiles by 2008. About 23 percent of math teachers made the same jump.

There was an overall correlation between a teacher's score in one year and the next, and for some teachers the measurement was more stable. Of the math teachers who ranked in the top quintile in 2007, 40 percent retained that crown in 2008.
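This kind of year-to-year churn is what you would expect when a stable underlying quality is measured with a lot of noise. A minimal simulation sketches the idea: the numbers below (10,000 teachers, a noise level larger than the true spread of teacher effects) are illustrative assumptions, not Corcoran's data or the city's model, but they produce quintile jumps of roughly the magnitude the report describes.

```python
import random

random.seed(0)

N = 10_000
# Hypothetical model: each teacher has a fixed "true" effect, but each
# year's measured score adds substantial noise. The spread of the noise
# (1.5) relative to the spread of true effects (1.0) is an assumption.
teachers = [random.gauss(0, 1.0) for _ in range(N)]
year1 = [t + random.gauss(0, 1.5) for t in teachers]
year2 = [t + random.gauss(0, 1.5) for t in teachers]

def quintiles(scores):
    """Assign each teacher a quintile: 0 = bottom fifth, 4 = top fifth."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    q = [0] * len(scores)
    for rank, i in enumerate(ranked):
        q[i] = rank * 5 // len(scores)
    return q

q1, q2 = quintiles(year1), quintiles(year2)
bottom = sum(1 for a in q1 if a == 0)
jumped = sum(1 for a, b in zip(q1, q2) if a == 0 and b >= 3)
print(f"{jumped / bottom:.0%} of bottom-quintile teachers landed in "
      "the top two quintiles the next year")
```

Even though every teacher's "true" effect is constant across the two simulated years, a sizable share of the bottom quintile jumps to the top two quintiles purely through measurement noise.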

The Annenberg Institute for School Reform at Brown University, which has a history of criticizing the Bloomberg administration, published Corcoran’s findings, which were part of a wider look at the practice of assigning “value-added” scores to teachers based on their students’ test scores.

The analysis explains the difference between what value-added scores of teachers aim to do and what value-added measurements actually do in practice. The dream is to isolate the effect of a teacher on students’ performance from the effect of everything else; the reality is that the measures approximate that isolated effect with statistics, weak tests, and small sample sizes.

Corcoran offers some praise. “The simple fact that teachers and principals are receiving regular and timely feedback on their students’ achievement is an accomplishment in and of itself, and it is hard to argue that stimulating conversation around improving student achievement is not a positive thing,” he writes. “But,” he writes,

teachers, policymakers, and school leaders should not be seduced by the elegant simplicity of “value-added.”

The weaknesses of value-added detailed in the report include:

  • the fact that value-added scores are inherently relative, grading teachers on a curve — and thereby rendering the goal of having only high value-added teachers “a technical impossibility,” as Corcoran writes
  • the interference of imperfect state tests, which, when swapped with other assessments, can make a teacher who had looked stellar suddenly look subpar
  • and the challenge of truly eliminating the influence of everything else that happens in a school and a classroom from that “unique contribution” by the teacher

Another challenge for the teachers and principals charged with using value-added scores for self-improvement is the uncertainty about what each individual teacher’s score actually is. On each teacher’s report, the city pinpoints the percentile ranking that represents how she compares to other teachers of the same subject and grade.

But while this is the ranking the teacher most likely holds, it is far from certain. Indeed, the economists who calculate value-added scores can only say that the teacher falls somewhere within a range of percentiles (and even being that cautious, they are still only 95 percent certain). This range, as you might remember from statistics, is called the “confidence interval.”

For most teachers, the confidence interval spans at least 30 percentile points. For math and English teachers with only one year’s worth of data, the average width is more than 60 percentile points. That’s a range stretching, for instance, from the 10th percentile to the 70th.
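Why the intervals are so wide with one year of data comes down to sample size: a value-added estimate is, at heart, an average over a teacher's students, and the standard error of an average shrinks with the square root of the number of students. The sketch below is a simplified back-of-the-envelope calculation, not the city's actual model; the function name `ci_width`, the student-level noise of one teacher-effect standard deviation, and the class size of 25 are all illustrative assumptions.

```python
import math

def ci_width(n_students, sd=1.0):
    """Width of a 95% confidence interval around an average of n_students
    noisy observations, in units of the noise's standard deviation.
    Assumes independent, roughly normal student-level noise."""
    return 2 * 1.96 * sd / math.sqrt(n_students)

# Assume one class of ~25 tested students per year.
for years in (1, 2, 3):
    n = 25 * years
    print(f"{years} year(s) of data: interval width = {ci_width(n):.2f}")
```

Doubling the data does not halve the interval; it shrinks it only by a factor of the square root of two, which is why even multi-year estimates remain fairly imprecise.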

The average confidence intervals that Corcoran reports are in the chart below. You can see that, because the confidence intervals shrink as the sample size grows, they are longest when only a year’s worth of data is available.

Teachers in the Bronx face the least certainty. Corcoran’s guess is that their students are the most likely to go unmeasured, shrinking the pool of data — either because the students are classified as special ed or English language learners and don’t take the state test, or because the students move from year to year, making data about their growth over time harder to come by.

[Chart: average confidence intervals for teacher value-added scores, by years of data]

The full report is here and below.
The Use of Value-Added Measures of Teacher Effectiveness in Policy and Practice