Calls for investigation into test credibility go unanswered

State Board of Regents Chancellor Merryl Tisch is calling for state exams to be more “defensible,” but a study investigating test score credibility requested a year ago by the state’s testing oversight board has still not received a go-ahead.

The committee first formally asked the state education department to join an academic study on the state tests in the fall of 2008, said chair Howard Everson. The education department declined but did not rule out future participation. Since then, Everson has received no requests to revisit the idea, he said in an interview yesterday.

“It’s hard to trust the data right now,” said Everson, a psychometrician who is also a senior fellow at the City University of New York. Everson’s committee, the state Technical Advisory Group, is charged with monitoring the state testing process.

The study, which Everson is developing with Harvard education professor Daniel Koretz, would investigate a phenomenon called score inflation. Score inflation happens when rises in test scores reflect something other than actual learning — for instance, bending the test rules (e.g., giving students more time) or even cheating. Tests may also become so predictable that teachers learn to coach students on how to ace them, experts say.

The researchers propose measuring what test scores really represent by creating a “test within a test” that would assess the same skills but in a variety of formats. If students score worse on the new test than on the regular test, Everson said, that would suggest that score inflation is at work.

The key is to give both tests to the same group of students at around the same time. This would eliminate variables that may account for existing differences in results between state exams and alternative assessments such as NAEP.

If such a study happened, Everson said, it would be the first time that New York state accountability tests have ever been rigorously examined for possible score inflation.

A state education department spokesman did not return requests for comment on whether a study on score inflation is being considered.

The research study designed by Everson and Koretz is still in its preliminary stages, Everson said, and the researchers are in early discussions with several states about participating.

New York would be an excellent site for research because of the high stakes attached to the exams, Koretz said in an interview earlier this year.

“In the old days, when the pressure to raise scores wasn’t so high, it mattered less that tests were somewhat predictable,” he said. “But when the pressure is as high as it is now – everywhere, but particularly in New York with value-added measures for teachers and so on – people have every incentive to look for predictable patterns and to narrow their instruction to focus on those patterns.”

Columbia University sociology researcher Jennifer Jennings has found that almost identical questions have appeared on each state math exam since 2006, making it easier for schools to teach to the test.

Koretz said that with increased attention to high-stakes testing in New York, he hopes that the pressure for a thorough look into score inflation will mount.

“I hope that New York can be what finally breaks the ice,” he said.

Everson was quoted in the Times as saying that the state tests are “about as good as we can build them.” In an interview with GothamSchools, Everson elaborated on that appraisal, saying that the technical operations of state testing (how the tests are administered and scored) are strong. But he made a distinction between the way the tests are being given and the way the results are being interpreted.

“It’s just a hypothesis, but we worry that there would be more score inflation,” Everson said. “We think there would be more teaching to the test.” But there is no way to be sure without more rigorous examination, he said.

UPDATE: The New York State Department of Education is currently researching the possibility of using an “audit mechanism,” like the “test within a test” that Everson described, to guard against score inflation in its standardized tests, department spokesman Jonathan Burman said.