failing grade

Why one Harvard professor calls American schools’ focus on testing a ‘charade’

Harvard professor Daniel Koretz is on a mission: to convince policymakers that standardized tests have been widely misused.

In his new book, “The Testing Charade,” Koretz argues that federal education policy over the last couple of decades — starting with No Child Left Behind, and continuing with the Obama administration’s push to evaluate teachers in part by test scores — has been a barely mitigated disaster.

The focus on testing in particular has hurt schools and students, Koretz argues. Meanwhile, Koretz says the tests are of little help for accurately identifying which schools are struggling because excessive test prep inflates students’ scores.

“Neither good intentions nor the value of well-used tests justifies continuing to ignore the absurdities and failures of the current system and the real harms it is causing,” Koretz writes in the book’s first chapter.

His skepticism will be welcome to families of students who have opted out of state tests across the country and others who have led a testing backlash in recent years. That sentiment helped shape the new federal education law, ESSA.

Koretz has another set of allies in some conservative charter and voucher advocates, including — to an extent — Secretary of Education Betsy DeVos, who criticized No Child Left Behind in a recent speech. “As states and districts scrambled to avoid the law’s sanctions and maintain their federal funding, some resorted to focusing specifically on math and reading at the expense of other subjects,” she said. “Others simply inflated scores or lowered standards.”

But national civil rights groups and some Democratic politicians have made a different case: That it’s the government’s responsibility to continue to use test scores to hold schools accountable for serving their students, especially students of color, poor students, and students with disabilities. (ESSA continues to require testing in grades three through eight and requires states to identify their lowest-performing schools, largely by using test scores.)

We talked to Koretz about his book and asked him to explain how he reached his conclusions and what to make of research that paints a more positive picture of tests and No Child Left Behind.

The interview has been edited for clarity and length.

Do you want to walk me through the central thesis of your book?

The reason I wrote the book is really the subtitle: we’re “pretending to make schools better.”

Most of the bad news that’s in this book is old news. We’ve been collecting evidence of various kinds about the impact of the very heavy-handed, high-stakes testing that we use in this country for a long time. I lost patience with people pretending that these facts aren’t present. So I decided it would be worth writing a book that summarizes the evidence both good and bad about the effects of test-based accountability. When you do that, you end up with an awful lot on the bad side and not very much on the good side.

Can you talk about some of the bad effects?

There are a few that are particularly important. One is absolutely rampant bad test prep. It’s just everywhere. One of the consequences of that is that test scores are often very badly inflated.

There aren’t all that many studies of this because it’s not really a welcome suggestion. When you go to the superintendent and say, “Gee, I’d like to see whether your scores are inflated,” they rarely say, “Boy, we’ve been waiting for you to show up.” There aren’t that many studies, but they’re very consistent. The inflation that does show up is sometimes absolutely massive. Worse, there is growing evidence that that problem is more severe for disadvantaged kids, creating the illusion of improved equity.

Another is increasingly widespread cheating. We, of course, will never know just how widespread because there aren’t resources to examine the data from 13,000 school districts. Everyone knows about Atlanta, a few people know about El Paso, but that’s just the tip of the iceberg.

There’s obviously also — and perhaps this should be on the same par — enormous amounts of stress for teachers, for kids, and for parents. That’s the bad side.

I want to ask a little more about test score inflation. What is the strongest evidence for inflation? And let me give you two pieces that to me seem like potentially countervailing evidence. One piece is when I’m looking at research on school turnaround — like the most recent School Improvement Grant program and also turnaround efforts in New York City — these schools have been under intensive pressure to raise test scores. And yet their test score gains on high-stakes tests have been pretty modest at best. The other example is the Smarter Balanced exam. The scores on the Smarter Balanced exam don’t seem to be going up. If anything, they’re going down.

The main issue is that score inflation doesn’t occur in the same amount everywhere. You’ve come up with two examples where there is apparently very little. There are other examples that are much worse than the aggregate data suggest.

In the case of Smarter Balanced, I would wait and see. Score inflation can only occur when people become sufficiently aware of predictable patterns in the test. You can’t game a test when you don’t know what irrelevant things are going to recur, and that just may take some time.

I’m wondering your take on why some of the strongest advocates for test-based accountability have been national civil rights groups.

One of the rationales for some of the most draconian test-based accountability programs we’ve had has been to improve equity. If you go back to the enactment of NCLB, you had [then-Massachusetts Sen.] Teddy Kennedy and [then-California Rep.] George Miller actively lobbying their colleagues in support of a Republican bill. George Miller summed that up in one sentence in a meeting I went to. He said, “It will shed some light in the corners.” He said that schools had been getting away with giving lousy services to disadvantaged kids by showing good performance among advantaged kids, and this would make it in theory impossible to do that.

Even going back before NCLB, I think that’s why there was so much support in the disability community for including disabled kids in test-based accountability in the 1990s — so they couldn’t be hidden away in the basement anymore. I think that’s absolutely laudable. It’s the thing I praise the most strongly about NCLB.

It just didn’t work. That’s really clear from the evidence.

I think the intention was laudable and I think the intention was why high-stakes testing has gotten so much support in the minority community, but it just has failed.

You mention in your book probably the most widely cited study on the achievement effects of No Child Left Behind, showing that there were big gains in fourth grade math and some gains in eighth grade math, but there wasn’t anything good or bad in reading.

Pretty much. There was a little bit of improvement in some years in reading but nothing to write home about.

So the math gains — and that was on the low-stakes federal NAEP test — they’re just not worth it in your view?

I think the gains are real. But there are some reasons not to be terribly excited about these. One is that they don’t persist. They decline a little bit by eighth grade, they disappear by the time kids are out of high school. We don’t have good data about kids as they graduate from high school, but what we do have doesn’t show any improvement.

The biggest reason I’m not as excited as some people are about those gains is we’ve had evidence going back to the 1980s that one of the responses that teachers have had to test-based accountability is to take time out of untested subjects and to put it into math and reading. We don’t know how much of that gain in math is because people are teaching math better and how much is because kids aren’t learning about civics.

That’s, in my view, not enough to justify all of the stuff on the other side of the ledger.

When I’ve looked at some studies on the impact of NCLB on students’ social-emotional skills, the impact on teachers’ attitudes in the classrooms, and the impact on voluntary teacher turnover, they haven’t found any negative effects. They also haven’t found positive effects in most cases. But that would seem to at least in one sense undermine the argument that NCLB had big harmful effects on these other outcomes.

I haven’t seen those studies, but I don’t think what you describe does undermine it. What I would like to see is an analysis of long-term trends not just on teacher attrition but on teacher selection. A lot of what I have heard has really been, frankly, anecdotal. I was once a public school teacher and teaching now is utterly unlike what it was when I taught. It seems unlikely that that had no effect on who opts in and who opts out to be a teacher.

I don’t have evidence of this but I suspect that to some extent different types of people are selecting into teaching now than were teaching 30 years ago.

Can you talk about what you see as good versus bad test prep?

Something that Audrey Qualls at the University of Iowa said was, “A student has only mastered something if she can do it when confronted with unfamiliar particulars.”

Think about training pilots — you would never train pilots by putting them in a simulator and then always running exactly the same set of conditions because next time you were in the plane and the conditions were different you’d die. What you want to know is that the pilot has enough understanding and a good enough command of the physical motions and whatnot that he or she can respond to whatever happens to you while you’re up there. That’s not all that distant an analogy from testing.

Bad test prep is test prep that is designed to raise scores on the particular test rather than give kids the underlying knowledge and skills that the test is supposed to capture. It’s absolutely endemic. In fact, districts and states peddle this stuff themselves.

I take it it’s very hard to quantify this test prep phenomenon, though?

It is extremely hard, and there’s a big hole in the research in this area.

Let’s turn from a backward-looking to a forward-looking discussion. What is your take on ESSA? Do you think it’s a step in the right direction?

This may be a little bit simplistic, but I think of ESSA as giving states back a portion of the flexibility they had before No Child Left Behind. It doesn’t give them as much flexibility as they had in 2000.

It has the potential to substantially reduce pressure, but it doesn’t seem to be changing the basic logic of the system, which is that the thing that will drive school improvement is pushing people to improve test scores. So I’m not optimistic.

One of the things that I argue very strongly at the end of the book is that we need to look at a far broader range of, not just outcomes, but aspects of schooling to create an accountability system that will generate more of what we want. ESSA takes one tiny step in that direction: it says you have to have one measure beyond testing and graduation rates. But if you read the statute it almost doesn’t matter what that measure is. The one mandate is that it can’t count as much as test scores, and that’s written in the statute. The notion that it means the same thing to monitor the quality of practice or to monitor attendance rates is just absurd.

As I’m sure you know, research — including from some of your colleagues at Harvard — has shown that so-called “no-excuses” charter schools in places like Boston, Chicago, and New York City, have led to substantial test score gains and in some cases improvements in four-year college enrollment. Are you skeptical that those gains are the result of genuine learning?

It depends on which test you’re talking about. Some of the no-excuses charter schools drill kids on the state test, so I don’t trust the state test scores for some of those schools. I think it’s entirely plausible that some of those schools are going to affect long-term outcomes because they’re in some cases replacing a very disorderly environment with a very orderly one. In fact, I would say too orderly by quite a margin.

But those reforms are much bigger than just test-based accountability or just the control structure we call charters. It’s a whole host of different things that are going on: different disciplinary policies, different kinds of teacher selection, different kinds of behavioral requirements, all sorts of things.

A lot of the discussion around accountability, including in your book, is about the measures we should be using to identify schools. I’m interested in your take on what happens when a school is identified by whatever system — perhaps by the holistic system you described in the book — as low performing.

The first step is to figure out why it is bad. I would use scores as an opening to a better evaluation of schools. If scores on a good test are low, something is wrong, but we don’t know what. Before we intervene we ought to find out what’s wrong.

This is the Dutch model: school inspections are concentrated on schools that show signs of having problems, because that’s where the payoff is. I would want to know what’s wrong and then you can design an alternative. In some cases, it may be the teaching staff is too weak. It may be in some cases the teaching staff needs supports they don’t have. It may be like in the case of Baltimore, they need to turn the heat on. Who knows? But I don’t think we can design sensible interventions until we know what the problems are.

Special education reorganization

Only 33 black students with disabilities in Denver met expectations on state tests

Just 2 percent of black students with disabilities in Denver scored at grade level or higher on state literacy and math tests last year. In raw numbers, that’s just 33 of the 1,641 black students with disabilities in the school district, according to Denver Public Schools data.

The percentage is similar for Latino students with disabilities: only 2.6 percent met expectations on the tests. Meanwhile, nearly 17 percent of white students with disabilities did.

Denver school officials recently revealed those shockingly low numbers and stark racial disparities as further justification for a previously proposed reorganization of the department that oversees special education. The reorganization would shrink the pool of central office staff who help school principals serve students with disabilities, and would increase the number of school psychologists and social workers.

The theory is that providing more robust mental health services in schools will allow the central office staff members who remain to shift their focus from managing behavior crises to improving academic instruction. Because of their expertise, those staff members were often tapped to help teachers deal with challenging behavior from all students, not just those with disabilities, said Eldridge Greer, who oversees special education for Denver Public Schools.

District officials also hope that increasing mental health support will reduce racial disparities in how students are disciplined. District data show black students are six times as likely to be suspended as white students, while Latino students are three times as likely.

“The biases that are in place in our society unfairly target African-American and Latino children to be controlled as a response to trauma, or as a response to readiness-to-learn (issues), instead of being provided more educational support,” Greer said.

Parents of students with disabilities have pushed back against the district’s plan to cut staff dedicated to special education. Advocates have, too.

Pam Bisceglia, executive director of Advocacy Denver, a civil rights organization that serves people with disabilities, said that while the district should be embarrassed by how poorly it’s serving students of color, she’s not sure the proposed reorganization will help.

She and others worry the district is siphoning money from special education to pay for services that will benefit all students – and that in the end, those with disabilities will lose out.

“If the district wants to have a full-time social worker and psychologist in every school, I don’t have a problem with that,” Bisceglia said. “What I have a problem with is the plan doesn’t suggest how instruction is going to look different (for students with disabilities) and how the curriculum is going to be different in terms of learning to read and do math.”

Greer said that in large part, the curriculum and strategies the district has in place are the right ones. What’s lacking, he said, is training for special education teachers, especially those who are new to the profession. Having a cadre of central office staff focused solely on academics will help, he said.

The reorganization, as detailed at a recent school board meeting, calls for cutting 45 districtwide experts who help principals serve students with disabilities – and who Greer said spent a lot of time managing behavior crises. In their place, the district would hire 15 academic specialists, eight more behavior specialists (the district already has seven), and four supervisors.

The overhaul would also ensure that all elementary schools have at least one full-time social worker or psychologist. Schools would also get money to put in place new discipline practices. The school board last year revised its discipline policy to limit suspensions and expulsions of students in preschool through third grade.

In addition, elementary schools with special programs for students with emotional needs would get $50,000 to spend on a mental health worker, teacher, or teacher’s aide.

School principals invited to discuss the reorganization with the school board said they welcomed being able to hire more social workers and psychologists. But they said they are unsure about the rest of the plan.

One principal said he relied heavily on the expert assigned to help his school serve students with disabilities. Another expressed concern about losing capable staff.

“How do we retain some of that talent so we don’t end up with a brain drain and lose all these people that have all this knowledge and expertise?” said Gilberto Muñoz, the principal at Swansea Elementary School in north Denver.

When district officials first presented the plan earlier this year, they framed it as a way to improve the academic performance of students with disabilities. Just 8 percent of Denver fourth-graders with disabilities met expectations on the state literacy test last year, compared with 44 percent of fourth-graders without disabilities.

But Greer said that when they dug into the data, they discovered the racial disparities.

“We knew there were disparities, but to see disparities as profound as the ones I shared with the board, it was important to elevate that,” he said.

Parent Sarah Young said it was courageous of the district to share such shocking data. But she said she thinks their plan to fix the disparities is lacking – and she disagrees with calling it a reorganization.

Young, who has a daughter with a learning disability, visual impairment, and epilepsy, said Denver Public Schools should call the plan what it is: cuts to special education.

“We understand you’re trying to handle behavior,” Young said, referring to the district. “But these are all vulnerable student populations, and we can’t pit them against each other. We can’t be robbing one to try to put a Band-Aid on another.”

Interrupted

Dump truck blamed for fiber optic line cut that disrupted TNReady testing

A dump truck is behind the fiber optic line cut that led to more disruptions in state testing Thursday, according to the company that provides internet access for many Tennessee school districts.

The severed cable slowed internet connections for some districts and left others unable to connect at all. A statement from Education Networks of America said internet connections were re-established within four hours of the “major” break on Thursday morning.

“The resiliency ENA has built into our network backbone and internet access circuits did reduce the impact of the fiber cut significantly,” according to the company’s statement provided by the Tennessee Department of Education.

State officials were quick to point out the issue was not connected to the state’s testing platform, which has been plagued with issues since the three-week testing window opened on April 16.

“This is an issue related to local connectivity, not with the testing platform,” said Sara Gast, a spokeswoman for the state Department of Education. “Testing can continue, but connectivity may be slow in areas that are impacted until this is resolved.”

Many districts chose to suspend testing for the day, while others left the decision up to school principals.

In Memphis, home to the state’s largest district, a spokeswoman for Shelby County Schools said students were “not able to connect” to the state’s online platform Thursday morning and that principals would decide whether to keep trying. At least one Memphis high school was able to complete testing Thursday afternoon.

TNReady’s online test has experienced widespread interruptions on at least four days since testing began. There were log-in issues on the first day, a reported cyber attack on the second, and a problem with online rosters on Wednesday after the state’s testing company, Questar, updated its software the night before.

Concerns about the subsequent validity of the results prompted state lawmakers to pass two pieces of legislation — the latest one on Wednesday — aimed at preventing students, teachers, schools, and districts from being negatively impacted by the data.

The online issues are affecting high school students statewide. Some districts also chose to expand computerized testing this year to middle grades. For the state’s youngest students, TNReady was being given on paper.

This story has been updated.