Last week, the New York City Department of Education issued its first-ever Teacher Preparation Program Reports. The department was judicious in not describing the reports as an evaluation of the quality or effectiveness of the dozen teacher-preparation programs in the New York City area that collectively produce more than 50 percent of the 10,000 traditional-pathway teachers hired by the city over the past five years.
Others were not so careful. Writing in The New York Times, Javier Hernandez described the PowerPoint slides comparing the 12 programs as “scorecards,” and stated that these ed schools were being “evaluated,” a term repeated in his article’s headline. Politico also used the term “scorecard.” The Wall Street Journal described the data as “rankings,” although teacher-preparation programs were not ranked. The Associated Press described the data as “grading” the colleges and universities, and looked for “winners or losers.” The New York Post and the New York Daily News both referred to “grading” the programs. Even my own institution, Teachers College, which appears in the data, fell into this trap: the headline on the college’s webpage reads, “TC Rated in City Evaluation of Teacher Prep Programs.”
What’s the big deal? Report, description, analysis, comparison, ratings, rankings, evaluation — aren’t these all pretty much the same thing?
No, they are not, for several reasons.
In the past few weeks, two major reports on teacher turnover and retention have been released. One was rolled out with extensive media coverage, and has been the subject of much discussion among policymakers and education commentators. The other was written by me, along with Teachers College doctoral student Clare Buckley.
The first report, “The Irreplaceables: Understanding the Real Retention Crisis in America’s Urban Schools,” was prepared by TNTP, an organization formerly known as The New Teacher Project that prepares and provides support for teachers in urban districts, and that advocates for changes in teacher policy. The second, “Thoughts of Leaving: An Exploration of Why New York City Middle School Teachers Consider Leaving Their Classrooms,” was released by the Research Alliance for New York City Schools, a nonprofit research group based at New York University. (The Research Alliance published a report by Will Marinell in February 2011 that examined detailed patterns of teacher turnover in New York City middle schools, drawing on records from the district’s human-resources office.)
There are some important similarities between the two new reports. Both surveyed teachers in large urban districts about their plans to stay in their current schools or to depart either for other schools, other districts or other careers. Both also sought to understand the features of teachers’ work on the job that were influential in their plans to stay or leave. The study of New York City relied on a large, anonymous sample of middle-school teachers: roughly 80 percent of the full-time teachers in 125 middle schools across the city. In contrast, the TNTP study surveyed smaller numbers of teachers in four urban districts (one of which appears to be New York City), and the surveys were not anonymous, because TNTP wanted to link teachers’ survey responses to what the authors viewed as measures of teachers’ performance, such as value-added scores or summary teacher evaluations.
The headlines from the two studies aren’t that different: In any given school, many teachers think about leaving, and it’s not easy to predict which teachers are more poised to move than others, or why.
For 10 months, Carolyn Abbott waited for the other shoe to drop. In April 2011, Abbott, who teaches mathematics to seventh- and eighth-graders at the Anderson School, a citywide gifted-and-talented school on the Upper West Side of Manhattan, received some startling news. According to her Teacher Data Report, the New York City Department of Education’s effort to isolate a teacher’s contribution to her students’ performance on New York State’s math and English Language Arts tests in grades four through eight, 32 percent of seventh-grade math teachers and 0 percent of eighth-grade math teachers scored below her.
She was, according to this report, the worst eighth-grade math teacher in New York City, where she has taught since 2007.
“I was angry, upset, offended,” she said. Abbott sought out her principal, who reassured her that she was an excellent teacher and that the Teacher Data Reports bore no relation to her performance. But, the principal confided, she was worried; although she would enthusiastically recommend Abbott for tenure, the Teacher Data Report could count against her in the tenure process. With a new district superintendent reviewing the tenure recommendation, anything could happen.
Using a statistical technique called value-added modeling, the Teacher Data Reports compare how students are predicted to perform on the state ELA and math tests, based on their prior year’s performance, with their actual performance. Teachers whose students do better than predicted are said to have “added value”; those whose students do worse than predicted are “subtracting value.” By definition, about half of all teachers will add value, and the other half will not.
Carolyn Abbott was, in one respect, a victim of her own success. After a year in her classroom, her seventh-grade students scored at the 98th percentile of New York City students on the 2009 state test. As eighth-graders, they were predicted to score at the 97th percentile on the 2010 state test. However, their actual performance was at the 89th percentile of students across the city. That shortfall — the difference between the 97th percentile and the 89th percentile — placed Abbott near the very bottom of the 1,300 eighth-grade mathematics teachers in New York City.
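The core of the value-added calculation described above can be sketched in a few lines. This is a deliberate simplification, not the DOE's actual statistical model; the only numbers used are the percentile figures reported in Abbott's case.

```python
# Simplified illustration of the value-added logic: compare students'
# actual percentile with the percentile predicted from their prior-year
# scores. This is NOT the DOE's model, just the core subtraction.

def value_added(predicted_percentile: float, actual_percentile: float) -> float:
    """Positive: the teacher 'added value'; negative: she 'subtracted value'."""
    return actual_percentile - predicted_percentile

# Abbott's eighth-graders: predicted at the 97th percentile, actual at the 89th.
shortfall = value_added(97, 89)
print(shortfall)  # -8 percentile points, placing her near the bottom of ~1,300 teachers
```

The point of the sketch is how small the input to the ranking is: an eight-point percentile shortfall among very high-scoring students drove the entire result.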
What can one say about Mayor Michael Bloomberg’s leadership of the New York City public schools that hasn’t been said before? After nearly a decade of mayoral control, the Bloomberg regime is the status quo.
Through most of that time, Bloomberg has justified mayoral control as a mechanism for focusing accountability for the achievement of New York’s 1.1 million students. Mayoral control, he argued, placed him solely responsible for the system, and he should be judged by the results. If members of the voting public didn’t like what they were seeing, well, they could just vote him out of office at the end of his term.
The centralization of authority in a single individual paralleled a structure with which Bloomberg was highly familiar: CEO of a large, complex business. Bloomberg L.P., the company Mike Bloomberg founded, offers an array of financial and information services to hundreds of thousands of customers around the world. The company’s website describes its hallmark as “innovation and a passion for getting things right.”
That’s why it’s so disconcerting to hear the mayor hold forth on educational outcomes in New York City. Is he speaking as a CEO seeking to bolster his investors’ confidence in his products? Or do his public pronouncements reflect the assessments that he uses to guide the internal strategies of the organization? Does he respond to new information and incorporate it into his thinking? A certain amount of public optimism and embellishment would be tolerable if they were accompanied by a realistic appraisal of the successes and failures of his initiatives. Does the mayor truly understand the state of education in New York City?
Speaking at a panel on big-city school reform in Washington, D.C. on March 2nd, Mayor Bloomberg repeated a claim he’s made before: “We have closed the gap between black and Latino kids and white and Asian kids,” he said. “We have cut it in half.”
It’s a claim that has never held up to serious scrutiny.
Each fall, thousands of runners descend on the Big Apple to run the New York City marathon. They’ve trained hard all year, and they give their all on the course. Long after the elite runners have finished, the rest stream across the finish line in clumps, exhausted at the end of their 26.2-mile journey. In the middle of the pack, as many as eight or ten runners might cross the finish line in a single second, and nearly 400 in a single minute.
The difference between a time of 4:08:00 and 4:09:00, however, isn’t large enough to be important. It’s the difference between a rate of 9:28 per mile and 9:30 per mile. Given the vagaries of marathon running — the wind, the temperature, the features of the course — it would be unwise to conclude that the runner who crossed the finish line in 4:08:00 is a much better marathoner than the one who finished in 4:09:00.
But the runner with a time of 4:08:00 finished several hundred places ahead of the runner who finished in 4:09:00 — surely that counts for something! Not really, I’d say. We can quantify the difference, both in absolute terms and in relative position, but these differences are not large enough to be meaningful.
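The pace arithmetic above is easy to verify. A minimal sketch, with the marathon distance of 26.2 miles as the only constant:

```python
# Converting a marathon finish time to per-mile pace, to check the claim
# that 4:08:00 and 4:09:00 differ by only about two seconds per mile.

MARATHON_MILES = 26.2

def pace_per_mile(hours: int, minutes: int, seconds: int) -> tuple:
    """Return (minutes, seconds) of pace per mile for a given finish time."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    per_mile = total_seconds / MARATHON_MILES
    return int(per_mile // 60), round(per_mile % 60)

print(pace_per_mile(4, 8, 0))  # (9, 28) -- about 9:28 per mile
print(pace_per_mile(4, 9, 0))  # (9, 30) -- about 9:30 per mile
```

A full minute of finish-time difference dissolves into roughly two seconds per mile, which is the sense in which the gap is quantifiable but not meaningful.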
The same is true of the information in the Teacher Data Reports recently released in New York City. Small differences in the estimated effects of teachers on their students’ achievement can appear to be much larger, because most teachers are about equally successful with the assortment of students they teach in a given year, regardless of whether those students begin the year as low-achievers or high-achievers. A trivial difference can appear much larger than it actually is, because, like the marathoners, many teachers are “crossing the finish line” at about the same time.
The word rigor comes up a lot in teacher-evaluation systems. It’s akin to motherhood, apple pie and the American flag. What policymaker is going to take a stand against rigor? But the term is getting distorted almost beyond recognition.
In science, a rigorous study is one in which the scientific claims are supported by the evidence. Scientific rigor is primarily determined by the study’s design and data-analysis methods. It has nothing to do with the substance of the scientific claims. A study that concludes that an educational program or intervention is ineffective, for example, is not inherently more rigorous than one that concludes that a program works.
In the current discourse on teacher-evaluation systems, however, an evaluation system is deemed rigorous based either on how much of the evaluation rests on direct measures of student-learning outcomes, or the distribution of teachers into the various rating categories, or both. If an evaluation system relies heavily on No Child Left Behind-style state standardized tests in reading and mathematics — say, 40 percent of the overall evaluation or more — its proponents are likely to describe it as rigorous. Similarly, if an evaluation system has four performance categories — e.g., ineffective, developing, effective and highly effective — a system that classifies very few teachers as highly effective and many teachers as ineffective may be labeled rigorous.
In these instances, the word rigor obscures the subjectivity involved in the final composite rating assigned to teachers. The fraction of the overall evaluation based on student-learning outcomes is wholly a matter of judgment; and if you believe, as I do, that a teacher’s responsibility for advancing student learning extends well beyond the content that appears on standardized tests, you could conceivably argue that increasing the weight given to standardized tests in teacher evaluations makes these evaluations less rigorous. This is, however, a hard sell in the absence of other concrete measures of student-learning outcomes that could supplement the standardized-test results.
Even more importantly, describing a teacher-evaluation system as rigorous hides the fact that the criteria for assigning teachers to performance categories — either for subcomponents or for the overall composite evaluation — are arbitrary. There’s no scientific basis for saying, as New York has, that of the 20 points out of 100 allocated for student “growth” on New York’s state tests, a teacher needs to receive 18 to be rated “highly effective,” or that a teacher receiving 3 to 8 points will be classified as “developing.” In fact, the cut-off separating “developing” from “effective” changed last week as a result of an agreement reached between the New York State Education Department and the state teachers’ union — not because of science, mind you, but because of politics.
I’ve been relatively quiet in the ongoing debate about how best to evaluate teachers in New York City and across New York State. I’m not close to the negotiations and can claim no expertise on the political machinations outside of public view. At its heart, this seems to me a dispute over jurisdiction: Who has the legitimate authority to regulate the work of an occupation that seeks the status of a profession—but one that is in a labor-management relationship?
The laws of New York recognize the labor-management fault line, but they do little to guide a collective-bargaining process toward agreements in the many districts in which teacher-evaluation systems are contested. Each side brings a powerful public value to bear on the disagreement.
For the employers, it’s all about efficiency. It’s in the public interest, they argue, to recruit, retain and reward the best teachers, in order to maximize the collective achievement of students. A teacher-evaluation system that fails to identify those teachers who are effective, and those who are ineffective, can neither weed out consistent low-performers nor target those who might best benefit from intensive help. Rewarding high-performing teachers can, in the short run, help keep them in their classrooms, they claim, and, in the long run, can help expand the pool of talented individuals who enter the occupation.
What counts as a “fact”? New York State Supreme Court Justice Cynthia Kern’s ruling on the release of the New York City Teacher Data Reports reflects a view very much at odds with that of the social-science research community. In ruling that the Department of Education’s intent to release these reports, which purport to label elementary and middle school teachers as more or less effective based on their students’ performance on state tests of English Language Arts and mathematics, was neither arbitrary nor capricious, Kern held that there is no requirement that data be reliable for them to be disclosed. Rather, the standard she invoked was that the data simply need to be “factual,” quoting a Court of Appeals case that “factual data … simply means objective information, in contrast to opinions, ideas or advice.”
But it is entirely a matter of opinion as to whether the particular statistical analyses involved in the production of the Teacher Data Reports warrant the inference that teachers are more or less effective. All statistical models involve assumptions that lie outside of the data themselves. Whether these assumptions are appropriate is a matter of opinion. Among the key assumptions that are necessary to make inferences about teacher effectiveness from student performance on the state tests are the following:
The tests are valid measures of students’ mastery of English Language Arts and mathematics.
A student’s performance on the test, which is taken on a particular date, reflects how that student would perform on the test on other dates.
The student, classroom and school-level variables taken into account in the value-added model underlying the Teacher Data Reports are appropriate for inferring that a particular teacher caused the test-score gains experienced by that teacher’s students.
Test-score gains observed on tests administered in the middle of one year and the middle of the following year can be properly apportioned to the prior-year teacher and the current-year teacher.
The fact that reasonable people might disagree about these assumptions makes clear that they are a matter of opinion.
Mayor Michael Bloomberg's selection of Hearst Magazines chairman Cathie Black as chancellor of the New York City public schools has hastened a crisis over how to assess expertise in a complex educational system. Does Black have the expertise necessary to assume leadership of a school system with a budget of $23 billion, 135,000 employees, and 1.1 million students? The mayor certainly thinks so. He has described the job as being able to "solve complex problems in the face of controversy, motivate staff, communicate with and bring together diverse constituents, manage labor relations, use data in decision making, and sustain a culture of change and excellence." Black's experience in publishing, he has argued, has demonstrated her bold vision, capacity to make tough financial decisions, skills in negotiation and building support among constituents, and knowledge of state and federal laws. In the eyes of the mayor, these skills — none specific to the field of public education — constitute the expertise required to do the job.
The state of New York has a different conception of the expertise needed to be a school district superintendent. State law specifies that to obtain a professional school district leader certification, school district leaders (i.e., superintendents) must have completed a School District Leader program authorized by the state; accumulated a minimum of 60 semester hours in graduate courses approved by the state commissioner of education; and have at least three years of teaching experience. The certification also includes a full-time, 15-week clinical component of school-building leadership experience or its equivalent, and requires passing two written School District Leader assessments.
The content of the School District Leader assessments provides some purchase on the kinds of expertise that the state views as necessary to successful practice. The standards expressed in these assessments include applying knowledge of skills for engaging building leaders, board members, community members, parents/guardians, students and school staff in an ongoing dialogue regarding core values, goals, policies, practices and achievements; demonstrating knowledge of the New York State Code of Ethics for Educators and the role of values and ethics in district leadership; demonstrating knowledge of factors to consider in comprehensive, long-range planning, including the importance of involving all key stakeholders in planning processes; analyzing concepts, principles and best-practice applications of developmental and learning theories, curriculum development, instructional delivery, and classroom organization and practices with regard to the diverse needs of all students (e.g., special-education students, English-language learners, gifted and talented students); analyzing strategies for developing staff capability through the supervision and evaluation of teachers and building leaders, effective staff assignments, and systems of mentoring, support, and development; and demonstrating knowledge of processes of collective bargaining and contract management that support and extend the educational vision, to name just a few.
If the various requirements of the School District Leader certification are indicators of the expertise that New York state requires of school superintendents, and Cathie Black has not met those requirements, how are we to judge if she has the requisite expertise?
Here's what you need to know about Waiting for "Superman." It's not a film — it's a propaganda campaign.
That's not necessarily a bad thing.
The term "propaganda" has gotten a bad rap, ever since its association with 20th-century totalitarian governments promoting troubling political objectives. But there is a long and honorable tradition of propaganda in the genre of documentary films. In its original formulation, "propaganda" is simply a deliberate effort to change what people know, understand and value, for a particular purpose. Propaganda can rely on many different media and symbols to carry its message. Documentary films have often sought to activate a sense of urgency about a social problem or condition that needs our attention. The medium of film is especially powerful because propaganda often appeals to emotion as much as reason, and film is very effective at evoking an emotional response. Much better than, say, a speech by Al Gore, Arne Duncan or Bill Gates.
I had the opportunity to view Waiting for "Superman," the new documentary by Academy Award-winning filmmaker Davis Guggenheim, at a pre-release screening at Teachers College last week. Based on the early buzz from proponents and detractors alike, I expected to see a film that lived up to its billing as "stirring" or "moving."
I'll admit it: When I hear the phrase "charter school miracle," my antennae go up. It's not that I think that charter schools can't possibly be good schools, or that they cannot surpass traditional public schools in the measured achievements of their students. The evidence is pretty clear that there are many fine charter schools, just as there are many struggling charter schools.
No, it's that I think miracles are exceedingly rare phenomena. And the current narrative about miracles in school reform relies heavily on a "great man" theory, replete with outsized personalities. Witness the contemporary stage, on the cusp of the release of Waiting for "Superman": Geoffrey Canada, Michelle Rhee, even — God help us — Bill Gates and Joel Klein being anointed as miracle-workers who, by dint of their commitments, hard work and personalities, are overcoming entrenched bureaucracies and transforming the life-chances of poor and minority children across America's urban landscape.
It was against this backdrop that I read Caitlin Flanagan's stirring op-ed that graced the gatefold of Sunday's New York Daily News. Flanagan, a former prep-school teacher who now writes for The Atlantic and other publications, singles out Mike Piscal, who founded a charter management organization called the Inner City Education Foundation (ICEF) that now operates 15 elementary, middle and high schools in south Los Angeles. Flanagan and Piscal were colleagues, once upon a time, in the English department of the elite Harvard-Westlake School.
Flanagan's argument goes something like this: the ICEF schools are extraordinarily high-performing; in fact, the elementary schools have eliminated the achievement gap.
I've become increasingly alarmed at the growing divide between the news and editorial functions of major metropolitan daily newspapers (e.g., in New York City, the New York Times, New York Daily News, and the New York Post; in Washington, DC, the Washington Post). The functions are largely independent, and that is as it should be; the ideological proclivities of the publisher and editorial board should not be shaping what counts as or is reported as news.
To be sure, the editorial page of a newspaper should express a point of view, and a typical reader will likely agree with some viewpoints, and disagree with others. But it's a very dangerous thing when the editorials of a newspaper are not informed by the daily reporting of its journalists. Ignoring the news, reported with a minimum of spin by "beat" reporters, leads to simple-minded and ignorant editorializing on complex matters of public policy. It's also insulting to the profession of journalism, and to the many reporters whose goal is simply to understand the news and get the story right. (I talk to some of the reporters to whom I'm referring.)
A case in point is yesterday's Daily News editorial, "Truth in testing." The editorial is an effort to shore up claims about the success of school reform in New York City under Mayor Mike Bloomberg and Chancellor Joel Klein. Last week's revelations that the state testing system was dramatically overstating student growth and the closing of the achievement gap rocked the New York City Department of Education back on its heels. The Daily News editorial board, which has long supported these reforms, came out firing, citing four "facts": (1) The State Education Department defrauded parents and students; (2) Regents Chancellor Merryl Tisch and Education Commissioner David Steiner owned up to the deception; (3) The drop triggered bogus charges that the schools have made no progress; and (4) Only radical action will give New York's kids a shot at the quality education they need.
New York State is releasing the results of the 2010 state assessments in reading and math tomorrow. We're told that the 2010 tests were more difficult than those in previous years, and less predictable — the first steps toward a new assessment system that provides a realistic picture of student proficiency. Testing experts such as Dan Koretz, Jennifer Jennings and Howard Everson presented evidence to the Board of Regents that being judged proficient on the state's tests in grades three through eight, or on the Regents exams, did not always predict later success in high school or in college. This evidence strongly suggested that the threshold for proficiency was set too low. Students who were classified as proficient in eighth-grade math had only a 30% chance of earning a Regents score of 80, which many colleges in the state judge to be the bare minimum for college readiness; they had a high chance of scoring below 500 on the SAT; and they were likely to be placed in remedial classes if they entered college. And, based on a chart prepared by the NYC Department of Education (of uncertain provenance), a student at the minimum threshold for proficiency on the eighth-grade tests has only about a 55% chance of earning a Regents diploma in high school, the state's minimum standard for high school graduation for all students who entered 9th grade in 2008 or later.
Last week, the Board of Regents voted to adjust the cut scores that determine proficiency on the state's reading and math assessments in grades three through eight. They didn't say by how much, but we have a clue from Merryl Tisch's assertion that the "inflation rate" on the state tests has been about 20% in recent years. Twenty percent of what is not clear. But I'm going to assume that the cut score for Level 3, which represents proficiency in a subject at a particular grade level, is going to rise substantially at all grades for both reading and math. What are the likely consequences?
We'll see tomorrow, but here's my prediction, focusing on eighth-grade math. First, I'm assuming that the distribution of scale scores for 2010 will be the same as it was for 2009. If the tests were more difficult in 2010, the average scale score might go down a bit; if students were actually learning more in 2010 than in 2009, the average scale score might go up a bit. For my little prediction exercise, I'm assuming that these two things cancel each other out.
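The mechanism behind this prediction exercise can be mimicked in code. The sketch below uses an entirely hypothetical normal distribution of scale scores and invented cut scores — they are not New York's actual values — but it shows the key point: hold the score distribution fixed, raise the cut, and the proficiency rate falls.

```python
# Hypothetical illustration: raising a proficiency cut score against a
# fixed distribution of scale scores. The distribution and cut scores are
# invented for the example; they are not New York State's actual figures.
import random

random.seed(0)
scores = [random.gauss(650, 30) for _ in range(100_000)]  # fake scale scores

def pct_proficient(scores, cut):
    """Percent of students scoring at or above the cut score."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

print(round(pct_proficient(scores, 650)))  # old cut: about half proficient
print(round(pct_proficient(scores, 673)))  # raised cut: roughly a fifth proficient
```

No student's test performance changes between the two lines; only the label attached to a given score does — which is exactly what happens when a state resets its proficiency threshold.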
Charles Murray is a very confused guy. His op-ed piece in today's New York Times uses the dreary impact of the Milwaukee Parental Choice Program on student achievement to justify policies expanding school choice. Let's get over the fact that school choice plans don't show big impacts on students' performance on standardized tests, he argues. After all, we've known for a long time that it's hard for schools to overcome the family advantages of cognitive ability and motivation. Rather, he proposes, we should support school choice because it can allow a small number of parents to choose a curriculum that's better than that offered to students in traditional public schools.
Setting aside some of the most remarkable inconsistencies—Charles Murray, 2010 edition, doesn't think that test scores are meaningful measures of academic performance? Has he met Charles Murray, 1994 edition, who was quite comfortable in The Bell Curve reducing the whole of human intelligence to a single score on the Armed Forces Qualification Test?—Murray fundamentally misunderstands the historic logic of the charter schooling movement—an exchange of autonomy for accountability. We can argue over the scope of that autonomy and accountability, but even those who have disagreed on this site about whether charter schools are properly labeled as public or private schools generally agree that it's appropriate to hold them accountable for their students' performance on assessments measuring standards that are the de facto public curriculum of the state in which they are located. Certainly, the charter movement gains energy from studies showing that students in charter schools may outperform their counterparts in traditional public schools on state assessments. Charter schools may strive to expose students to a curriculum that's more ambitious, but the standards of the state cannot be ignored.
Writing in the pages of today's New York Post, Marcus Winters, Senior Fellow at the Manhattan Institute, argues that charter schools might improve the chances for Black and Hispanic students to enter New York City's prestigious exam high schools. The key evidence for this is the fact that 2.4% of the Black and Hispanic eighth-grade students who attended charter schools in 2009 were offered admission to the eight exam schools, compared to 1.5% of the Black and Hispanic eighth-graders attending traditional public schools. Comparing these rates, he states that Black and Hispanic eighth-graders in charter schools are 60% more likely to obtain a seat in the exam schools than their counterparts in traditional public schools.
It's true that 2.4% is 60% more than 1.5%. But both percentages round to the same whole number, 2. So it's hard to say that the likelihood of admission is dramatically different for students in charter and traditional public schools. And, although Winters pays lip service to the notion that these data are solely descriptive, there's no mistaking his desire to use these data to argue that the quality of charter schools is in fact responsible for this small increase. "Charter schools could," he writes, "increase minority access to the city's esteemed high schools by offering a higher quality elementary and middle school education than is available in the traditional public schools system." Yep, that's true, they could. They could also be successful in recruiting some talented minority students with families that are highly motivated to help them succeed in school. In the latter case, the primary dynamic is selection into charter schools, not their academic consequences.
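The distinction at issue here, relative versus absolute differences, is a one-liner. Using the admission rates cited in Winters's own article:

```python
# Relative vs. absolute differences in the charter admission rates cited above.
charter_rate, traditional_rate = 2.4, 1.5  # percent admitted, from the article

relative_increase = (charter_rate - traditional_rate) / traditional_rate
absolute_gap = charter_rate - traditional_rate

print(round(relative_increase * 100))  # 60 -> the "60% more likely" claim
print(round(absolute_gap, 1))          # 0.9 -> less than one percentage point
print(round(charter_rate), round(traditional_rate))  # both round to the same whole number, 2
```

A 60% relative difference sounds dramatic; nine-tenths of a percentage point does not. Both describe the same two numbers.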
By focusing on the relative rates of minority access to New York City's specialized exam high schools for students in charter and traditional public schools, however, Winters has buried the lead. The real story here is the fact that, in a system that is overwhelmingly made up of Black and Latino students, very few are getting into the most prestigious high schools. 71% of the eighth-graders in New York City's traditional public schools are Black or Latino, but only 16% of the students offered seats in the specialized exam schools are Black or Latino. Another way of representing the same information is to look at the probability of admission to the exam schools for members of different racial/ethnic groups. As Winters noted, 1.5% of the Black and Latino eighth-graders in traditional public schools were offered admission to the specialized exam schools. But 19% of the white and Asian eighth-graders attending such schools scored high enough on the entrance exam to be offered admission.
GothamSchools Editor Elizabeth Green's cover story in the March 7th edition of the New York Times Sunday magazine tackled the problem of preparing teachers for K-12 classrooms in the United States. Embellished with the provocative title "Building a Better Teacher," Elizabeth's piece profiled two approaches to teacher preparation: a grassroots approach emerging outside of the academy which focuses on a set of techniques that teachers can use to increase learning time and improve learning environments, and a research-based approach developed in colleges and universities emphasizing the knowledge and skills that enable teachers to teach particular school subjects effectively. Elizabeth's story opened with a description of Doug Lemov, who has developed a taxonomy of 49 instructional techniques that he and others believe are critical to effective teaching, and especially to closing the achievement gap between poor, minority children and their more advantaged peers. If we were to judge the relative merits of the two approaches based on the amount of ink devoted to each in her article, we'd conclude that, in the battle for the minds of education policymakers and practitioners, classroom management (i.e., Lemov's taxonomy) had won, and pedagogical content knowledge (i.e., the work of Deborah Ball on Mathematical Knowledge for Teaching) had lost.
The disproportionate emphasis on Lemov's approach in Elizabeth's article surprised me. To be sure, he's a fine human-interest story, and the schools he works with have shown remarkable performance on state achievement tests. But Elizabeth briefly acknowledged the lack of a research basis for Lemov's approach, writing: "And while Lemov has faith in his taxonomy because he chose his champions based on their students' test scores, this is far from scientific proof. The best evidence Lemov has now is anecdotal..." Why would she and the Times choose to feature an approach with so little evidence to back it up?
Lemov's book, "Teach Like a Champion: 49 Techniques that Put Students on the Path to College," was published two weeks ago, and currently ranks #30 on Amazon's bestseller list. I wanted to see what he had to say about the research evidence underpinning the techniques. A thin research base does not, of course, mean that the techniques are not valuable—I expect to learn quite a bit from studying them, and seeing if there are opportunities to adapt them for teaching my graduate students (who will tell you that classroom management is not my strong suit). And, of course, who wouldn't want to be a champion teacher? Because it is, after all, a competition, right?
Last night, at the GothamSchools party, I had the opportunity to say hello to David Cantor, Press Secretary for the New York City Department of Education. As he turned to talk with an angry parent, a piece of paper fell out of his pocket, and I picked it up. It looked like a draft of the press release he issued for the release of the 2009 NYC NAEP math scores, but it was all marked up. Could I have found his annotations as he was drafting the press release?
Chancellor Klein Applauds New York City Public School Students For Six Years of Sustained and Significant Gains in Math on National Exam (Let's get that "six years" in at the start, to make it look like the growth has been steady, rather than stalled over the past two years.)
City Students Outperform the Rest of the State and Nation on the National Assessment of Educational Progress ("Outperform"? Only in the sense that NYC fourth-graders scored almost as high as fourth-graders in the nation overall, while NYC eighth-graders scored significantly lower than eighth-graders nationally. But it's a headline, and who pays attention to them, anyway?)
Record Number of Students Performing at or Above Proficiency
Chancellor Calls on State to Adopt More Rigorous Standards to Ensure Further Progress
Schools Chancellor Joel I. Klein today applauded consistent and sustained gains by New York City public school students on the 2009 National Assessment of Educational Progress (NAEP) math exam. (Consistent and sustained might be a stretch, but maybe it'll pass.)
What is it about the Harlem Children's Zone that causes pundits and reporters to suspend disbelief? Perhaps it's the deep desire for evidence that the large and persistent racial gap in educational achievement can be overcome. The enduring racial inequalities in educational and social outcomes in the U.S. are a blight on our society, and evidence that these inequalities can be eliminated, however tenuous, can be elevated into the feel-good story of the year.
Last night, Anderson Cooper reported on the Harlem Children's Zone for the CBS newsmagazine 60 Minutes. "For years, educators have tried and failed to get poor kids from the inner city to do just as well in school as kids from America's more affluent suburbs," he began. "Black kids still routinely score well below white kids on national standardized tests. But a man named Geoffrey Canada may have figured out a way to close that racial achievement gap." Cooper asked Canada, "So you're trying to level the playing field between kids here in Harlem and middle class kids in a suburb?" "That's exactly what we have to do," Canada replied.
As is customary, Cooper spoke with Harvard economist Roland Fryer, who has analyzed the achievement of students attending the HCZ Promise Academy charter schools. Fryer said, "At the elementary school level, he closed the achievement gap in both subjects, math and reading."
"Actually eliminating the gap in elementary school?" Cooper asked.
"We've never seen anything like that. Absolutely eliminating the gap. The gap is gone, and that is absolutely incredible," Fryer said.
Monday afternoon, I had the opportunity to respond to Merryl Tisch, Chancellor of the Board of Regents, and David Steiner, the New York State Commissioner of Education, as they talked about the future of P-16 education in New York State at the Phyllis L. Kossoff Policy Lecture at Teachers College, Columbia University. I wasn't sure what they'd say, so I prepared some remarks responding to the proposals regarding teacher education in New York State that the Commissioner presented to the Board of Regents a few weeks ago. For the handful of readers who might be interested, here's what I wrote. (Due to time constraints, I didn't say all of this at the event.) Chancellor Tisch and Commissioner Steiner were quite willing to hear and engage with the critiques that my colleague Lin Goodwin and I offered, and I look forward to continuing this conversation with them.
It's no surprise that the State Education Department and the Board of Regents have taken up the cause of ensuring an equitable distribution of highly-qualified teachers across New York State. The key justification for such a goal is the fact that the K-12 education system is shortchanging our children. Although some students are highly successful, many more are not, and the problems are concentrated in urban school systems serving large numbers of poor children of color.
If that's the problem, is improving the education of teachers the solution? It's certainly part of the solution, given what we know about the centrality of teaching to student learning. But it's by no means the entire solution, as a great many other forces shape student outcomes. For example, a great teacher can't compensate for a child coming to school hungry, and great teaching of an out-of-date curriculum only results in great mastery of out-of-date knowledge. I trust that Chancellor Tisch and Commissioner Steiner are not seduced by claims that the single most important determinant of a child's achievement is the quality of his or her teachers, because that's simply not true. Family background continues to be the dominant factor. But the quality of teachers is, at least in theory, something that is manipulable via education policy initiatives, and it's a lot more tractable than addressing the fact that one in five children under the age of 18 in New York State lives below the poverty line.
A few years ago, the New York State lottery's slogan was "Hey, you never know." In its original formulation, the slogan sought to motivate New Yorkers to play the lottery, a game of chance, on the grounds that you never know unless you play if you are a winner. But the slogan is a double entendre when applied to Caroline Hoxby's highly-publicized study of the effects of attending a charter school in New York City. Propelled by Hoxby's forceful claims about the superiority of lottery-based research on charter schools, much of the mainstream media has concluded that we now know definitively that New York City charter schools outperform their traditional counterparts—in spite of the fact that her study has not undergone a rigorous peer review process that might identify problems in the study and ways of addressing them. Today, however, an equally forceful critique prepared by Sean Reardon of Stanford University argues that Hoxby's research is anything but definitive. Citing flaws in the statistical analysis of the report, Reardon writes that it "likely overstates the effects of New York City charter schools on students' cumulative achievement ... It may be that New York City's charter schools do indeed have positive effects on student achievement, but those effects are likely smaller than the report claims."
Reardon is careful to point out that it's not possible, based on the information provided in Hoxby's report and associated documents, to judge the extent of the bias in Hoxby's estimates of charter school effects on student achievement. More than anything, he calls for reserving judgment until more information about the study, its data, and its methods is available, and until the study has undergone rigorous peer review. Until then, he maintains, it would be unwise to rely on the statistics reported in the study, and the inferences Hoxby and her colleagues draw about charter school effects in New York City.
Here I'll mention two of the features of Reardon's critique that I find particularly persuasive. The first is that Hoxby used an inappropriate set of statistical models to analyze the data, which likely distorts the charter school effects. You might be surprised to learn that Hoxby used statistical models at all. If her results are based on comparing students who won a charter school lottery with students who lost the lottery, and the lottery was fair, balanced and random, why would a model be needed? It seems like the charter school effect would simply be the difference in the outcomes observed for the lottery winners and the lottery losers. But comparing lottery winners and losers isn't really estimating an individual causal effect, because an individual student can't simultaneously be enrolled in a charter school and a traditional public school. Even in the context of a lottery, or any other kind of study that can capitalize on a randomization process, such as a clinical drug trial, statistical models come into play to allow for inferences about cause-and-effect relationships. These inferences are always made in relation to a particular statistical model, and all such models have assumptions.
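To make the point concrete, here's a toy simulation of my own construction (it uses none of Hoxby's data or specifications): even the seemingly model-free winner-loser comparison is identical to the coefficient from a simple regression model, which is why model assumptions are always in play.

```python
# Minimal sketch of why even a lottery-based comparison rests on a model.
# Data are simulated; none of Hoxby's actual data or models are used.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
won = rng.integers(0, 2, n)                 # lottery outcome, randomized
baseline = rng.normal(0, 1, n)              # prior achievement, balanced by the lottery
true_effect = 0.2                           # assumed "charter effect" in the simulation
score = baseline + true_effect * won + rng.normal(0, 1, n)

# "Model-free" view: difference in mean outcomes between winners and losers.
diff = score[won == 1].mean() - score[won == 0].mean()

# The same estimate is the OLS coefficient on the win indicator, so the
# simple comparison is itself a (very small) statistical model.
X = np.column_stack([np.ones(n), won])
beta = np.linalg.lstsq(X, score, rcond=None)[0]
print(round(diff, 3), round(beta[1], 3))    # the two numbers coincide
```

Once covariates, attrition adjustments, or grade-level pooling enter the regression, the assumptions multiply, which is exactly where Reardon's critique bites.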
ATTN: Community Members, Principals, School Leaders, PA/PTA Leaders, CEC Members, Presidents Council, Title 1, parents, elected officials, business owners, non–profit professionals, executives and faith-based…
Is there anything that gets people's dander up faster these days than comparisons of charter schools and traditional public schools? On Thursday, reporter Meredith Kolodner filed a story in the Daily News on the relative performance of charter schools and what the NYC Department of Education calls "district" schools. A fall 2009 presentation emanating from the Department's Office of Charter Schools, and posted on its website, reported on the charter school landscape in New York City, including the growth and location of charter schools, the composition of students attending them, the DOE's accountability framework for evaluating charter schools, and some evidence on how charter schools were faring on the School Progress Reports, the crown jewel in the DOE's accountability system. (Regular readers may know that I've been critical of key features of the Progress Reports for elementary and middle schools.)
Kolodner drew attention to the fact that although elementary and middle school charter students had higher rates of proficiency on the state math and English Language Arts assessments this year, charter schools on average had a lower score on the progress component of the School Progress Reports. And since the progress component makes up 60% of the overall score, charter schools also had lower overall scores on the Progress Reports than did district schools. She quoted Patrick Sullivan, an appointed member of the Panel for Educational Policy that the DOE describes as its governance body, on the meaning of this pattern. "Either the progress reports are invalid," Sullivan said, "or charter schools are lagging."
The Daily News article and a subsequent posting by Sullivan on the NYC Public School Parents blog prompted a quick reply from Peter Murphy, Director of Policy & Communications for the New York Charter Schools Association (NYCSA), here and here. Murphy called into question the metric used by the DOE in its Progress Reports, especially the fact that student performance only counts for 25% of the overall score, whereas student progress counts for 60%. This, he contended, is "woefully lopsided," and unfairly penalizes schools that have had students scoring high for several years running. If I read his second posting correctly, he concludes that the progress reports indeed are invalid.
I missed Secretary of Education Arne Duncan's speech at Teachers College on Thursday because I was working on his behalf in Washington. I was one of about 17 researchers on a panel evaluating a batch of research proposals on school reform for the Institute of Education Sciences (IES), the research arm of the federal Department of Education. IES seeks to identify malleable factors (e.g., education programs, policies and practices) that can improve education outcomes. To do so, IES has developed a progressive goal structure for research projects. Goal One projects are exploratory, and intended to inform the development of interventions by examining existing relationships between policies and practices and educational outcomes. Goal Two projects are intended to develop innovative educational interventions that can be implemented in school settings, and to collect some preliminary data on the educational outcomes observed in a pilot implementation of the intervention. Goal Three projects use rigorous methods to examine the efficacy of fully-developed interventions, as well as the feasibility of implementation, in at least one local site. And finally, Goal Four projects attempt to evaluate whether interventions proven to be successful in a local site, with help from the program developers, can be scaled up to be effective under different conditions, and without the direct involvement of the program developers. (There's also a Goal Five, for research on measurement, but that's a different animal.) Over the years that IES has had this goal structure, more than 70% of the projects funded under Goals One through Four have been Goal One or Goal Two projects; about one-quarter have been Goal Three projects, and only 3% have been Goal Four projects.
The reasons for this are pretty clear. To be a good prospect for scaling up in a Goal Four project, an intervention must previously have been shown to be effective in at least one site, using rigorous methods for assessing cause-and-effect relationships. Relatively few interventions meet this threshold, because most policies and programs don't have educationally meaningful effects, even if it seems like they ought to. Similarly, projects that are good candidates for Goal Three funding must previously have shown at least some evidence of effects on student outcomes in pilot studies in which the intervention received a tentative tryout, but not a full-blown test using rigorous experimental or quasi-experimental research methods.
I was struck by a thought experiment: what if my panel of distinguished researchers (the other members, at least) had been presented with a proposal based on the Race to the Top criteria that Secretary Duncan talked about at Teachers College, and which have been acclaimed by opinion writers such as Nick Kristof and David Brooks, as well as the editorial page writers for major newspapers in New York City and around the country? The draft Race to the Top criteria for funding state proposals provide incentives for linking teachers to their students' standardized test scores, and in his remarks on Thursday, Secretary Duncan drew attention to Race to the Top incentives for states and districts to link student performance to the teacher preparation programs from which students' teachers had emerged. Only Louisiana currently does this, the Secretary said. What if a scale-up proposal for this intervention had been presented to a panel charged with applying the IES criteria to evaluate its fundability?
"I think there's nothing wrong with anything." So spoke Chancellor Joel Klein at yesterday's release of the 2009 elementary and middle school progress reports. As Anna Phillips reported, 84% of the schools received a letter grade of A, and an additional 13% received a B. Only two schools out of 1,058 received an F, and just five more were awarded a D.
The letter grades were driven by the remarkable/suspicious gains in 2009 on the state's ELA and math tests. Schools weren't actually compared to one another on their performance this year to derive the letter grades. Rather, they were compared to last year's peer and citywide benchmarks. To use a football metaphor, because test scores rose across the board, virtually all schools moved up the field, but the goalposts didn't move. I wasn't sure that the progress report letter grades could actually be less useful this year than last, but Chancellor Klein's administration has achieved that dubious feat. When 84% of the schools receive an A—the top grade, which everyone understands to signify excellence—what useful information about the school's relative performance is being conveyed to parents, students, educators, and others with a stake in our schools? Not much, in my view.
Last year, my blogging partner Jennifer Jennings (who, for those keeping score at home, is now Dr. J) and I were sharply critical of the 2008 school progress reports. Writing on Jennifer's eduwonkette site, we demonstrated that student achievement growth over the past year—which makes up 60% of the overall progress report letter grade—was highly unreliable. Schools that demonstrated high gains in student achievement from 2006 to 2007 were no more likely to show gains from 2007 to 2008 than schools that showed low gains in 2006 to 2007. We concluded that the measure of student progress making up 60% of the overall progress report grade was picking up chance fluctuations from year to year. And if 60% of the score is random, there's not much genuine information about school performance in the progress report grade.
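A toy simulation (mine, not part of the eduwonkette analysis) illustrates why noisy gain scores behave this way: if each school's measured score is a stable true level plus independent yearly noise, consecutive gains are unrelated, or even negatively related, to one another.

```python
# Toy simulation: if measured school scores are a stable true level plus
# independent yearly noise, consecutive gains carry no real signal.
# All numbers below are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
schools = 1_000
true_level = rng.normal(650, 20, schools)   # stable school quality (assumed scale)
noise = rng.normal(0, 10, (3, schools))     # independent measurement noise per year

s06, s07, s08 = true_level + noise          # observed scores, 2006-2008
gain_0607 = s07 - s06
gain_0708 = s08 - s07

r = np.corrcoef(gain_0607, gain_0708)[0, 1]
print(round(r, 2))  # near -0.5: a big "gain" one year predicts a drop, not more growth
```

The negative correlation arises because both gains share the middle year's noise with opposite signs; either way, a school's gain this year tells you essentially nothing about genuine improvement next year.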
It wasn't a fluke.
There's a well-known education research textbook by three distinguished scholars at Harvard entitled By Design. Judy Singer, one of the authors, once told me that the working title for the book, rejected by Harvard University Press, was Bungled by Design. That title conveyed the key message of the book: when it comes to education research, you can't fix by analysis what you bungled by design. The design of a research study dictates what a researcher can plausibly ask, and the credibility of the claims that can be made about what is being studied.
The recently-released NYU study of the New York City Principal Leadership Academy comparing graduates of the Aspiring Principals Program to other new NYC principals is, in my view, bungled by design. This is not a knock on the authors, each of whom I know and respect a great deal. Rather, it reflects the fact that the NYU researchers were brought in to study the Aspiring Principals Program of the Leadership Academy long after critical design decisions about how to evaluate the impact of the program were made—either by omission or commission.
The three key limitations I raise here pertain to selection mechanisms that ideally would have been observed by the researchers. The inability to understand and model these selection processes undermines the objective of isolating the effect of the Aspiring Principals Program on student outcomes. (See the comments of Sean Corcoran, lead author of the report, on selection issues here.)
Yesterday, the College Board released its annual report on the SAT, and New York City was quick to follow suit with data on the performance of NYC high school students on the SAT. Citywide average scores fell a few points, at the same time that the numbers of Black and Hispanic students taking the SAT increased. Writing in the Daily News, Rachel Monahan summarized the DOE spin, courtesy of DOE spokesman Andy Jacob: (a) More Black and Hispanic students took the SAT, and fewer white students did; (b) the increasing numbers of SAT-takers are less likely to be high performers than SAT-takers in the past; (c) therefore, let's focus on the increased representativeness of the test-taking group, and ignore the fact that scores fell among Blacks and Hispanics, and that the achievement gap is still huge.
I don't think that we should pay too much attention to single-year changes in test scores of any kind, and especially the SAT, which, as commenter CarolineSF points out, is taken by a self-selected group of high school students. But this year's snapshot nevertheless reveals some hard truths about the performance of New York City's high school students.
Let's address the representativeness issue first. Is there evidence that the rising numbers of Black and Hispanic students taking the SAT reflects a dramatic change in the kinds of students who are taking the SAT? Can we explain the falling average Black and Hispanic SAT scores as reflecting a new group of low-performing NYC high school students striving to get into college?
Two weeks ago, Forbes magazine released its second annual ratings of U.S. colleges and universities. The Forbes ratings are competing with the market leader, U.S. News & World Report, whose rankings are taken way too seriously by the American public and the institutions that are ranked. Moreover, as I've argued recently, these ranking and rating schemes are wholly inadequate for their purported purpose: helping students and their families discern whether a particular institution is likely to be a good fit between a student's needs and interests and a school's capacity to meet those needs and interests. In fact, the situation is much worse for choosing colleges and universities than for choosing elementary or secondary schools. There is even more variability in the experiences of students within a given college or university than within a typical elementary or secondary school, due to the fact that college students have more specialized programs of study.
Forbes has gone to great lengths to distinguish its rating scheme from the one used by U.S. News. The Forbes rankings are based on listings of alumni in Who's Who in America; salaries of alumni; student evaluations from RateMyProfessors.com; four-year graduation rates; numbers of students receiving nationally competitive awards; and the number of faculty receiving awards for scholarship and creative pursuits. This differs dramatically from the U.S. News criteria, which emphasize peer assessments, retention rates, faculty and financial resources, selectivity, graduation rate performance, and alumni giving rates. There's nothing scientific about the choice of indicators making up the respective rankings; it's a matter of judgment, and any reader is free to proclaim that these aren't the indicators that she or he would choose, or that some indicators should get more or less weight than others.
Perhaps the most striking feature of the Forbes rankings is the reliance on RateMyProfessors.com ratings for 25% of the total score. Founded in 1999, RateMyProfessors.com (RMP) is a division of MTV Viacom. I can see a case for incorporating students' reports of their satisfaction with their courses, as long as one doesn't mistake such reports for direct evidence of what students learned in those courses. But using RMP is highly problematic for this purpose, because students choose to rate professors on the website, and the students in a particular college who choose to do so may not be representative of all of the students who attend that college. If the students who post ratings are not representative of the population of students in a given college, the average of those ratings doesn't tell us much that is useful about the typical experience of students.
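A toy illustration of that self-selection problem (the numbers are invented; this is not RMP data): when the students who choose to post ratings differ systematically from the student body, the posted average can diverge sharply from the average true opinion.

```python
# Simulated self-selection bias in voluntary ratings. All numbers are invented.
import numpy as np

rng = np.random.default_rng(3)
opinions = rng.normal(3.5, 1.0, 5_000).clip(1, 5)  # all students' true opinions (1-5)

# Suppose students with strongly negative opinions are likelier to post a rating
# (an assumed self-selection rule, chosen only to illustrate the mechanism).
post_prob = np.where(opinions < 2.5, 0.6, 0.1)
posted = opinions[rng.random(5_000) < post_prob]

print(round(opinions.mean(), 2), round(posted.mean(), 2))
# the posted average sits well below the population average
```

Flip the selection rule (enthusiasts rate more) and the bias flips upward; either way, the posted average is not the campus average.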
Mike Bloomberg's comments at Monday's press conference announcing plans to extend a test-based promotion policy to grades four and six were eerily reminiscent of Arne Duncan's and Joel Klein's reactions to two reports on social promotion released by the Consortium on Chicago School Research in 2004. The Chicago Consortium, an independent research group studying Chicago schools, examined the effects of promotional gates at the third-, sixth- and eighth-grade levels. (I reviewed one of the draft reports at the request of the Consortium.) The findings were unequivocal: Test-based retention did not alter the achievement trajectories of third-graders, and sixth-graders who were retained had lower achievement growth than similar low-achieving students who were promoted. Implementing the eighth-grade promotional gate reduced overall dropout rates slightly, but clearly lowered the likelihood of high school graduation for very low achievers and students who were already overage for grade at the time they reached the gate.
David Herszenhorn, writing in the New York Times at the time, described a Chicago press conference releasing the reports. He quoted Arne Duncan, then the chief executive of the Chicago public schools, as saying, "Common sense tells you that ending social promotion has contributed to higher test scores and lower dropout rates over the last eight years ... I am absolutely convinced in my heart, it's the right thing to do." Herszenhorn delicately noted that Duncan made claims about the promotional policies that were not supported by the two reports. "While the report drew no such conclusion," he wrote, "[Duncan] credited the tough promotion rules for improvements in the system as a whole, including better overall test scores, higher graduation and attendance rates and a lower overall dropout rate."
In the same article, Herszenhorn suggested that NYC Chancellor Joel Klein had "seemed to push aside the findings." He cited a statement by Klein that, "The Chicago study strongly supports our view that effective early grade interventions are key to ending social promotion and preparing students for the hard work they will encounter in later grades." Klein's statement was patently false: the Chicago studies didn't examine early grade interventions. Rather, authors Jenny Nagaoka and Melissa Roderick pointed out that a great many students in Chicago were struggling well before the third-grade promotional gate, suggesting the desirability of early intervention with struggling students.
Yesterday's New York Times story on standardized testing in New York City in the Bloomberg/Klein era isn't the story I would have told. Regular readers are aware that I'm more skeptical about the evidence regarding gains in student learning both in New York City and New York State. And I was especially disappointed that the Times provided a tool for ranking schools, even though the tool provided a modicum of context. As I've written recently, these school comparison tools aren't very informative.
The article did, however, lead me to reflect on something I hadn't considered before—New York City's relative performance on different school subject tests. Elementary and middle school students in New York are tested in math, English Language Arts (ELA), science and social studies. Students in grades three through eight take the ELA and math assessments each year, science is tested in grades four and eight, and social studies is tested in grades five and eight.
We have paid a lot more attention to student performance in ELA and math than we have to student performance in science and social studies. The School Progress Reports accountability system devised by former accountability czar Jim Liebman and implemented in 2006-07 rests heavily on ELA and math test scores, and science and social studies scores have not been taken into account. Elementary and middle schools, their principals and their teachers undoubtedly have gotten the message: how students perform on the ELA and math tests matters; based on the criteria for the Progress Reports, not much else does.
Monday, the Census Bureau released a report on the finances of public elementary and secondary schools in 2007. Such reports lead to a number of common questions: Why is public schooling so expensive? Why is there such a weak relationship between spending and student achievement? If high-spending states and school districts don't outperform lower-spending states and school districts, are we getting our money's worth? These questions are especially pressing in a state such as New York, which, as Yoav Gonen pointed out in yesterday's New York Post, has the highest average per-pupil expenditures among the 50 states and the District of Columbia, but ranks 15th and 23rd among the states on the NAEP fourth-grade and eighth-grade reading tests, respectively.
Three quick points about state-level expenditures. First, expenditures are higher in states with a higher cost of living. The chart below shows that the correlation between state per-pupil expenditures in 2007 and the 2009 cost-of-living index calculated by the Council for Community and Economic Research is .63, a strong association. If we remove Hawaii, which has an unusually high cost of living, the correlation rises to .70.
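For readers curious about the mechanics, here's an illustrative simulation (the numbers are invented; the .63 and .70 figures come from the actual expenditure and cost-of-living data) of how a single high-leverage, off-trend state can depress a correlation, which is why dropping Hawaii moves the estimate.

```python
# Simulated example of how one off-trend observation pulls a correlation around.
# All data below are invented for illustration.
import numpy as np

rng = np.random.default_rng(2)
states = 50
cost_index = rng.normal(100, 10, states)                    # cost-of-living index
spending = 120 * cost_index + rng.normal(0, 1_500, states)  # per-pupil spending

r_all = np.corrcoef(cost_index, spending)[0, 1]

# Add one high-leverage state sitting far below the trend line.
cost_plus = np.append(cost_index, 165)
spend_plus = np.append(spending, 9_000)
r_plus = np.corrcoef(cost_plus, spend_plus)[0, 1]

print(round(r_all, 2), round(r_plus, 2))  # the off-trend point pulls the correlation down
```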
Nowadays, it seems like anybody with a fast server, some GIS software, and some links to federal and state education databases can put up a website comparing schools. Among the latest entries to the school comparison derby is schooldigger.com, a service of Claarware LLC, billed as "The Web's Easiest and Most Useful K-12 Search and Comparison Tool for Parents." Schooldigger's title evokes the imagery of digging into the interior of schools to see what makes them tick.
The rhetoric on schooldigger's website is typical. The site purports to rank schools within states from best to worst. "Other sites charge over $20 a month for this service!" the site exclaims, but schooldigger does it for free. For New York, the rankings are based on the sum of the average percent proficient in English and math across tested grades. The rankings of schools are aggregated to enable cities and districts to be ranked as well. Schools, cities and districts in the 90th to 100th percentiles of the distribution get five stars; those in the 70th to 90th percentiles get four stars; those in the 50th to 70th percentiles get three stars; the ones in the 30th to 50th percentiles receive two stars; those in the 10th to the 30th percentiles get one star; and those in the bottom 10% of the distribution receive 0 stars.
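As described, the star assignment is nothing more than a percentile lookup. A minimal sketch (how schooldigger handles ties exactly at the band edges is my assumption):

```python
# Sketch of schooldigger's percentile-to-star mapping as described in the post.
# Treatment of scores exactly at a cutoff is assumed, not documented.

def stars(percentile):
    """Map a school's percentile (0-100) to schooldigger's 0-5 star bands."""
    bands = [(90, 5), (70, 4), (50, 3), (30, 2), (10, 1)]
    for cutoff, n_stars in bands:
        if percentile >= cutoff:
            return n_stars
    return 0  # bottom 10% of the distribution

for p in (95, 72, 50, 31, 10, 4):
    print(p, stars(p))
```

Note that the mapping throws away everything but a school's position in a single proficiency-sum distribution, which is part of why such rankings say so little about any individual child's likely experience.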
Sites such as schooldigger may have some interesting bells and whistles, but they can never adequately address the question that I think is of greatest interest to parents: How would my child fare in this school, as compared to another school? If this is, indeed, the question, then school comparison websites are doomed to provide poor and potentially misleading answers.
Former NYC Schools Chancellor Harold O. Levy took to the pages of today's New York Times to tout a five-point plan for fixing American schools. skoolboy couldn't say why the Times saw this as a good use of scarce editorial space—the graphic alone took up a number of column-inches—but there it is. Here's his laundry list:
Raise the age of compulsory education to 19, mandating a year of post-secondary education, perhaps to be paid for by the federal government. One of the issues here is whether an expanded school career should be mandated, or simply encouraged with powerful incentives, such as federal aid for postsecondary schooling. Levy seems confused on this point: He quotes President Obama, in his February address to a joint session of Congress, as saying, "I ask every American to commit to at least one year or more of higher education or career training," and in the next breath describes this as "compulsory post-secondary education." The presidency is a bully pulpit, and many educators were heartened by this strong statement about the importance of schooling. But no one is talking about federal or even state mandates for postsecondary attendance. Let's try to get kids to complete high school with a diploma that signifies some intellectual accomplishment first.
Use high-pressure sales tactics to curb truancy. Levy envisions "making repeated home visits and early morning phone calls, securing written commitments and eliciting oral commitments in front of witnesses" as strategies to "compel parents to ensure that their children go to school every day." The policy remedy that Levy proposes assumes that the main reason that kids don't go to school is because their parents don't "compel" them to. This seems like a misdiagnosis of the cause of the problem. It's more plausible that students don't attend because they don't find what's happening at school meaningful or valuable. An engaging curriculum might well be a much better policy solution than high-pressure sales tactics. If the ultimate goal is to promote student learning, getting a student to the door of the school is only a first step.