There’s a well-known education research textbook by three distinguished scholars at Harvard entitled By Design. Judy Singer, one of the authors, once told me that the working title for the book, rejected by Harvard University Press, was Bungled by Design. That title conveyed the key message of the book: when it comes to education research, you can’t fix by analysis what you bungled by design. The design of a research study dictates both which questions a researcher can plausibly ask and how credible the resulting claims about what is being studied can be.
The recently released NYU study of the New York City Principal Leadership Academy, which compares graduates of the Aspiring Principals Program to other new NYC principals, is, in my view, bungled by design. This is not a knock on the authors, each of whom I know and respect a great deal. Rather, it reflects the fact that the NYU researchers were brought in to study the Leadership Academy’s Aspiring Principals Program long after critical decisions about how to evaluate the program’s impact had been made, whether by omission or by commission.
The three key limitations I raise here pertain to selection mechanisms that ideally would have been observed by the researchers. The inability to understand and model these selection processes undermines the objective of isolating the effect of the Aspiring Principals Program on student outcomes. (See the comments of Sean Corcoran, lead author of the report, on selection issues here.)
Let’s start with selection into the program itself. We know precious little about either the individuals selected into the Aspiring Principals Program or those who self-select into becoming principals in the comparison group. To be sure, the two groups can be compared on race, age, years of teaching experience, and a couple of other variables, but there’s no information about the personal qualities of the APP and comparison principals, and these qualities might well bear on their subsequent success as principals. This concern is heightened by the fact that the Aspiring Principals Program is highly selective. For example, the 2003 cohort of Aspiring Principals consisted of 90 individuals culled from 400 applications; in 2004 the number of applications ballooned to 1,200, which, with a cohort of roughly the same size, means that fewer than one in 10 applicants was selected. The analysis cannot even rule out the uncomfortable possibility that some members of the comparison group applied to the APP and were rejected because the program’s administrators predicted they would be unsuccessful. There’s really no way to establish that the APP and comparison groups were equivalent, at the time they were selected, on the things that might matter for their success.
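A toy simulation makes the selection concern concrete. It is a minimal sketch under invented assumptions, not a model of the actual APP. Each applicant has a hypothetical, unobserved “quality” that raises later school outcomes. The program admits the applicants with the highest noisy screening score, and everyone not admitted serves as the comparison group. The true program effect is set to zero. The function name, the noise levels, and the defaults of 1,200 applicants and 90 admits are illustrative assumptions.

```python
import random
import statistics


def simulate_selection_bias(n_applicants=1200, n_admitted=90,
                            program_effect=0.0, seed=1):
    """Toy model of selection on unobservables (all parameters hypothetical).

    Returns the mean outcome of admitted applicants minus the mean outcome
    of everyone else.
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        quality = rng.gauss(0, 1)           # unobserved by the evaluator
        screen = quality + rng.gauss(0, 1)  # noisy signal the program sees
        applicants.append((screen, quality))

    # Admit the applicants with the highest screening scores.
    applicants.sort(key=lambda pair: pair[0], reverse=True)
    admitted = applicants[:n_admitted]
    comparison = applicants[n_admitted:]

    def school_outcome(quality, treated):
        # Outcome depends on quality, any true program effect, and luck.
        bump = program_effect if treated else 0.0
        return quality + bump + rng.gauss(0, 1)

    admitted_mean = statistics.mean([school_outcome(q, True) for _, q in admitted])
    comparison_mean = statistics.mean([school_outcome(q, False) for _, q in comparison])
    return admitted_mean - comparison_mean


# With no true program effect, the admitted group still looks better.
print(round(simulate_selection_bias(program_effect=0.0), 2))
```

Even with `program_effect=0.0`, the admitted group outperforms the comparison group by a wide margin, purely because the screen picked people who were already stronger. An evaluator who sees only race, age, and years of experience cannot separate that gap from a genuine program effect.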
Chalkbeat is an independent nonprofit news organization telling the story of education in America.
Next, there’s the selection of the schools that APP and comparison principals were chosen to lead. The NYU report documents that APP principals were placed in different kinds of schools than comparison principals: schools that were smaller, lower-performing, and on a downward trajectory, with higher concentrations of Black students, and more likely to be located in the Bronx. Beyond the Leadership Academy’s stated mission of placing APP principals in hard-to-staff schools, we have little information about how principals were actually assigned to schools. The fact that APP and comparison principals wound up in different kinds of schools greatly complicates any assessment of the program’s impact, because the observed differences in outcomes may reflect the features of the schools in which principals were placed as much as any changes in principals’ behavior attributable to the APP. The NYU researchers do the best they can to address this by looking at schools’ performance trajectories before and after the arrival of the APP and comparison principals, but many questions remain unanswered. Suppose, for example, that a school was on a downward trajectory for two years before an APP principal took over. Might we expect such a school to trend upward just by chance, given what we know about year-to-year fluctuations in school-level test performance? (Statisticians call this regression to the mean.) If so, the slight advances observed in schools led by APP graduates relative to those led by comparison principals might be due to differences in the schools they led, and not to the impact of the APP. The question is whether there are enough schools led by APP graduates and by comparison principals with similar trajectories before the new principal’s arrival to rule out this possibility.
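The chance-rebound question can be illustrated with a short simulation. Purely for illustration, the sketch assumes that a school’s yearly score is a fixed school effect plus independent year-to-year noise, so nothing about any school ever really changes. It picks out schools whose scores fell two years running and measures their change in the following year, the year a new principal would arrive. The model, its parameters, and the function name are hypothetical.

```python
import random
import statistics


def rebound_after_two_declines(n_schools=6000, seed=2):
    """Toy model of regression to the mean (all parameters hypothetical).

    Each school's yearly score is a fixed school effect plus independent
    noise, so no school genuinely improves or declines. Returns the mean
    year-3 change for schools whose scores fell in both years 1 and 2,
    together with how many schools qualified.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n_schools):
        school_effect = rng.gauss(0, 1)
        scores = [school_effect + rng.gauss(0, 1) for _ in range(4)]
        fell_twice = scores[0] > scores[1] > scores[2]
        if fell_twice:
            # Year 3 is when a new principal would take over.
            changes.append(scores[3] - scores[2])
    return statistics.mean(changes), len(changes)


mean_change, n_selected = rebound_after_two_declines()
print(f"{n_selected} schools fell two years running; "
      f"average change the next year: {mean_change:+.2f}")
```

In this model the selected schools rebound by roughly 0.85 standard deviations of the yearly noise on average, the expected gap between a typical year and the lowest of three, even though no principal did anything. A credible comparison needs APP and comparison schools with matching pre-arrival trajectories so that this built-in rebound cancels out.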
And third, there’s the issue of selection out of both the APP group and the comparison group, which is often referred to as sample attrition. Of the 147 graduates of the APP in the 2004 and 2005 cohorts (82% of the approximately 180 entrants to the program in those two years), 120 served as a principal for some time in a school run by the city’s Department of Education (DOE). Of these, 15 switched schools, a few served as a principal and then transferred to another DOE position, a couple were promoted, and some left the DOE after serving as a principal. Only 86 of the original 147 graduates, or 59%, are included in the analysis.
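The attrition arithmetic in the paragraph above can be laid out as a simple funnel. The counts come from the post; treating “approximately 180” entrants as exactly 180 is an assumption, and the names are illustrative.

```python
# Counts for the 2004 and 2005 APP cohorts, as reported in the post.
# "Approximately 180" entrants is treated as exactly 180 (an assumption).
FUNNEL = [
    ("entered the APP", 180),
    ("graduated", 147),
    ("served as a DOE principal", 120),
    ("in the analysis sample", 86),
]


def stage_retention(funnel):
    """Share of each stage that reached the next stage."""
    return [
        (later_name, later / earlier)
        for (_, earlier), (later_name, later) in zip(funnel, funnel[1:])
    ]


for name, share in stage_retention(FUNNEL):
    print(f"{name:26s} {share:.0%} of the previous stage")

counts = dict(FUNNEL)
analyzed_share = counts["in the analysis sample"] / counts["graduated"]
dropped_after_starting = counts["served as a DOE principal"] - counts["in the analysis sample"]
print(f"analyzed share of graduates: {analyzed_share:.0%}")
print(f"started as principal but not analyzed: {dropped_after_starting}")
```

Laid out this way, the largest single drop comes after graduates start as principals, which is exactly the group whose departures the study does not explain.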
The study does not address attrition from either the APP group or the comparison group. (We know next to nothing about the comparison principals who started but didn’t persist in the same school for three years, and thus vanished from the study.) Were the 34 APP graduates who started as principals but didn’t meet the study’s three-year tenure requirement unsuccessful and counseled out (or kicked upstairs)? How did they differ from the 86 APP graduates who make up the core sample analyzed in the study? Understanding the impact of a program requires understanding who leaves the “treatment” group as well as who leaves the comparison group, and why.
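Why unexplained attrition matters can be sketched the same way. In this hypothetical model (the exit cutoffs, sample sizes, and function name are assumptions, not features of the NYU data), APP and comparison principals draw performance from the same distribution, so the program has no effect at all. The only difference is that weak APP principals are assumed to be moved out before year three more readily than weak comparison principals, and only those who stay are compared.

```python
import random
import statistics


def survivor_gap(n_per_group=2000, app_exit_cutoff=-0.5,
                 comparison_exit_cutoff=-1.5, seed=3):
    """Toy model of differential attrition (all parameters hypothetical).

    Both groups draw performance from the same distribution, so the true
    program effect is zero. A principal whose performance falls below the
    group's exit cutoff leaves before year three; only stayers are compared.
    Returns mean APP stayer performance minus mean comparison stayer
    performance.
    """
    rng = random.Random(seed)

    def stayers(cutoff):
        performances = [rng.gauss(0, 1) for _ in range(n_per_group)]
        return [p for p in performances if p >= cutoff]

    app_stayers = stayers(app_exit_cutoff)
    comparison_stayers = stayers(comparison_exit_cutoff)
    return statistics.mean(app_stayers) - statistics.mean(comparison_stayers)


print(round(survivor_gap(), 2))                      # APP weeded out more
print(round(survivor_gap(app_exit_cutoff=-1.5), 2))  # same exit rule
```

The surviving APP principals look better on average even though the program did nothing; with equal cutoffs the gap disappears. Without knowing who left each group and why, a reader cannot tell which situation a study is in.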
Finally, I can’t end this post without commenting on the limits of assessing the Aspiring Principals Program primarily on the basis of students’ scores on state tests (for the elementary and middle schools). That such test scores are often available (although never for students below grade 3!) and sit at the center of the city’s accountability system does not justify excluding virtually every other measure of principal performance that might be relevant. Does the principal support students’ social and emotional development? Or their preparation for citizenship in our complex democracy? Does s/he support teachers’ learning and ambitious teaching practices? Does s/he promote collaborative problem-solving among the school’s staff and stakeholders? Does s/he manage resources efficiently to support the school’s instructional mission? Does s/he act with integrity? These are just some of the features of a skilled principal that the NYU evaluation of the Aspiring Principals Program was unable to address, through no fault of the researchers.
And finally-finally: although it is clearly beyond the purview of the NYU project, I am hoping that someone is conducting a cost-effectiveness analysis of the Aspiring Principals Program. There’s considerable attrition at various stages on the way to the desired outcome of a sustained stint as an NYC principal, and the NYU study does not show that students in schools led by APP graduates have much better outcomes than students in schools led by comparison principals. A careful cost-effectiveness analysis would weigh the student outcomes observed in the two groups against the cost of preparing the principals in each. There may well be direct and indirect costs associated with preparing principals who do not go through the APP. We know that the costs of taking an individual through the APP are substantial, perhaps in excess of $150,000. Do the small and mixed differences in student outcomes favoring APP graduates observed in the NYU study justify investments of this magnitude?
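The cost-effectiveness comparison called for above can be sketched as two small calculations. The first spreads pipeline costs over the principals who actually reach a sustained principalship. The second divides the extra cost of one route over another by the extra student outcome it buys, an incremental cost-effectiveness ratio. The function names and the choice of outcome unit are assumptions. Real inputs would require cost and outcome estimates for both routes, which the study does not provide.

```python
def cost_per_sustained_principal(cost_per_entrant, entrants, sustained):
    """Spread the cost of everyone who entered the pipeline over the
    principals who reached the outcome of interest (e.g. three years
    leading the same school)."""
    if sustained <= 0:
        raise ValueError("need at least one sustained principal")
    return cost_per_entrant * entrants / sustained


def incremental_cost_effectiveness(cost_app, cost_comparison,
                                   effect_app, effect_comparison):
    """Extra dollars per extra unit of student outcome, APP versus the
    comparison route. The outcome unit (e.g. test-score standard
    deviations) is the analyst's choice."""
    extra_effect = effect_app - effect_comparison
    if extra_effect == 0:
        raise ValueError("no outcome difference: the ratio is undefined")
    return (cost_app - cost_comparison) / extra_effect
```

Attrition pushes the first number up, and small outcome differences push the second up sharply, because a small denominator inflates the ratio.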