First Person

Reasonable Doubt

I’ve been relatively quiet in the ongoing debate about how best to evaluate teachers in New York City and across New York State. I’m not close to the negotiations and can claim no expertise on the political machinations outside of public view. At its heart, this seems to me a dispute over jurisdiction: Who has the legitimate authority to regulate the work of an occupation that seeks the status of a profession—but one that is in a labor-management relationship?

The laws of New York recognize the labor-management fault line, but they do little to guide a collective-bargaining process toward agreements in the many districts in which teacher-evaluation systems are contested. Each side brings a powerful public value to bear on the disagreement.

For the employers, it’s all about efficiency. It’s in the public interest, they argue, to recruit, retain and reward the best teachers, in order to maximize the collective achievement of students. A teacher-evaluation system that fails to identify those teachers who are effective, and those who are ineffective, can neither weed out consistent low-performers nor target those who might best benefit from intensive help. Rewarding high-performing teachers can, in the short run, help keep them in their classrooms, they claim, and, in the long run, can help expand the pool of talented individuals who enter the occupation.

For teachers, the key concern is fairness. Fairness is primarily a procedural issue: Teachers, and the unions that represent them, seek an evaluation process that is neither arbitrary nor capricious, relying on stable and valid criteria that they believe accurately characterize the quality of their work. In this view, an evaluation process is unfair to the extent that it can be manipulated by a building administrator or school district to yield a particular rating for a teacher’s performance. It is also unfair if random factors beyond a teacher’s control unduly influence the evaluation of his or her performance.

The values of efficiency and fairness collide head-on in New York’s Education Law §3012-c, passed as part of the state’s efforts to bolster its chances in the 2010 Race to the Top competition. The law requires annual professional performance reviews (APPRs) that sort teachers into four categories—“highly effective,” “effective,” “developing” and “ineffective”—based on multiple measures of effectiveness, including student growth on state and locally selected assessments and a teacher’s performance according to a teacher practice rubric.

The fundamental problem is that it’s hard to assess the efficiency or fairness of an evaluation system that doesn’t exist yet. There are too many unknowns to be able to judge, which is one of the arguments for piloting an evaluation system before bringing it to scale. The properties of the state tests that are to be used to assess teachers’ contributions to student learning are a moving target; the tests have been changing in recent years in response to concerns about their difficulty, predictability and coverage of state curricular standards. And in a couple of years, those standards and assessments will change, as New York and many other states phase in the Common Core standards and new assessments designed to measure mastery of them. The models to estimate a teacher’s position relative to other teachers in contributing to students’ test performance are imprecise at the level of the individual teacher, and different models yield different results for a given teacher. There’s been little to no discussion of how to incorporate this uncertainty into the single numerical score a teacher will receive.

The evaluation of teachers’ practices via classroom observations using New York State Education Department (NYSED)-approved rubrics, such as Charlotte Danielson’s Framework for Teaching or Robert Pianta’s Classroom Assessment Scoring System, is another unknown. There’s evidence that with proper training, observers can reliably rate teachers’ classroom practices, but the nature of the training is critical, and there is no evidence to date of New York City’s ability to prepare more than 1,500 principals, or the principals’ “designees,” to carry out multiple observations of many teachers, teaching many different school subjects, each year.

Amazingly, there is even uncertainty about whether the evaluations can or should be based solely on a teacher’s performance in a single year. The statute creating the new evaluation system in New York describes it as an “annual professional performance review.” But is this a professional performance review that occurs annually, or a review of annual professional performance—that is, a teacher’s performance in the most recent year? The guidance provided by the NYSED suggests that it has no idea. “For 2011-12, only one year of teacher or principal student growth percentile scores will factor into each educator’s evaluation,” the guidance states. “When more years of data are available, NYSED will consider whether each evaluation year should include more than one year of educator student growth results. Empirical and policy considerations will determine the decision.”

Well, that certainly clarifies matters. In other words, a “bad” year where a teacher is ranked relatively low compared to other teachers might reverberate, affecting his or her ranking in subsequent years. But a good observational rating in a given year seemingly will have no spillover effect into subsequent years. If, as has been true in Washington, D.C.’s IMPACT teacher-evaluation system, teachers generally score higher on observational ratings than on their value-added or growth-score rankings relative to other teachers, the carryover for value-added performance—but not observations of teachers’ professional practices—appears unfair. And in D.C., this evaluation system has resulted in the termination of hundreds of teachers based on one or two years of performance.

Teacher-evaluation systems have multiple purposes, which might include certifying teachers as competent or selecting some for particular forms of professional development to enhance their professional practice. For most of these purposes, it’s essential that those with a stake in the education system view these evaluation systems as legitimate—and the perceived efficiency and fairness of an evaluation system are central to such judgments. It’s not hard to see why a great many teachers, in New York City and across the state, have serious doubts about the fairness of New York State’s APPR process. And if future teachers do as well, the process could have the unintended consequence of reducing, rather than increasing, the pool of individuals willing to consider teaching as a vocation. This, coupled with the more than 1,300 principals across the state who have raised questions about the efficiency of the process, illuminates the challenges confronting the state as it seeks to implement the APPR system and avoid a scolding from U.S. Secretary of Education Arne Duncan.

William Blackstone, an 18th-century English legal scholar, wrote “better that ten guilty persons escape than that one innocent suffer.” Benjamin Franklin, one of the founders of our country, later upped the ante to 100 to one. The principle captures squarely the trade-off between the value of efficiency and the value of fairness. A legal system that lets the guilty go free is inefficient, as these offenders are free to continue to transgress against the common good. But to Franklin and others, that was still preferable to a legal system that did not provide adequate procedural protections for all, whether innocent or guilty, because such a system would be inconsistent with the principle of fairness so central to the American polity.

It’s important to note that Blackstone and Franklin were concerned with the workings of government; fairness in the private sector was not a central concern, and efficiency was taken for granted as a consequence of market forces. Civil servants, as agents and employees of the state, arguably are subject to a different set of rights and responsibilities than those working in the private sector, and teachers are one of the largest groups of such public servants. What’s an acceptable trade-off between efficiency and fairness in the mix of teachers’ rights and responsibilities? It’s a lot easier to speculate about percentages in the abstract than to confront the possibility that you, or someone close to you, might be out of a job because of an untested teacher-evaluation system that cuts corners on fairness.

This post also appears on Eye on Education, Aaron Pallas’s Hechinger Report blog.

First Person

I’m a principal who thinks personalized learning shouldn’t be a debate.

This is the first in what we hope will be a tradition of thoughtful opinion pieces—of all viewpoints—published by Chalkbeat Chicago. Have an idea? Send it to cburke@chalkbeat.org

As personalized learning takes hold throughout the city, Chicago teachers are wondering why a term so appealing has drawn so much criticism.

Until a few years ago, the school that I lead, Richard H. Lee Elementary on the Southwest Side, was on a path toward failing far too many of our students. We crafted curriculum and identified interventions to address gaps in achievement and the shifting sands of accountability. Our teachers were hardworking and committed. But our work seemed woefully disconnected from the demands we knew our students would face once they made the leap to postsecondary education.

We worried that our students were ill-equipped for today’s world of work and tomorrow’s jobs. Yet, we taught using the same model through which we’d been taught: textbook-based direct instruction.

How could we expect our learners to apply new knowledge to evolving facts, without creating opportunities for exploration? Where would they learn to chart their own paths, if we didn’t allow for agency at school? Why should our students engage with content that was disconnected from their experiences, values, and community?

We’ve read articles about a debate over personalized learning centered on Silicon Valley’s “takeover” of our schools. We hear that Trojan Horse technologies are coming for our jobs. But in our school, personalized learning has meant developing lessons informed by the cultural heritage and interests of our students. It has meant providing opportunities to pursue independent projects, and differentiating curriculum, instruction, and assessment to enable our students to progress at their own pace. It has reflected a paradigm shift that is bottom-up and teacher led.

And in a move that might have once seemed incomprehensible, it has meant getting rid of textbooks altogether. We’re not alone.

We are among hundreds of Chicago educators who would welcome critics to visit one of the 120 city schools implementing new models for learning – with and without technology. Because, as it turns out, Chicago is fast becoming a hub for personalized learning. And, it is no coincidence that our academic growth rates are also among the highest in the nation.

Before personalized learning, we designed our classrooms around the educator. Decisions were made based on how educators preferred to teach, where they wanted students to sit, and what subjects they wanted to cover.

Personalized learning looks different in every classroom, but the common thread is that we now make decisions looking at the student. We ask them how they learn best and what subjects strike their passions. We use small group instruction and individual coaching sessions to provide each student with lesson plans tailored to their needs and strengths. We’re reimagining how we use physical space, and the layout of our classrooms. We worry less about students talking with their friends; instead, we ask whether collaboration and socialization will help them learn.

Our emphasis on growth shows in the way students approach each school day. I have, for example, developed a mentorship relationship with one of our middle school students who, despite being diligent and bright, always ended the year with average grades. Last year, when she entered our personalized learning program for eighth grade, I saw her outlook change. She was determined to finish the year with all As.

More than that, she was determined to show that she could master anything her teachers put in front of her. She started coming to me with graded assignments. We’d talk about where she could improve and what skills she should focus on. She was pragmatic about challenges and so proud of her successes. At the end of the year she finished with straight As—and she still wanted more. She wanted to get A-pluses next year. Her outlook had changed from one of complacence to one oriented towards growth.

Rather than undermining the potential of great teachers, personalized learning is creating opportunities for collaboration as teachers band together to leverage team-teaching and capitalize on their strengths and passions. For some classrooms, this means offering units and lessons based on the interests and backgrounds of the class. For a couple of classrooms, it meant literally knocking down walls to combine classes from multiple grade-levels into a single room that offers each student maximum choice over how they learn. For every classroom, it means allowing students to work at their own pace, because teaching to the middle will always fail to push some while leaving others behind.

For many teachers, this change sounded daunting at first. For years, I watched one of my teachers – a woman who thrives on structure and runs a tight ship – become less and less engaged in her profession. By the time we made the switch to personalized learning, I thought she might be done. We were both worried about whether she would be able to adjust to the flexibility of the new model. But she devised a way to maintain order in her classroom while still providing autonomy. She’s found that trusting students with the responsibility to be engaged and efficient is both more effective and far more rewarding than trying to force them into their roles. She now says that she would never go back to the traditional classroom structure, and has rediscovered her love for teaching. The difference is night and day.

The biggest change, though, is in the relationships between students and teachers. Gone is the traditional, authority-to-subordinate dynamic; instead, students see their teachers as mentors with whom they have a unique and individual connection, separate from the rest of the class. Students are actively involved in designing their learning plans, and are constantly challenged to articulate the skills they want to build and the steps that they must take to get there. They look up to their teachers, they respect their teachers, and, perhaps most important, they know their teachers respect them.

Along the way, we’ve found that students respond favorably when adults treat them as individuals. When teachers make important decisions for them, they see learning as a passive exercise. But, when you make it clear that their needs and opinions will shape each school day, they become invested in the outcome.

As our students take ownership of their learning, they earn autonomy, which means they know their teachers trust them. They see growth as the goal, so they no longer finish assignments just to be done; they finish assignments to get better. And it shows in their attendance rates – and test scores.

Lisa Epstein is the principal of Richard H. Lee Elementary School, a public school in Chicago’s West Lawn neighborhood serving 860 students from pre-kindergarten through eighth grade.

Editor’s note: This story has been updated to reflect that Richard H. Lee Elementary School serves 860 students, not 760 students.

First Person

I’ve spent years studying the link between SHSAT scores and student success. The test doesn’t tell you as much as you might think.

Proponents of New York City’s specialized high school exam, the test the mayor wants to scrap in favor of a new admissions system, defend it as meritocratic. Opponents contend that when used without consideration of school grades or other factors, it’s an inappropriate metric.

One thing that’s been clear for decades about the exam, now used to admit students to eight top high schools, is that it matters a great deal.

Students who are admitted may receive not only a superior education but also access to elite colleges and, eventually, better employment. That system has also led to an under-representation of Hispanic students, black students, and girls.

As a doctoral student at The Graduate Center of the City University of New York in 2015, and in the years after I received my Ph.D., I have tried to understand how meritocratic the process really is.

First, that requires defining merit. Only New York City defines it as the score on a single test — other cities’ selective high schools use multiple measures, as do top colleges. There are certainly other potential criteria, such as artistic achievement or citizenship.

However, when merit is defined as achievement in school, the question of whether the test is meritocratic is an empirical question that can be answered with data.

To do that, I used SHSAT scores for nearly 28,000 students and school grades for all public school students in the city. (To be clear, the city changed the SHSAT itself somewhat last year; my analysis used scores on the earlier version.)

My analysis makes clear that the SHSAT does measure an ability that contributes to some extent to success in high school. Specifically, a SHSAT score predicts 20 percent of the variability in freshman grade-point average among all public school students who took the exam. Students with extremely high SHSAT scores (greater than 650) generally also had high grades when they reached a specialized school.

However, for the vast majority of students who were admitted with lower SHSAT scores, from 486 to 600, freshman grade point averages ranged widely — from around 50 to 100. That indicates that the SHSAT was a very imprecise predictor of future success for students who scored near the cutoffs.

Course grades earned in the seventh grade, in contrast, predicted 44 percent of the variability in freshman year grades, making them a far better admissions criterion than the SHSAT score, at least for students near the score cutoffs.
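One way to read the two percentages above, assuming they are R-squared values from single-predictor linear fits (the column does not name the model), is as squared correlations. Under that assumption, 20 percent corresponds to a correlation of about 0.45 between SHSAT scores and freshman grades, and 44 percent to about 0.66 for seventh-grade course grades. The minimal Python sketch below shows the arithmetic; the five-point dataset and the function names are invented for illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share_of_variability(xs, ys):
    """R-squared for a one-predictor linear fit: the squared correlation."""
    return pearson_r(xs, ys) ** 2


# Hypothetical toy data (NOT the study's data): a predictor and an outcome.
toy_predictor = [1, 2, 3, 4, 5]
toy_outcome = [2, 1, 4, 3, 5]
print(share_of_variability(toy_predictor, toy_outcome))

# Correlations implied by the reported shares, ASSUMING they are R-squared
# values from single-predictor linear models.
implied_r_shsat = sqrt(0.20)   # SHSAT score vs. freshman GPA
implied_r_grades = sqrt(0.44)  # seventh-grade grades vs. freshman GPA
print(round(implied_r_shsat, 2), round(implied_r_grades, 2))
```

Read this way, the gap between the two predictors is a gap between a moderate correlation and a fairly strong one, which fits the wide spread of freshman averages reported for students who scored near the cutoffs.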

It’s not surprising that a standardized test does not predict as well as past school performance. The SHSAT represents a two and a half hour sample of a limited range of skills and knowledge. In contrast, middle-school grades reflect a full year of student performance across the full range of academic subjects.

Furthermore, an exam that relies almost exclusively on one method of assessment, multiple choice questions, may fail to measure abilities that are revealed by the variety of assessment methods that go into course grades. Additionally, middle school grades may capture something important that the SHSAT fails to capture: long-term motivation.

Based on his current plan, Mayor de Blasio seems to be pointed in the right direction. His focus on middle school grades and the Discovery Program, which admits students with scores below the cutoff, is well supported by the data.

In the cohort I looked at, five of the eight schools admitted some students with scores below the cutoff. The sample sizes were too small at four of them to make meaningful comparisons with regularly admitted students. But at Brooklyn Technical High School, the performance of the 35 Discovery Program students was equal to that of other students. Freshman year grade point averages for the two groups were essentially identical: 86.6 versus 86.7.

My research leads me to believe that it might be reasonable to admit a certain percentage of the students with extremely high SHSAT scores (over 600, where the exam is a good predictor) and admit the remainder using a combined index of seventh grade GPA and SHSAT scores.

When I used that formula to simulate admissions, diversity increased somewhat. An additional 40 black students, 209 Hispanic students, and 205 white students would have been admitted, as well as an additional 716 girls. It’s worth pointing out that in my simulation, Asian students would still constitute the largest segment of students (49 percent) and would be admitted in numbers far exceeding their proportion of applicants.
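A two-stage rule of this kind can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the column does not say how the combined index was weighted or scaled, so the sketch standardizes both measures and weights them equally, admits every applicant above the high cutoff in the first stage, and uses five made-up applicants rather than real records.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def simulate_admissions(students, seats, high_cutoff=600, weight_gpa=0.5):
    """Return names of admitted students under a hypothetical two-stage rule.

    students: list of (name, shsat_score, seventh_grade_gpa) tuples.
    high_cutoff and weight_gpa are illustrative assumptions.
    """
    # Stage 1: admit on the test alone above the high cutoff.
    by_score = sorted(students, key=lambda s: s[1], reverse=True)
    admitted = [s[0] for s in by_score if s[1] > high_cutoff][:seats]

    # Stage 2: rank everyone else on a weighted index of standardized
    # SHSAT score and seventh-grade GPA, and fill the remaining seats.
    z_shsat = zscores([s[1] for s in students])
    z_gpa = zscores([s[2] for s in students])
    index = {
        s[0]: weight_gpa * zg + (1 - weight_gpa) * zs
        for s, zs, zg in zip(students, z_shsat, z_gpa)
    }
    rest = [s[0] for s in students if s[0] not in admitted]
    rest.sort(key=lambda name: index[name], reverse=True)
    return admitted + rest[: seats - len(admitted)]


# Made-up applicants: (name, SHSAT score, seventh-grade GPA).
applicants = [
    ("A", 640, 85),
    ("B", 590, 80),
    ("C", 560, 97),
    ("D", 575, 88),
    ("E", 530, 92),
]

combined = simulate_admissions(applicants, seats=3)
test_only = [s[0] for s in sorted(applicants, key=lambda s: s[1], reverse=True)[:3]]
print(combined)   # ['A', 'C', 'D']
print(test_only)  # ['A', 'B', 'D']
```

In this toy example the top scorer is admitted either way, while the combined index swaps applicant B (higher test score, lowest grades) for applicant C (lower test score, highest grades). That is the kind of shift the simulation described above would produce at scale, though the actual magnitudes depend on the real data and on the weighting.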

Because middle school grades are better than test scores at predicting high school achievement, their use in the admissions process should not in any way dilute the quality of the admitted class, and could not be seen as discriminating against Asian students.

The success of the Discovery students should allay some of the concerns about the ability of students with SHSAT scores below the cutoffs. There is no guarantee that similar results would be achieved in an expanded Discovery Program. But this finding certainly warrants larger-scale trials.

With consideration of additional criteria, it may be possible to select a group of students who will be more representative of the community the school system serves — and the pool of students who apply — without sacrificing the quality for which New York City’s specialized high schools are so justifiably famous.

Jon Taylor is a research analyst at Hunter College analyzing student success and retention.