First Person

Slow and steady best on revamped evaluations

Kristin Klopfenstein is the executive director of the Education Innovation Institute at the University of Northern Colorado.

I’m often struck by the potential for progress – and for detriment – in the national movement to tie educator evaluations to student performance data. Evaluations should be the impetus for ongoing conversations and activities that lead teachers and principals to improve. Instead, unfortunately, they often become mechanical compliance exercises that can easily become punitive.

Anyone who advocates basing some portion of a teacher’s job evaluations on student performance is bound to have been sobered by early reports from some cities and states that are well along in the process of designing and rolling out such approaches. Several recent news stories from places like Chicago, Tennessee and New York reveal myriad concerns, ranging from worries by teachers about fair application of the new criteria to frustrations by principals about inadequate training, lack of confidence in the reliability of test scores and cascades of rules that reduce them to process-driven grinds.

Another theme in these stories is that some jurisdictions apparently rushed to put these complex, radically different evaluation systems in place without testing them adequately or making sure that people who would be most affected understood the new criteria. All of these factors decrease the likelihood that student growth-based evaluation systems will, in practice, empower educators or improve student achievement.

Working in concert with teachers is the best approach

Resistance to change isn’t surprising. Major change is scary, and these changes could force educators to rethink expectations about their livelihoods and professional identities. History also explains some of the reactions. Too often, accountability and other reforms have been done to teachers instead of in concert with them in a shared effort to improve instruction and learning.

One thing that struck me about these stories was that principals were often as outspoken as teachers, which is unusual. “Principals don’t revolt,” says one principal quoted in a New York Times story about opposition to the use of student test scores in teacher and principal evaluations.

Against this backdrop, the Colorado Legacy Foundation has produced some documents and guides to help districts that are ready to start building evaluation models for SB 10-191, Colorado’s educator effectiveness bill, avoid some major landmines.

What I like best about these guides is that they are based on the experiences of three districts – Brighton, Eagle, and Harrison – that overhauled their evaluation systems before 191 was on the books. The guides and case studies aren’t blueprints; superintendents and boards will have to go to the districts to get enough detail to understand how the systems work.

Nor do the guides answer some basic questions, such as whether and how much the three districts will have to adapt their hard-won programs to work with 191. But they do offer solid advice born from experience that could raise the odds for buy-in, and reassurance that peers have jumped off this ledge and survived. The three systems differ from one another, giving readers a range of options to consider. In all three, however, it is clear that evaluations became a more central and more frequent activity for both teachers and principals.

Learning from early adopters’ mistakes

One appealing aspect of these documents is that they are fairly candid about mistakes districts made. For example, Eagle heavily revised its system after educators complained that the model didn’t work well for teachers whose subjects weren’t covered by standardized tests and that the algorithms driving the plan were not explained clearly.

The documents offer several take-away lessons such as the importance of involving stakeholders early and often, making sure teachers understand how the program works, and building systems that not only evaluate performance but support teachers while they work to improve.

Any complex new approach to something as closely tied to people’s sense of self-worth as a job evaluation demands careful, thoughtful, collaborative planning and testing. Along these lines, we must ensure that the intent of SB 10-191 – to facilitate the conversations and collaboration among teachers and administrators that lead to improved student achievement – survives whatever happens next.

If SB 10-191 becomes more about compliance and paper shuffling than about teacher and leader development, the experiment will have failed in Colorado. At this point, the legislation and rules for SB 10-191 are only words. It is now up to the state and the districts to put meat on the bones of 191, building a system that helps schools create a collaborative professional climate rather than just another top-down compliance checklist.

Too much focus on process runs the risk of letting people avoid digging into difficult tasks, such as thoughtful, well-informed conversations about ways to keep growing and improving – conversations that even the most accomplished professionals need.

On the other hand, full implementation may be slowed while everyone waits for the final appellate ruling on the Lobato case, and that may buy more time for careful preparation.

First Person

I’ve spent years studying the link between SHSAT scores and student success. The test doesn’t tell you as much as you might think.

Proponents of New York City’s specialized high school exam, the test the mayor wants to scrap in favor of a new admissions system, defend it as meritocratic. Opponents contend that when used without consideration of school grades or other factors, it’s an inappropriate metric.

One thing that’s been clear for decades about the exam, now used to admit students to eight top high schools, is that it matters a great deal.

Students admitted may receive not only a superior education but also access to elite colleges and, eventually, to better employment. That system has also led to an under-representation of Hispanic students, black students, and girls.

As a doctoral student at The Graduate Center of the City University of New York in 2015, and in the years after I received my Ph.D., I have tried to understand how meritocratic the process really is.

First, that requires defining merit. Only New York City defines it as the score on a single test — other cities’ selective high schools use multiple measures, as do top colleges. There are certainly other potential criteria, such as artistic achievement or citizenship.

However, when merit is defined as achievement in school, whether the test is meritocratic becomes an empirical question that can be answered with data.

To do that, I used SHSAT scores for nearly 28,000 students and school grades for all public school students in the city. (To be clear, the city changed the SHSAT itself somewhat last year; my analysis used scores on the earlier version.)

My analysis makes clear that the SHSAT does measure an ability that contributes to some extent to success in high school. Specifically, a SHSAT score predicts 20 percent of the variability in freshman grade-point average among all public school students who took the exam. Students with extremely high SHSAT scores (greater than 650) generally also had high grades when they reached a specialized school.

However, for the vast majority of students who were admitted with lower SHSAT scores, from 486 to 600, freshman grade point averages ranged widely — from around 50 to 100. That indicates that the SHSAT was a very imprecise predictor of future success for students who scored near the cutoffs.

Course grades earned in the seventh grade, in contrast, predicted 44 percent of the variability in freshman year grades, making them a far better admissions criterion than SHSAT scores, at least for students near the score cutoffs.
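For readers who want to see what "predicts X percent of the variability" means in practice, here is a minimal sketch. It assumes the figures are read as R-squared values from one-predictor linear models, which the essay does not spell out; under that reading, 20 percent and 44 percent correspond to correlations of roughly 0.45 and 0.66. The function name and the sample numbers are hypothetical and are not taken from the analysis.

```python
def variance_explained(x, y):
    """Share of the variance in y explained by a one-predictor linear fit on x.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return (sxy * sxy) / (sxx * syy)


# Made-up numbers for illustration only; NOT data from the analysis.
toy_test_scores = [500, 520, 540, 560, 580, 600]
toy_seventh_grade = [80, 84, 90, 88, 93, 95]
toy_freshman_gpa = [78, 85, 88, 86, 94, 92]

share_from_test = variance_explained(toy_test_scores, toy_freshman_gpa)
share_from_grades = variance_explained(toy_seventh_grade, toy_freshman_gpa)
```

Comparing the two shares on real records is the kind of check that would sit behind the 20 percent and 44 percent figures, although the actual analysis may have used a different model.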

It’s not surprising that a standardized test does not predict as well as past school performance. The SHSAT represents a two-and-a-half-hour sample of a limited range of skills and knowledge. In contrast, middle-school grades reflect a full year of student performance across the full range of academic subjects.

Furthermore, an exam that relies almost exclusively on one method of assessment, multiple-choice questions, may fail to measure abilities that are revealed by the variety of assessment methods that go into course grades. Additionally, middle school grades may capture something important that the SHSAT fails to capture: long-term motivation.

Based on his current plan, Mayor de Blasio seems to be pointed in the right direction. His focus on middle school grades and the Discovery Program, which admits students with scores below the cutoff, is well supported by the data.

In the cohort I looked at, five of the eight schools admitted some students with scores below the cutoff. The sample sizes were too small at four of them to make meaningful comparisons with regularly admitted students. But at Brooklyn Technical High School, the performance of the 35 Discovery Program students was equal to that of other students. Freshman year grade point averages for the two groups were essentially identical: 86.6 versus 86.7.

My research leads me to believe that it might be reasonable to admit a certain percentage of the students with extremely high SHSAT scores (over 600, where the exam is a good predictor) and admit the remainder using a combined index of seventh grade GPA and SHSAT scores.

When I used that formula to simulate admissions, diversity increased somewhat. An additional 40 black students, 209 Hispanic students, and 205 white students would have been admitted, as well as an additional 716 girls. It’s worth pointing out that in my simulation, Asian students would still constitute the largest segment of students (49 percent) and would be admitted in numbers far exceeding their proportion of applicants.
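The two-step rule described above can be sketched in code. This is an illustration under stated assumptions, not the formula used in the simulation: the essay does not say what share of seats goes to the highest scorers or how the index is weighted. The `top_share` and `gpa_weight` parameters, the equal weighting of standardized scores, and the applicant records are therefore all hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def simulate_admissions(applicants, seats, top_cutoff=600,
                        top_share=0.5, gpa_weight=0.5):
    """Return admitted applicant ids in order of admission.

    applicants: list of dicts with keys 'id', 'shsat', 'gpa7'.
    top_share and gpa_weight are assumed values, not from the essay.
    """
    # Step 1: reserve a share of seats for scores above the cutoff,
    # filled by test score alone, highest first.
    top_seats = int(seats * top_share)
    high = sorted((a for a in applicants if a["shsat"] > top_cutoff),
                  key=lambda a: a["shsat"], reverse=True)
    admitted = high[:top_seats]
    admitted_ids = {a["id"] for a in admitted}

    # Step 2: fill the remaining seats by a combined index of
    # standardized seventh-grade GPA and standardized SHSAT score.
    remaining = [a for a in applicants if a["id"] not in admitted_ids]
    open_seats = seats - len(admitted)
    if open_seats > 0 and remaining:
        z_test = zscores([a["shsat"] for a in remaining])
        z_gpa = zscores([a["gpa7"] for a in remaining])
        scored = [(gpa_weight * g + (1 - gpa_weight) * t, a)
                  for a, t, g in zip(remaining, z_test, z_gpa)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        admitted += [a for _, a in scored[:open_seats]]
    return [a["id"] for a in admitted]


# Made-up applicants for illustration only.
toy_applicants = [
    {"id": "A", "shsat": 680, "gpa7": 90},
    {"id": "B", "shsat": 640, "gpa7": 85},
    {"id": "C", "shsat": 590, "gpa7": 97},
    {"id": "D", "shsat": 595, "gpa7": 80},
    {"id": "E", "shsat": 520, "gpa7": 95},
]
toy_result = simulate_admissions(toy_applicants, seats=3)
```

With these made-up records, ranking on test score alone would admit A, B and D. Under the sketch's combined index, applicant C, who has a slightly lower score but much stronger grades, takes D's place.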

Because middle school grades are better than test scores at predicting high school achievement, their use in the admissions process should not in any way dilute the quality of the admitted class, and could not be seen as discriminating against Asian students.

The success of the Discovery students should allay some of the concerns about the ability of students with SHSAT scores below the cutoffs. There is no guarantee that similar results would be achieved in an expanded Discovery Program. But this finding certainly warrants larger-scale trials.

With consideration of additional criteria, it may be possible to select a group of students who will be more representative of the community the school system serves — and the pool of students who apply — without sacrificing the quality for which New York City’s specialized high schools are so justifiably famous.

Jon Taylor is a research analyst at Hunter College analyzing student success and retention. 

First Person

With roots in Cuba and Spain, Newark student came to America to ‘shine bright’

Layla Gonzalez

This is my story of how we came to America and why.

I am from Mallorca, Spain. I am also from Cuba, because of my dad. My dad is from Cuba and my grandmother, grandfather, uncle, aunt, and so on. That is what makes our family special — we are different.

We came to America when my sister and I were little girls. My sister was three and I was one.

The first reason why we came here to America was for a better life. My parents wanted to raise us in a better place. We also came for better jobs and better pay so we can keep this family together.

We also came here to have more opportunities — they do call this country the “Land Of Opportunities.” We came to make our dreams come true.

In addition, my family and I came to America for adventure. We came to discover new things, to be ourselves, and to be free.

Moreover, we also came here to learn new things like English. When we came here we didn’t know any English at all. It was really hard to learn a language that we didn’t know, but we learned.

Thank God that my sister and I learned quickly so we can go to school. I had a lot of fun learning and throughout the years we do learn something new each day. My sister and I got smarter and smarter and we made our family proud.

When my sister Amira and I first walked into Hawkins Street School I had the feeling that we were going to be well taught.

We have always been taught by the best even when we don’t realize. Like in the times when we think we are in trouble because our parents are mad. Well we are not in trouble, they are just trying to teach us something so that we don’t make the same mistake.

And that is why we are here to learn something new each day.

Sometimes I feel like I belong here and that I will be alright. Because this is the land where you can feel free to trust your first instinct and to be who you want to be and smile bright and look up and say, “Thank you.”

As you can see, this is why we came to America and why we can shine bright.

Layla Gonzalez is a fourth-grader at Hawkins Street School. This essay is adapted from “The Hispanic American Dreams of Hawkins Street School,” a self-published book by the school’s students and staff that was compiled by teacher Ana Couto.