It’s good to be King: More Misguided Rhetoric on the NY State Eval System

Posted on December 12, 2012


Very little time to write today, but I must comment on this NY Post article on the bias I’ve been discussing in the NY State teacher evaluation system. Sociologist Aaron Pallas of TC and economist Sean Corcoran of NYU express appropriate concerns about the degree of bias found and reported in the technical report provided by the state’s own consultant, which developed the models. And the article overall raises the concern that these problems were simply blown off. I would – and have – put it more bluntly. Here’s my replay of events, quoting the parties involved:

First, the state’s consultants designing their teacher and principal effectiveness measures find that those measures are substantively biased:

Despite the model conditioning on prior year test scores, schools and teachers with students who had higher prior year test scores, on average, had higher MGPs. Teachers of classes with higher percentages of economically disadvantaged students had lower MGPs. (p. 1) http://schoolfinance101.files.wordpress.com/2012/11/growth-model-11...

But instead of questioning their own measures, they decide to give them their blessing and pass them along to the state as being “fair and accurate.”

The model selected to estimate growth scores for New York State provides a fair and accurate method for estimating individual teacher and principal effectiveness based on specific regulatory requirements for a “growth model” in the 2011-2012 school year. (p. 40) http://schoolfinance101.files.wordpress.com/2012/11/growth-model-11...
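To make the quoted finding concrete, here’s a minimal simulation sketch – entirely hypothetical numbers, not AIR’s actual model or data. It builds a “growth model” that conditions only on prior-year scores, in a world where poverty also depresses current-year growth. Teachers of higher-poverty classrooms end up with systematically lower MGP-style scores even though their true effects were assigned completely at random:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, class_size = 500, 25

# Hypothetical setup: each teacher's classroom has some share of
# economically disadvantaged students, and poverty depresses growth
# even after conditioning on prior-year scores.
poverty_share = rng.uniform(0, 1, n_teachers)
prior = rng.normal(0, 1, (n_teachers, class_size))
true_effect = rng.normal(0, 0.2, n_teachers)            # assigned independently of poverty
current = (0.7 * prior                                  # persistence of prior achievement
           - 0.3 * poverty_share[:, None]               # poverty effect on growth
           + true_effect[:, None]
           + rng.normal(0, 0.5, (n_teachers, class_size)))

# A growth model that conditions on prior scores only: regress current on
# prior, then average each teacher's residuals (a stand-in for an MGP).
X = np.column_stack([np.ones(n_teachers * class_size), prior.ravel()])
beta, *_ = np.linalg.lstsq(X, current.ravel(), rcond=None)
mgp = (current.ravel() - X @ beta).reshape(n_teachers, class_size).mean(axis=1)

print("corr(poverty share, MGP):        ", round(np.corrcoef(poverty_share, mgp)[0, 1], 2))
print("corr(poverty share, true effect):", round(np.corrcoef(poverty_share, true_effect)[0, 1], 2))
```

That’s the signature of bias: the measure correlates with classroom poverty even though the underlying teacher effects do not.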

The next step was for the Chancellor to take this misinformation and polish it into pure spin as part of the power play against the teachers in New York City (who’ve already had the opportunity to scrutinize what is arguably a better, but still substantially flawed, set of metrics). The Chancellor proclaimed:

The student-growth scores provided by the state for teacher evaluations are adjusted for factors such as students who are English Language Learners, students with disabilities and students living in poverty. When used right, growth data from student assessments provide an objective measurement of student achievement and, by extension, teacher performance. http://www.nypost.com/p/news/opinion/opedcolumnists/for_nyc_student...

Then send in the enforcers…. This statement came from a letter sent to a district that did decide to play ball with the state on the teacher evaluation regulations. The state responded that… sure… you can adopt the system of multiple measures you propose – BUT ONLY AS LONG AS ALL OF THOSE OTHER MEASURES ARE SUFFICIENTLY CORRELATED WITH OUR BIASED MEASURES… AND ONLY AS LONG AS AT LEAST SOMEONE GETS A BAD RATING.

The department will be analyzing data supplied by districts, BOCES and/or schools and may order a corrective action plan if there are unacceptably low correlation results between the student growth subcomponent and any other measure of teacher and principal effectiveness… http://schoolfinance101.wordpress.com/2012/12/05/its-time-to-just-s...

So… what’s my gripe today? Well, in this particular NY Post article we have some rather astounding quotes from NY State Commissioner John King, given the information above. Now, the last time I talked about John King, he was strutting about NY with ... So, what’s King up to now? Here’s how John King explained the potential bias in the measures, and how that bias a) is possibly not bias at all, and b) even if it is, isn’t that big a problem:

“It’s a question of, is this telling you something descriptive about where talent is placed? Or is it telling you something about the classroom effect [or] school effect of concentrations of students?” said King.

“This data alone can’t really answer that question, which is one of the reasons to have multiple measures — so that you have other information to inform your decision-making,” he added. “No one would say we should evaluate educators on growth scores alone. It’s a part of the picture, but it’s not the whole picture.”

So, in King’s view, the bias identified in the AIR technical report might just be a signal as to where the good teachers really are: kids in schools with lower poverty, kids in schools with higher average starting scores, and kids in schools with fewer children with disabilities simply have the better teachers. While there certainly may be some patterned sorting of teachers by their actual effect on test scores, a) that proposition is less plausible than a genuine classroom-composition effect, and b) adopting it when the data can’t tease out cause is a highly suspect basis for teacher evaluation (reformy thinking at its finest!).
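King is right that the data alone can’t answer the question – and that’s exactly why treating the correlation as a “talent placement” story is suspect. A quick sketch (hypothetical parameters) shows that a world where talent really is sorted away from high-poverty classrooms and a world where the measure itself is biased produce the very same observed correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
poverty = rng.uniform(0, 1, n)

# World A (King's reading): talent really is sorted away from high-poverty
# classrooms, and the measure faithfully reports it.
score_sorting = -0.5 * poverty + rng.normal(0, 1, n)

# World B: talent is placed independently of poverty, but the measure
# itself penalizes high-poverty classrooms.
score_biased = rng.normal(0, 1, n) - 0.5 * poverty

print("corr(poverty, score) under sorting:", round(np.corrcoef(poverty, score_sorting)[0, 1], 3))
print("corr(poverty, score) under bias:   ", round(np.corrcoef(poverty, score_biased)[0, 1], 3))
# The two worlds are observationally identical; the data can't tell them apart.
```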

The kicker is in how King explains why the potential bias isn’t a problem. King argues that the multiple measures approach buffers against over-reliance on the growth percentiles. As he states so boldly – “It’s a part of the picture, but it’s not the whole picture.”

The absurdity here is that KING HAS DECLARED TO LOCAL OFFICIALS THAT ALL OTHER MEASURES THEY CHOOSE TO INCLUDE MUST BE SUFFICIENTLY CORRELATED WITH THESE GROWTH PERCENTILE MEASURES! That’s precisely what the letter quoted above, sent to one local official, says! Even if this weren’t the case, the growth percentiles – which may wrongly classify teachers based on factors outside their control – might carry disproportionate weight in determining teacher ratings (merely as a function of their extent of variation, most of which is noise and much of the remainder of which is bias). But when you require that all other measures be correlated with this suspect measure, you’ve stacked the deck: the resulting ratings are substantially, if not entirely, built on a flawed foundation.
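Here’s a toy illustration of the deck-stacking – hypothetical weights and noise levels, not the state’s actual formula. Once a second measure gets nudged toward the growth score to survive the correlation check, the combined rating tracks the growth score’s bias component more closely than it would with a genuinely independent second measure:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
talent = rng.normal(0, 1, n)          # true effectiveness
bias = rng.normal(0, 1, n)            # bias component (e.g., tied to classroom poverty)
growth = talent + bias                # the growth score: truth plus bias

# A second measure that observes talent independently, with its own noise...
independent = talent + rng.normal(0, 1, n)
# ...versus one nudged toward the growth score to pass the correlation check
# (a hypothetical 50/50 blend).
aligned = 0.5 * growth + 0.5 * independent

for label, other in [("independent second measure ", independent),
                     ("correlation-aligned measure", aligned)]:
    composite = 0.5 * growth + 0.5 * other
    print(label, "-> composite corr with bias:",
          round(np.corrcoef(composite, bias)[0, 1], 2))
```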

THIS HAS TO STOP. STATE OFFICIALS MUST BE CALLED OUT ON THIS RIDICULOUS CONTORTED/DECEPTIVE & OUTRIGHT DISHONEST RHETORIC!

 

Note: King also tries to play up the fact that, at any level of poverty, there are some teachers getting higher or lower ratings. This explanation ignores the fact that much of the remaining variation in teacher estimates is noise. Some teachers will get higher or lower ratings in a given year simply because of the noise/instability in the measures, and those variations may be entirely meaningless.
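A final sketch of that point, assuming (hypothetically) that year-to-year noise variance exceeds the stable teacher effect – roughly in line with the low year-to-year stability typically reported for these estimates. Ratings reshuffle substantially from one year to the next:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
stable_effect = rng.normal(0, 1, n)    # the persistent part of teacher quality
noise_sd = 1.5                         # hypothetical: noise variance exceeds signal

year1 = stable_effect + rng.normal(0, noise_sd, n)
year2 = stable_effect + rng.normal(0, noise_sd, n)

rank1 = year1.argsort().argsort()      # percentile-style ranks, 0..n-1
rank2 = year2.argsort().argsort()
top = n - n // 5                       # top-quintile cutoff
stay = ((rank1 >= top) & (rank2 >= top)).sum() / (rank1 >= top).sum()

print("year-to-year rating correlation:", round(np.corrcoef(year1, year2)[0, 1], 2))
print("top-quintile teachers still top-quintile next year:", round(stay, 2))
```

Under these assumed noise levels, most of one year’s “top” teachers are no longer “top” the next year – exactly the kind of churn that makes single-year high or low ratings meaningless.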