<b>Ken Wagner's Posts - School Leadership 2.0</b><br />
Ken Wagner, <a href="https://schoolleadership20.com/profile/KenWagner">https://schoolleadership20.com/profile/KenWagner</a><br />
<br />
<b>Science cannot be secret!</b> (June 22, 2008)<br />
In a recent contribution to the growth/value-added discussion on the DATAG listserv, it was stated that "We invite comment from any of our DATAG Colleagues but are especially interested in the opinions of our resident statisticians." The arc of this discussion is exactly what I had feared - and predicted in my initial posting on May 31.<br />
<br />
It runs like this: Growth and value-added models are extremely technical. Here, try to read these extremely technical articles. If you do not understand them, defer to the judgment of the statisticians.<br />
<br />
The adoption of a growth or value-added model is a public policy decision, informed no doubt by the recommendations of mathematicians, statisticians, and research scientists, but ultimately made by elected and appointed public officials. Any adoption must be supported by a tax-paying public.<br />
<br />
Let me offer an example of why we must always question the opinions of experts. It was stated with authority via the DATAG listserv that "The Rand Corporation, one of the country's most esteemed think tanks, have expressed that Sanders' methodology is the 'most preferred' model for value-added calculations."<br />
<br />
The Rand report, in fact, concluded that "Full multivariate analysis of the data is flexible and uses correlation among multiple years of data. This approach is likely to be preferable but is computationally demanding" (p. xvi). The Rand report is endorsing a methodology (i.e., "full multivariate analysis" with "correlation among multiple years of data") not any particular brand (e.g., Sanders or TVAAS).<br />
<br />
On page two of the full report (not distributed via the listserv), the Rand authors speak of the Sanders model as "The most prominent implementation of this approach . . ." but acknowledge that other folks have attempted similar solutions (e.g., Webster and colleagues). On page 63 of the report, they state "The primary disadvantage of the multivariate models is extreme computational burden . . . While progress is being made to overcome these computational challenges (Rasbash and Goldstein, 1994; DebRoy and Bates, 2003), widely available and flexible solutions are still lacking."<br />
<br />
Further evidence that current solutions were "still lacking" is the fact that the authors of the Rand report came out with their own model a mere year after the "endorsement" referenced in the DATAG posting. (This Rand model can be found in McCaffrey, D., Lockwood, J.R., Koretz, D., Louis, T., & Hamilton, L. (2004). Models for Value-Added Modeling of Teacher Effects. <i>Journal of Educational and Behavioral Statistics</i>, 29(1), 67-101.)<br />
<br />
Finally, this DATAG statistical authority acknowledged that the Sanders model does have a proprietary element - "The only proprietary part of Sanders' work is his team's solution for seeding the estimation algorithm so that the procedures used for calculating the covariance parameters converge. Otherwise, the computer would likely choke, given the model's utilization of all student test data across subjects, time, and grades."<br />
<br />
Although no one seems to like the word "Secret," Sanders and his colleagues are charging a fee for the very same reduction of "computational burden" that the Rand report says is critically necessary. The avoidance of paying this fee is, I assume, at least one reason why the authors of the Rand report came up with their own solution.<br />
<br />
There is no doubt that the Sanders model works and can help schools improve. As long as schools understand what they are purchasing, I think the Capital Region BOCES is providing a great service to its customers. But the New York State legislature has mandated the adoption of a growth/value added model. I am not comfortable with the mandatory adoption of any state-wide model that, because of its secret elements, can never be subject to independent replication and verification. I am not alone in my concerns.<br />
<br />
Johanna Duncan-Poitier, New York's Senior Deputy Commissioner of Education, stated in her June 23 memo to the Board of Regents -<br />
<br />
"The interim growth model should be based on an open architecture; that is the New York State Education Department (NYSED) will publish exactly how it calculates growth decisions and the result will be a single, clear, unambiguous determination of AYP for each English language arts and mathematics accountability criterion" (p. 6).<br />
<br />
The Council of Chief State School Officers (CCSSO) stated in its 2005 report, <i>Policymakers' Guide to Growth Models for School Accountability: How do Accountability Models Differ?</i>, that<br />
<br />
"Further, due to proprietary estimation procedures, broad applications of this model [Sanders's TVAAS model] independently by states are not possible. Hence cost is an additional factor. Further, using models that contain complex (and proprietary) computations which are inaccessible to stakeholders may make it harder to build consensus and a sense of confidence around the validity of the results" (p. 16).<br />
<br />
These are pretty clear denunciations, at both the state and national levels, of large-scale implementations of a "secret" model.<br />
<br />
(To be fair, it appears that New York's testing program already uses two proprietary programs - "ITEMWIN" for test item selection and "FLUX" for scoring tables. Perhaps these sneaked under the political radar!)<br />
<br />
Finally, this discussion must be anchored in the role that science plays in crafting public policy. A mathematical model is useful only to the extent that it informs decisions. Educational decisions must be subject to empirical verification of equity and efficacy. The scientific method operates via independent and public replication, as well as potential falsification.<br />
<br />
Sanders, Saxton, and Horn (1997) declare that "research initiatives are a priority for TVAAS [the Tennessee Value-Added Assessment System]. The enormous, longitudinally merged database . . . is a unique resource for research into educational issues" (p. 141). Indeed, the ability to conduct research on the so-called "teacher effect" - the instructional value added by an individual educator during a specified period of time - is one of the primary justifications for the adoption of a value-added model.<br />
<br />
Sanders now sells his model via a company called SAS. Their website features an appealing (or appalling) pitch for Sanders's new model, SAS EVAAS (see http://www.sas.com/govedu/edu/services/effectiveness.html) -<br />
<br />
"Schools can benefit from SAS EVAAS analyses without having to invest in new hardware, software or IT staff. Instead, states or districts send electronic data directly to SAS, where the data is cleaned and analyzed. The results are then reported via a secure Web application, a powerful but user-friendly diagnostic tool."<br />
<br />
The secrecy at work is buried in a "Pay Us" because you can "Trust Us" marketing campaign. That approach may be convenient, powerful, even helpful, but it is not scientific. That approach can never contribute to science, because secrets can never be publicly replicated or falsified by independent investigators.<br />
<br />
Anyone who tells you differently is either mistaken - or selling something.<br />
<br />
This entry is also posted in the <b>Data Group</b> at <a href="http://schoolleadership.ning.com/group/data">http://schoolleadership.ning.com/group/data</a>. Please post your comments in the <b>Data Group</b> so others can follow the discussion!<br />
<br />
<b>Accountability Grows Up?</b> (May 31, 2008)<br />
The Capital Region BOCES recently sponsored a useful conference entitled <i>Ready, Set, Grow!: School Improvement through Value-Added Analysis</i>. Value-Added is clearly an idea whose time has come, and the conference was co-sponsored by the New York State School Boards Association, the New York State Council of School Superintendents, and the School Administrators Association of New York State. If you looked in the eyes of the attendees - it was a crowded house - there was a mixture of attentiveness and apprehension.<br />
<br />
One can think of accountability systems in geometric terms. New York currently calculates Adequate Yearly Progress (AYP) by comparing points: the performance of a group of students at a point in time is compared to a state standard. You're either there (AYP), or you're not.<br />
<br />
Chapter 57 of the Laws of 2007 requires New York to have a Growth Model in place by the 2008-09 school year. A growth model is a line with slope. The same group of students is measured at two or more points in time, and the slope will be positive, negative, or zero - signifying growth, regression, or stasis. Districts not yet at AYP may still be able to demonstrate that students are on track to get there soon. Demonstration of growth will allow a more nuanced analysis, and may become a new form of "safe harbor" for districts whose progress is masked by the current point-based system.<br />
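The point-versus-slope distinction can be sketched in a few lines of code. This is purely my illustration - the scores, the one-year interval, and the function names are hypothetical, not any state's actual formula:

```python
# Growth as slope: the same group of students measured at two points
# in time. All scores here are made-up scale scores for illustration.

def growth_slope(score_t1: float, score_t2: float, years: float = 1.0) -> float:
    """Slope of the line through two measurements of the same group."""
    return (score_t2 - score_t1) / years

def classify(slope: float) -> str:
    """Positive, negative, or zero slope -> growth, regression, or stasis."""
    if slope > 0:
        return "growth"
    if slope < 0:
        return "regression"
    return "stasis"

print(classify(growth_slope(650.0, 662.0)))  # a rising cohort -> "growth"
```

A point-based AYP check looks only at `score_t2` against a fixed standard; the growth model adds the direction of the line.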
<br />
This same law requires that New York implement a Value-Added Model by the 2010-11 school year, subject to all sorts of conditions and approvals. With a value-added model, we will have two lines, each with its own slope: one representing expected growth and the other observed growth. A further level of analysis will be possible: schools whose students have not made AYP (the point), nor are on a timely track to reach AYP in the future (the line), may still be able to demonstrate that observed growth has a larger positive slope than would be expected if the district/school/teacher had done <i>nothing</i>.<br />
<br />
Crudely put, a value-added model is a statistical method to demonstrate "better than nothing." Schools that are considered under-performing in a point-based AYP model, and not-on-track in a line-based AYP model, may still produce student growth that exceeds expectation (i.e., has value). A value-added model will also identify high-achieving schools that produce no growth, beyond what would be expected had the district/school/teacher done nothing.<br />
<br />
In a value-added model, the key issue becomes how to determine expected growth. How do we calculate the slope of the predicted line?<br />
<br />
The solution is a tangle of statistics, involving multiple measures, differences, correlations, and covariates, but the basic idea is that future growth is predicted based on past achievement, controlling for student/school characteristics. Some value-added models collect multi-year data and have students serve as their own statistical control; others have fewer repeated measures but use demographic variables as covariates to reduce error.<br />
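As a toy sketch of that basic idea - and only a sketch: the data, names, and single-predictor regression below are my own illustration, not any vendor's actual model - expected growth can be estimated by regressing current scores on prior scores, with "value added" defined as the gap between observed and expected:

```python
# Toy value-added sketch: predict this year's score from last year's,
# then treat the residual (observed minus expected) as "value added."
# Real models use many years of data, covariates, and mixed-effects
# machinery; this shows only the core idea. All data are hypothetical.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Prior-year and current-year scores for a reference group (made up).
prior = [600, 620, 640, 660, 680]
current = [610, 628, 651, 668, 688]

a, b = fit_line(prior, current)

def value_added(prior_score, observed_score):
    """Observed score minus the score the regression would expect."""
    expected = a + b * prior_score
    return observed_score - expected

# A student who started at 640 was expected to score about 649;
# scoring 660 beats that expectation by about 11 points.
print(round(value_added(640, 660), 1))  # prints 11.0
```

Models with multi-year data let each student's own history set the expectation; models with fewer repeated measures add demographic covariates to the regression instead.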
<br />
Which brings me back to the look in the eyes of the conference attendees. The policy implications of these developments are enormous and exciting; the statistics are complex. Some vendors claim the statistics are so complex - and valuable - that they are proprietary, a secret. Thankfully, New York has declared it will not adopt a secret model!<br />
<br />
Despite the stamp of approval proffered by legislators, professional organizations, and individual testimonials, there is still a healthy debate within the field about the conditions under which a value-added model is, or is not, valid. The concern is that those who will create, comment on, or implement the policies surrounding a value-added model may not fully understand the underlying technical issues, and will therefore be overly cautious, overly eager, or - worse - will defer key decisions to the recommendations of those who claim to understand. Explaining this to the public is another matter entirely.<br />
<br />
We often make decisions based on complex and poorly understood calculations (e.g., stock derivatives). But the mandatory adoption of a value-added model is an example in which students, schools, and teachers will be informed - and judged - in a profoundly different manner. Although this may be a new age of data collection, reporting, and analysis, the virtues of transparency, curiosity, and inclusiveness are timeless when so much is at stake. The leadership at the State Education Department is in the midst of a healthy debate on growth and value-added models, with an established time-line that includes expert guidance and opportunities for public comment.<br />
<br />
Expectation is a funny thing. We know that higher expectations produce better outcomes, so long as the expectation is not so unreasonable as to produce hopelessness and apathy. If expectation is managed properly, students, teachers, and schools tend to meet the challenge. How will value-added models maintain the balance between statistical expectations based on past behavior, and high-but-not-too-high expectations based on future goals?<br />
<br />
What do you think about growth and value-added models, and the implications for guiding and evaluating students, teachers, schools, and districts?<br />
<br />
This entry is also posted in the <b>Data Group</b> at <a href="http://schoolleadership.ning.com/group/data">http://schoolleadership.ning.com/group/data</a>. Please post your comments in the <a href="http://schoolleadership.ning.com/group/data">Data Group</a> so others can follow the discussion!<br />
<br />
<b>Reflections on the Special Education Snapshot</b> (May 17, 2008)<br />
I imagine it will be anticlimactic when the bureaucratic button is finally pushed to close Part I of this year's biggest data drama. The Special Education Snapshot is - or will soon be - done.<br />
<br />
For those of you who will not start watching until Season Two, the special education snapshot - or the annual count of all New York State students who receive special education services as of the first school day in December - was reported this year, not through aggregate counts, but via individual student records in the data warehouse.<br />
<br />
What must those who had the authority to nix this crazy idea be thinking? Personally, I am glad we moved forward.<br />
<br />
We have completed the logistics necessary for the full inclusion of students with disabilities. If all students should have access to the same programs and the same quality of instruction, if they are to be held to the same rigorous and relevant standards, then why shouldn't the oversight of their programs be driven by the same data source? Full inclusion means full participation in the same, if somewhat dysfunctional, system.<br />
<br />
This crazy idea forced - sometimes abruptly, sometimes without enough support - those who report the data, those who guide the curriculum and instruction, those who coordinate the special services, and those who worry about the money, to collaborate and cooperate, learn each other's language, and, yes, feel each other's pain. Full inclusiveness should apply to school personnel as well as students.<br />
<br />
Finally, this crazy idea will allow the data to get better - eventually, if not today. VESID is wisely not pushing too hard to determine the causes of those 10% year-to-year discrepancies. Although the data was not, and will never be, perfect, forcing aggregate counts to be driven up from individual student records will produce better data, oversight, and (hopefully) outcomes. I have spoken with many on the special education side of the house who are privately grateful for the December Snapshot / Spring cleaning.<br />
<br />
Yes, I am tired. Yes, much of this was made up as we went along. Yes, we are not finished. But we can - and should - reflect on the good we have done.<br />
<br />
This entry is also posted in the <b>Data Group</b> at <a href="http://schoolleadership.ning.com/group/data">http://schoolleadership.ning.com/group/data</a>. Please post your comments in the <a href="http://schoolleadership.ning.com/group/data">Data Group</a> so others can follow the discussion!<br />
<br />
<b>Teacher Tenure and Student Data</b> (May 8, 2008)<br />
Chapter 57 of the Laws of 2007 stipulates that New York State teachers <u>cannot</u> be awarded tenure unless they successfully use student performance data (including, but not limited to, performance on state assessments) to guide their instruction. A 2008 amendment to Chapter 57 prohibits districts from using test scores to evaluate teachers, while maintaining the requirement that teachers use test scores to improve instruction.<br />
<br />
School districts must therefore provide teachers with "timely and relevant student data." Districts, schools of education, and other professional development services must help teachers learn how to integrate assessment results into their planning and classroom practices.<br />
<br />
This is big. What are districts doing to help teachers meet this tenure requirement and professional obligation? How are teachers meeting this personal challenge?<br />
<br />
This entry is also posted in the <b>Data Group</b> at <a href="http://schoolleadership.ning.com/group/data">http://schoolleadership.ning.com/group/data</a>. Please post your comments in the <b><a href="http://schoolleadership.ning.com/group/data">Data Group</a></b> so others can follow the discussion!<br />
<br />
<b>Why data?</b> (May 4, 2008)<br />
Three changes have affected education in the United States; all involve access.<br />
<br />
All children now have the right to access educational services. There are no more places to hide those students who are harder to teach or slower to learn.<br />
<br />
All customers of educational services - students, teachers, parents, and other taxpayers - now have a right to access educational data. There are no more places to hide those teachers and those practices that fail to educate.<br />
<br />
And, of course, technology has produced unprecedented access to the storage and networked connectivity of data.<br />
<br />
In theory, we now have access to the information necessary to answer perennial questions: how do we support the growth of a good teacher; which teaching strategies work and which do not; what works differently for different types of learners and teachers; how much should it all cost.<br />
<br />
In practice, turning data into useful and actionable information - rather than a bureaucratic shackle on the minds of students and teachers - will be the primary challenge of this generation of educational leaders.<br />
<br />
This is historic. It should also be fun.<br />
<br />
This entry is also posted in the <b>Data Group</b> at <a href="http://schoolleadership.ning.com/group/data">http://schoolleadership.ning.com/group/data</a>. Please post your comments in the <b><a href="http://schoolleadership.ning.com/group/data">Data Group</a></b> so others can follow the discussion!