Testing, Testing 1-2-3 from Education Next by Anne Hyslop

I. In Defense of Standardized Testing

According to a Gallup poll last fall, one in eight teachers thinks that the worst thing about the Common Core is testing. On the surface, that’s hardly newsworthy. We know states are changing their tests to align to the new standards, and those changes have inevitably bred uncertainty, anxiety, and even hostility, especially when results could carry high stakes someday. But educators surveyed didn’t say they were upset that the tests were changing, or that there could be consequences tied to the results. Rather, they were upset that the tests exist. Specifically, 12 percent of U.S. public school teachers “don’t believe in standardized testing.” Much like the debate over global warming, these non-believers refuse to validate an unassailable fact: standardized testing does have positive–and predictive–value in education and in life, just as the Earth is, indeed, getting warmer.

More specifically, this righteous conviction—“I don’t believe in testing”—is at odds with most policy analysis. Regardless of political or ideological bent, most will admit that NCLB got one thing right: exposing achievement gaps through the disaggregation of student data. Where did that data come from? Standardized tests. Instead of ignoring longstanding disparities in schooling, NCLB’s testing regimen forced states and districts to quantify them, examine them, and most importantly, try to improve them. It gave policymakers, administrators, and educators a common language to talk about student achievement and progress, and evaluate what was working based on evidence, not perception. Sure, standardized testing needed to be refined over the last decade to enhance quality and reduce unintended consequences—and could still use upgrades and be open to further innovation. But the value of standardized testing in terms of better understanding and improving a public education system as vast and fragmented as ours is undeniable, right?

Well, most people aren’t policy analysts. And today, growing segments of the education sector—not just teachers, but also their unions, lawmakers, parents, and prominent researchers and advocacy groups—seem to be forgetting (or willfully ignoring) the value of statewide standardized testing. Every week there is another proposal for “new accountability” or a plan to stop “over-testing” students. Go beyond the boisterous press releases or slick websites, though, and these plans are feeding on a far more negative undercurrent: NCLB’s requirement for statewide annual standardized tests for all kids is harmful and wrong. And that undercurrent has grown stronger in recent weeks as reports have surfaced that the new chairman of the Senate education committee, Lamar Alexander (R-TN), seems ready to introduce a bill to reauthorize NCLB that would eliminate the annual testing requirement in grades 3-8.

Now, it’s one thing to dislike standardized testing or point out its flaws. It’s an entirely different matter to refuse to believe in it, to claim that it provides no information of any value. And with teachers, parents, advocates, and policymakers on both sides of the aisle losing faith in statewide annual standardized testing—refusing to see these measurements of teaching and student learning as anything but unreliable, worthless, or biased—education reform is coming to a crossroads. One path is dominated by these non-believers. On it, “subjective perception and experience become the sole arbiter of truth,” as my colleague Sara Mead wrote, and “we are left with the… forces of emotion, sentiment, and affinity to guide our judgments and decisions.”

If this is the only way forward, education reform is imperiled. Not because Democrats are cannibalizing one another, or because the Tea Party is holding the Republican party hostage. But because when objective, common measures of students’ learning—like standardized tests—are only to be mistrusted, our ability to work out our differences to set education policy based on research and evidence, implement programs effectively, and make judgments in the best interests of all kids is lost. “I don’t believe you. That’s not what my experience tells me is true.” Compromise becomes harder, gridlock and discord more permanent.

So in what might be a lonely fool’s errand, I’m going to attempt to find—and argue for—another path. A path that may not blindly believe in statewide standardized testing, but at least recognizes its value in measuring school, educator, and student learning in ways that inform individual, collective, and comparative judgments about performance and progress. It may not be popular or politically expedient, but annual, statewide standardized testing, and yes–even accountability and consequences based on that data–at least deserve a defense.

*****

II. Five Reasons Getting Rid of Annual Testing is a Bad Idea

Senator Lamar Alexander (R-TN) and Rep. John Kline (R-MN), the incoming leaders of the Senate and House education committees, both say they are open to an ESEA rewrite that kills the requirement for states to test students annually. Or as I called it, the “peel off the party wings” approach to reauthorization. This bipartisan coalition bonds over its hatred of statewide annual testing, but not much else. And any bill it produces would be, in essence, a giant finger to the policies of Arne Duncan and Barack Obama–and Margaret Spellings and George W. Bush before them.

Like Mike Petrilli in this Flypaper post, I hope Alexander’s and Kline’s annual testing one-eighty is all just a bluff to try and get Democrats to give in on requiring states to develop teacher evaluations. And I hope they come to their senses and reveal a more centrist reauthorization proposal–with annual statewide testing, and data reporting, and school accountability requirements with teeth.

Because getting rid of annual testing is a dumb idea. I acknowledge (readily) that there are very real problems with today’s tests, accountability systems, teacher evaluations, NCLB waivers, and so on. And these problems are often most acute for those most affected by them–students, families, and teachers, rather than the policymakers that wrote the law and are now responsible for updating it.

But this particular reaction–ending statewide, comparable, annual testing–is an overreaction that creates more problems than it solves. It feeds into the false narrative that testing is only able to punish, rather than inform, support, and motivate. It makes it okay that we haven’t invested nearly enough in building educator capacity to support the students that tests identify as struggling, including significant commitments to overhauling both professional development and teacher preparation. It shies away from, rather than confronts, the hard truths that tests reveal about our education system–the disparate outcomes, and disparate expectations of what students from different backgrounds, ethnicities, and socio-economic conditions can learn.

Still, given the public beating standardized tests have taken over the last decade, and the negative narrative around testing that’s solidified as a result, it remains exceedingly important for those of us that still believe in annual, statewide standardized testing to articulate–again, and again, and again–why it matters. So if the problems above weren’t sufficient to sway you, here are the top five things we lose by giving up on annual testing:

1. A Shot at Fairer School Accountability. Proficiency rates on standardized tests, as NCLB showed, often revealed more about the makeup of a school’s student body than what the school was doing to improve their education. Until growth measures. Using annual tests, states can now isolate what schools add to students’ learning experiences, regardless of background or prior capability. Two schools that look the same in terms of proficiency can look remarkably different on growth—because one of the schools is accelerating students’ learning trajectories upward, and the other is not. While far from the reality in far too many states, accountability systems at least have the potential now to accurately and fairly capture these differences—ensuring that schools with challenging student populations are not automatically penalized for the students they serve. Without annual tests, however, that potential is lost, and states’ accountability metrics will likely revert to being almost entirely correlated with student characteristics, rather than real differences in school performance—a huge setback for low-income schools and communities.

2. High Standards for Students that Matter. Annual assessments don’t exist in isolation. They weren’t brought here by magic. They exist because states believed it was important to lay out clear expectations for what all kids needed to know at each grade level in order to succeed in future educational settings, the workforce, or as a contributing member of civic life. The case for state standards stretches back decades–to A Nation at Risk. But standards are just words on a page. Assessing whether students meet them makes those words matter. Regardless of whether they’re Common Core standards, or not, pretending that it isn’t important to regularly assess and report on students’ progress, in a consistent and comparable way, against whatever standards states have undermines their legitimacy and belittles the notion that it’s important for students to master essential knowledge and competencies. States have put tremendous resources–and increasingly, political capital–into writing college- and career-ready standards. If those expectations are paramount for all students, statewide, then so is frequently and consistently assessing whether students are meeting them, statewide.

3. Continuous Improvement and Innovation. Annual, statewide assessments are powerful, because they provide actionable information to guide improvement, at multiple levels. They are much more than a “label” for schools or kids. Rich annual testing data, including growth over time, demonstrate where state and district resources should be targeted and what those resources should do. A school with low proficiency rates for English language learners needs a different kind of support and strategy than a school with low growth rates in 7th and 8th grade math for all students. Annual tests have also helped validate novel and innovative reforms: charter schools and portfolio districts; evaluation systems that provide real feedback on educator practice; early warning systems that help prevent student disengagement and dropout; and more. All of these initiatives–and future policy innovations, like personalized learning and competency-based education–depend on regular assessment and evaluation to validate them.

4. Smarter School Choice. Without an annual testing requirement, the kinds of tests offered at the local level will likely skyrocket. But the only comparable measure across states (or even districts) could be 4th grade, 8th grade, and 11th grade tests. And this could make responsible school choice policies and expansion more challenging. How will parents make informed school choices, within or across districts, if they are presented with a different data profile at every turn? What use is an A-F grading system if the components that make up those grades are different from school to school? Further, choice advocates–and skeptics–should also value annual testing. How can high-quality charter schools show they are as effective as, or better than, traditional public schools if they don’t have comparable data to prove it? And what about charter schools that struggle? Because many charters start with only one or two grades, it’s possible that some could be failing to educate students adequately for years before those deficiencies show up in testing data. Annual testing data isn’t just essential for fairer school accountability for traditional public schools, but also for effective choice policies and authorizing mechanisms across sectors.

5. Fiscally Responsible Governance and Safeguards. The federal government spends over $14 billion on Title I grants in NCLB annually–not to mention over $11 billion on special education. And those are just the two largest federal K-12 programs. Education is also one of the biggest budget line items at the state level, and a significant cost for local jurisdictions, too. It is simply illogical to invest this kind of money at a system level without assessing the returns in terms of student outcomes–and assessing them in a way that is comparable across districts, across states, and across time to the greatest extent possible. Running a state education system becomes a guessing game if every district submits different evidence of their results. Worse, it may not only enable inequity to fester, but also allow it to grow if inequity can no longer be measured accurately, or can only be measured at certain points in time where data allow it.

The move away from annual testing isn’t just a “bad idea whose time has come,” as my colleague Andy Smarick wrote. It’s a terrible, horrible, no good, very bad idea. Here’s hoping that policymakers listen to reason.

*****

III. Let’s Talk About Tests: Four Questions to Ask

If you follow education news, politics, and social media, it’s clear that testing is having a moment. I was surprised it wasn’t listed alongside Taylor Swift as a nominee for Time magazine’s 2014 Person of the Year. Everyone–policymakers, unions, state leaders, local administrators, teachers, parents, you name it–seems to agree that the amount of testing and its role in America’s schools and classrooms merit reconsideration. But the momentum of this “over-testing” meme has overshadowed the fact that testing policy is complicated. And when the field talks about “over-testing,” it’s often not talking about the same kinds of tests or the same set of issues.

To help clarify and elevate our over-testing conversation (because it’s here to stay), here are four questions to ask, with considerations to weigh, when deciding whether testing is indeed out of control–and evaluating the possible options to change it.

1. Standardized? In AP Biology, I took a test every week–but only one of them was “standardized” in the way most use the term: the AP test at the end of the year. Debates about testing, however, tend to ignore the common, teacher-developed ones used to assess students’ grasp of content throughout the year (which, let’s be honest, also require a significant amount of time to prepare and take).

In other words, they are usually debates about over-standardized testing–the large-scale assessments, often multiple choice, that are given to thousands of students, with consistent scoring and comparable results. Examples are numerous and varied: Advanced Placement tests, the New York Regents exams, NAEP, Smarter Balanced, Texas STAAR testing, NWEA’s Measures of Academic Progress (MAP), the ACT, Teaching Strategies GOLD, and so on. But with such differences in scope, development, design, and purpose, debating the value of “standardized” testing alone fails to get to the heart, or complexity, of the issue.

2. How much? A growing point of contention is whether standardized tests should be given annually or in grade-spans (once in elementary school, once in middle school, and once in high school). Current federal law is actually a mix of both. Since NCLB’s passage, states have been required to have math and reading standardized tests annually in grades 3-8, but only once in high school. Similarly, state science assessments must be given once in each grade-span (grades 3-5, 6-9, and 10-12). Prior to NCLB, the Improving America’s Schools Act of 1994 required states to have grade-span testing in reading and math for the first time–and many, especially teachers’ unions, are hoping Congress reverts to that earlier testing mandate, with several possible bills pending or expected in 2015.

But no matter the statutory language, in practice, the question of how much standardized testing is too much can become even more complex. For example, grade-span testing could be staggered so that students take reading tests in grades 3, 6, and 9, math tests in grades 4, 7, and 10, and science tests in grades 5, 8, and 11. In other words, there are still standardized tests every year. Or, a system of state tests in three grades could be given on top of a system of district tests given in three different grades. Again, the result looks a lot like “annual testing,” just not the kind of statewide, comparable annual testing system NCLB requires today.

To summarize, for those caught up in the over-testing angst, the problem may not be the fact that standardized testing occurs each year, but rather the kinds and number of standardized tests that are administered to kids.

3. Administered to whom? One way some have suggested to combat over-testing is to use randomized sampling techniques, similar to the NAEP. NCLB required all states to participate in NAEP testing every two years, but not every student participates, and NAEP tests are not administered in every school. Instead, schools and students are selected randomly to participate so that enough students take the NAEP test for it to produce usable data for the all students group and for particular subgroups, like Black, Hispanic, and low-income kids. Putting aside the fact that NCLB requires assessments to be given to all students and even dings schools in its accountability requirements if they have low participation rates (after all, the law could change), sampling would make it more difficult to produce usable achievement data for individual districts and schools, especially in small schools or rural areas. There’s a reason NAEP can only produce statewide results (plus results for a select group of large districts like Chicago and Atlanta)–the more detailed the information desired, the more students must be included in the sample.

Another drawback? Tests may need to be used for more than trend analysis–we may actually want accurate information on individual students, teachers, or schools. States may want to use assessments to validate a particular policy approach in a network of schools, identify students at risk of dropping out of high school to intervene before it’s too late, or determine which schools need additional supports, resources, and technical assistance. Not to mention, some kids wouldn’t be tested at all. What kind of information would these students and their families receive about their educational progress? What kind of message does that send about the importance of mastering state standards? About the importance of equal opportunities and fairness? Sampling is a valid technique, yes, but it may not be the right technique for what policymakers and educators need these tests to accomplish.

4. Administered by whom? And finally, in the ongoing debate about testing, there’s still the question of which part of our fragmented education system administers these tests. NCLB requires states to develop systems for assessing the achievement of all students against the state’s standards, with only limited options for alternate assessments for students with significant disabilities. In simpler terms: students must take the same test, in every district, statewide. And as recent studies have found, many districts then offer their own tests as well, to the point that local tests outnumber the state ones required by NCLB. Increasingly, some districts would like new flexibility to reclaim testing as their own. They propose to administer local assessments in some grades, and the statewide assessment in others (federal policymakers would need to enable this change, as it conflicts with current law).

This might reduce the number of standardized tests students take, especially if they attend school in a district that has adopted numerous local tests on top of what the state requires. But it would certainly increase the variety of tests administered within a state, and in turn, make it much more challenging to compare results across districts or states. Worse, it makes it extremely difficult, if not impossible, to measure student growth if local assessments are not aligned from one year to the next, or to each other. Losing comparable data would be a blow not just for accountability, evaluation, and research, but also for communicating about the state of our education system and making smart policy decisions.
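The sampling trade-off raised under question 3 can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes a simple random sample and the textbook margin-of-error formula for a proportion, which is simpler than the sampling design NAEP actually uses, and the sample sizes are hypothetical numbers picked for the example, not real NAEP or state figures.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (e.g., a proficiency
    rate) estimated from a simple random sample of the given size."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical sample sizes, chosen only for illustration.
hypothetical_samples = {
    "statewide sample": 2500,
    "one district": 400,
    "one small school": 25,
}

for label, n in hypothetical_samples.items():
    points = margin_of_error(n) * 100  # convert to percentage points
    print(f"{label:>18}: n={n:>4}, margin of error of about {points:.1f} points")
```

Under these assumptions, the estimate gets noisier as the group being described gets smaller, which is why a sample large enough to describe a state says very little about any single school.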

As the testing debate continues in the new year, it’s time for the education field to get a little more specific about the testing problem it’s trying to solve–and the trade-offs the proposed solutions may create.

- Anne Hyslop

Anne Hyslop is a Senior Policy Analyst with Bellwether Education Partners. This blog entry first appeared on Bellwether’s Ahead of the Heard blog as three separate blog entries.
