Imagine this. You take your 16-year-old son to the DMV on his birthday, so he can get his driver's license. He takes the 20-question multiple-guess test, getting only two wrong. Congratulations, says the clerk to your son. You're proficient. Here's your license.
Wait a minute, you say. Doesn't he have to do a road test, twenty minutes in a car with a skilled evaluator subjecting him to a variety of actual driving challenges? No, says the clerk.
That test was too expensive, and too subjective. We had to train and pay human evaluators, and some were tougher on new drivers than others. Plus, it wasn't valid. Teenaged drivers who passed the road test continued to get in accidents. With the standardized test, we know exactly how your son compares to all other 16-year-olds on the same 20 items of driver knowledge. We can measure the effectiveness of driver training programs, too. Now that we have Interstate Core Driving Knowledge standards, we can even compare teens in Michigan to those in Ohio. Our teens do 4% better than the Buckeye kids on the test!
I cooked up this scenario while brooding about this reflection by Dana Goldstein on the push to evaluate all teachers using standardized, quantifiable measures of student learning:
Secretary of Education Arne Duncan acknowledged that the administration's success, via Race to the Top, in getting states to agree to evaluate teachers based on student achievement data has outpaced the ability of states to create the student assessments that make such teacher evaluation possible.
Here's the problem: Currently, fewer than half of all public school teachers teach a tested subject in a tested grade. As states embrace value-added teacher evaluation, however, schools will need to collect data on the student achievement outcomes of all teachers. That means either issuing pencil and paper tests to students in every grade and subject area, or devising more complex (and potentially expensive to administer) assessments, such as portfolio systems that correspond to some kind of numerical scale.
The challenge is that although most people agree that paper tests in kindergarten gym class are absurd, many districts will be sorely tempted to take the easy way out—testing—when told they must now collect "data" on every single teacher. Tests are cheap to administer and score and have the benefit of being "objective;" unless there is outright cheating, two different evaluators will grade a test much the same way. In addition, there's the thorny question of whether it's fair to evaluate and pay a math teacher according to "objective" test-score data, while relying on a highly subjective portfolio system to evaluate and pay an art teacher.
My response to Goldstein:
Tests aren't cheap. And they're not objective, either. Many standardized tests aren't even aligned with the knowledge and skills in the curriculum (including subjects that have traditionally been tested). Further, tests would be the worst possible way to evaluate teachers in many ...
As a 30-year music teacher, I shudder to think of how good curriculum and instruction in the arts and physical education would be twisted to accommodate pointless testing that reveals nothing of value.
I believe in rigorous evaluation of teaching and learning. There are critical, essential skills and knowledge in the arts and all other untested disciplines and levels. In early elementary music, for example, the most important musical concepts and skills can easily be observed: pulse and rhythm, movement, tone and pitch, listening and repetition, creative use of musical elements, transmission of culture through lyrics and dance, etc.
Any music teacher worth her salt should be able to identify these elements as learning goals, measure students' growth through observation and record-keeping, and demonstrate what students have learned to a trained evaluator. If that evaluator were a disciplinary peer, such evaluations wouldn't have to cost a nickel.
Tests are not the "easy way out," and somebody will be making money creating them.
The purpose of any test must be to measure what's important.
Who gets to decide what's worth testing? And with whom would you feel most comfortable sharing the road?