James Popham on Using Test Scores to Evaluate Teachers

In this timely American School Board Journal article, UCLA testing guru James Popham addresses the much-discussed idea of using student test scores to evaluate individual teachers. “The quality of students’ learning should, in my view, be a key determinant of judgments of a teacher’s skill,” he says. “Indeed, it should be far and away the most significant criterion for appraising teachers.” However, he continues, “absolutely no evidence exists that the tests to be used in such evaluations are capable of differentiating between effectively and ineffectively taught students… If we allow the wrong tests to be used when judging our teachers, we are certain to make many mistakes about which teachers are doing well and which teachers aren’t. The most significant consequence of those mistakes is that – over time – our students will surely be less well taught.” 

What we don’t know, says Popham, is whether the tests in question are “instructionally sensitive”: that is, whether their items accurately reflect what teachers are teaching. “Here’s the astonishing reality,” says Popham. “Tests now being touted as suitable for judging teacher quality are accompanied by not one lick of evidence that they are instructionally sensitive. Should we evaluate teachers on the basis of tests whose suitability for this evaluative mission has not been verified? The answer is obvious.”

What should we watch out for in tests? Popham says there are at least six ways an individual test item can be instructionally insensitive:

  • Poor alignment – If an item doesn’t accurately measure students’ mastery of the specified curriculum objective, then no matter how well the teacher teaches that objective, the item won’t be a fair measure of the teacher’s effectiveness.
  • Too easy – If even badly taught students can correctly answer an item, then it won’t accurately measure the difference between effective and ineffective teachers.
  • Too difficult – Conversely, if a test item is so tricky that even well-taught students get it wrong, the item won’t discriminate between effective and ineffective teachers.
  • Confusing – If items have mangled syntax or ambiguous answer choices, even well-taught students will do poorly.
  • SES (socioeconomic status) issues – Test items that give an unfair advantage to children from higher-income families measure those advantages, not teachers’ effectiveness.
  • Aptitude issues – Items that measure aptitude rather than what students learn in school are also poor measures of teacher effectiveness.

Popham says that if a test has even one of these flaws, it is instructionally insensitive and therefore poorly suited to evaluating teachers.

So are current state tests up to snuff? Popham says “we simply have no evidence, one way or the other, confirming the ability of today’s tests to accurately measure teachers’ instructional quality. Such evidence is desperately needed.” 

What would it take to identify instructionally insensitive test items and fix them? First, individual items would have to be checked by assessment experts for alignment and validity. Second, experts would have to see if item-by-item student results lined up with teachers’ previous track records in bringing about (or not bringing about) higher student achievement over time. If effectively taught students answered an item correctly and ineffectively taught students answered it incorrectly, it would indicate that the item was instructionally sensitive. 
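For readers who handle their district’s item-level data, the empirical check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Popham’s procedure: the article does not specify a statistic, so the sketch uses a simple difference in proportion correct between students of teachers with strong and weak track records. It assumes that classification has already been made, and the function names (`screen_items`, `proportion_correct`) and cutoffs (`min_gap`, `easy`, `hard`) are hypothetical, chosen only for illustration. It also flags the “too easy” and “too difficult” patterns from the list above.

```python
from dataclasses import dataclass


@dataclass
class ItemResult:
    item_id: str
    p_effective: float    # share correct among students of teachers with strong track records
    p_ineffective: float  # share correct among students of teachers with weak track records
    gap: float            # p_effective - p_ineffective
    flag: str


def proportion_correct(responses: list[int]) -> float:
    """Share of 1s (correct answers) in a list of 0/1 item scores."""
    if not responses:
        raise ValueError("each item needs at least one response")
    return sum(responses) / len(responses)


def screen_items(effective: dict[str, list[int]],
                 ineffective: dict[str, list[int]],
                 min_gap: float = 0.15,   # hypothetical cutoff, not from the article
                 easy: float = 0.90,      # hypothetical cutoff for "too easy"
                 hard: float = 0.20) -> list[ItemResult]:  # hypothetical cutoff for "too difficult"
    """Compare each item's results for the two groups of students.

    Assumes both dictionaries contain the same item IDs.
    """
    results = []
    for item_id in sorted(effective):
        p_eff = proportion_correct(effective[item_id])
        p_ineff = proportion_correct(ineffective[item_id])
        gap = p_eff - p_ineff
        if p_eff >= easy and p_ineff >= easy:
            flag = "too easy"               # nearly everyone gets it right
        elif p_eff <= hard and p_ineff <= hard:
            flag = "too difficult"          # nearly everyone gets it wrong
        elif gap < min_gap:
            flag = "possibly insensitive"   # well-taught students do no better
        else:
            flag = "sensitive"              # well-taught students clearly do better
        results.append(ItemResult(item_id, p_eff, p_ineff, gap, flag))
    return results


# Made-up 0/1 scores for five students per group, for illustration only.
effective = {"Q1": [1, 1, 1, 1, 0], "Q2": [1, 1, 1, 1, 1],
             "Q3": [0, 0, 1, 0, 0], "Q4": [1, 1, 0, 1, 0]}
ineffective = {"Q1": [0, 1, 0, 0, 0], "Q2": [1, 1, 1, 1, 1],
               "Q3": [0, 0, 0, 1, 0], "Q4": [1, 0, 1, 1, 0]}

for r in screen_items(effective, ineffective):
    print(r.item_id, round(r.gap, 2), r.flag)
```

With these invented scores, Q1 separates the two groups clearly, Q2 is answered correctly by everyone, Q3 by almost no one, and Q4 shows no difference between the groups. In practice the group sizes would be far larger, and any cutoffs would need to be set and justified by assessment experts.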

Wouldn’t this process be enormously time-consuming and expensive? Not so, says Popham: “Because the data needed for such analyses are already available in most states’ test-data repositories, and because the identity of teachers does not need to be revealed, this empirical work can be carried out both inexpensively and unobtrusively.”

So where does this leave us? “The mismeasurement of our teachers constitutes an enormous social blunder – chiefly because of the adverse impact this mistake will have on the students our schools serve,” concludes Popham. “Nonetheless, test-based teacher evaluation is currently careening toward us with precious few impediments in its way. So, if teachers are going to be judged on the basis of their students’ test scores, let’s make certain that the tests being used are appropriate.” 

“(Mis)Measuring Teachers” by James Popham in American School Board Journal, September 2010 (Vol. 197, #9, pp. 36-38), no e-link available

From the Marshall Memo #348

© 2026 Created by William Brennan and Michael Keany