In December, I wrote a post questioning Shanghai's No. 1 ranking in the most recent Program for International Student Assessment, known as PISA. Shanghai placed first internationally in math, reading and science on the 2012 administration of PISA, while 15-year-olds in the United States performed, as usual, no better than average among the 65 participating countries and education systems.
When the 2012 scores were released late last year, the Organization for Economic Cooperation and Development, which sponsors PISA, said that the schools used in the Shanghai sample represented the city’s 15-year-old population. Tom Loveless of the Brookings Institution and some China experts countered that migrant children are still routinely excluded from schools in Shanghai, which is wealthier than the rest of China, but the OECD stood by the results. Earlier this month, however, Andreas Schleicher, OECD deputy director of Education and Skills, told the British Commons Education Select Committee that PISA represented 73 percent of Shanghai’s 15-year-olds, down from the 79 percent he had cited in December, according to TES Connect, a popular British education Web site. The U.S. sample, by contrast, covered 89 percent of 15-year-old students.
In this post, Leslie Rutkowski and David Rutkowski, both faculty members at the Indiana University School of Education, look at just how overblown the Shanghai results might have been.
By Leslie Rutkowski and David Rutkowski
Sensational headlines grab attention. This is certainly true with news about education. And although policy is typically born out of analyses that run deeper than headlines, sometimes the really jaw-dropping stories gain traction and give rise to meaningful decisions. This was the case in 2000 when Germany was shocked by its lukewarm PISA results, setting in motion national education reforms that continue to reverberate through the country today. But sometimes sensational results only serve as a distraction, shocking and awing readers into supporting misguided or unnecessary reforms based on overblown results.
Such, we argue, is the case with the most recent Shanghai PISA results, which indicated that China’s economic juggernaut might also dominate international education, unseating Finland as the darling of the international rankings in math, science and reading and vastly outpacing all other countries in the international academic horse race. But based on a recent admission by Andreas Schleicher, the deputy director of Education and Skills at the Organization for Economic Co-operation and Development (OECD) and founding father of PISA, Shanghai’s stellar results are probably overblown. How overblown is impossible to say, but we can conjecture with an example of how Shanghai’s results might be skewed.
Recent discussions have brought to light that a large portion of children in Shanghai were not included in the population of all possible 15-year-olds. Although the figures Mr. Schleicher has quoted vary, somewhere between 21 and 27 percent of Shanghai children were not considered for participation in the most recent PISA cycle. We’ll split the difference and call the figure a quarter. And while we can never know exactly who those children are or how they would have performed had they been included in the PISA test, let’s consider an illustrative example of the influence that 25 percent can have.
Consider the table below, which presents the average 2012 PISA scores for Shanghai and Massachusetts in math, science and reading (standard errors for each score are in parentheses). We chose Massachusetts because it is usually among the very best U.S. states in educational performance. And as Shanghai is widely regarded as the best educational system in China, this makes the two systems at least somewhat comparable for this example. The differences between Shanghai and America’s education darling are staggering, particularly in math, where Shanghai’s average score is nearly 100 points higher than Massachusetts’s. On the PISA scale, 100 points is vast: it represents half the distance between the next-highest math performer (Singapore at 573) and the very lowest (Peru at 368). And it puts Shanghai a figurative ocean apart from Massachusetts in math performance. But should Massachusetts educators and policy makers hang their heads in shame and book immediate passage to Shanghai to see what all the fuss is about? Maybe not. At least not yet.
Sample                   | Math    | Reading | Science
Shanghai-China           | 613 (3) | 570 (3) | 580 (3)
Massachusetts            | 514 (6) | 527 (6) | 527 (6)
Tom Loveless, a senior fellow at the Brookings Institution, argues that the excluded Shanghai children are likely from poor, migrant backgrounds with low-quality or no educational opportunities. With this in mind, we reasoned that a more comparable example might exclude low-performing schools in Massachusetts. Consider the next example, which also presents the math, science and reading scores for Shanghai and Massachusetts. But this time, we’ve separated Massachusetts schools into two groups: the top 75 percent and the bottom 25 percent. Although Shanghai continues to markedly outperform the relatively higher-performing Massachusetts 15-year-olds, the differences are less shocking. In fact, when we consider the all-important standard error associated with these scores, we can say with confidence that Shanghai’s actual math score could be as low as 606, while the highest plausible score for the top three-fourths of Massachusetts schools is 546, cutting the original 100-point gap roughly in half. The same story emerges in adjusted comparisons on reading and science, where Shanghai’s advantages are plausibly as small as 14 and 6 points, respectively. Hardly the staggering sort of gaps that should cause widespread panic over the state of education in Massachusetts.
Sample                   | Math    | Reading | Science
Shanghai-China           | 613 (3) | 570 (3) | 580 (3)
MA Top 75% of Schools    | 532 (7) | 545 (7) | 547 (7)
MA Bottom 25% of Schools | 453 (8) | 467 (8) | 461 (8)
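The standard-error reasoning above can be sketched in a few lines of code. As a rough illustration only: the `diff_ci` helper and the 1.96 multiplier (a conventional 95 percent interval, with standard errors combined in quadrature) are our assumptions, not the OECD’s published method, so the bounds it produces will not exactly match the article’s rounded figures.

```python
import math

def diff_ci(mean_a, se_a, mean_b, se_b, z=1.96):
    """Approximate 95% confidence interval for the difference of two
    independent means, combining their standard errors in quadrature."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff - z * se_diff, diff + z * se_diff

# Math: Shanghai 613 (SE 3) vs. MA top 75% of schools 532 (SE 7)
low, high = diff_ci(613, 3, 532, 7)
print(round(low), round(high))  # roughly a 66-to-96-point gap
```

The point of the exercise is not the exact bounds but that a headline gap of "about 100 points" carries real uncertainty once sampling error is taken into account.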
We recognize that Shanghai’s schools still outperform our best school system on PISA, even after filtering out the lowest-performing schools in Massachusetts. And perhaps Shanghai’s 15-year-olds really are better PISA test-takers. We can never know what the actual differences would be between a truly representative sample of Shanghai 15-year-olds and their Massachusetts peers. But it’s pretty clear, at least to us, that PISA results are sensitive to who is and who isn’t included in the sample. And it’s critically important that these sorts of issues are discussed openly and honestly, and conveyed to policy makers in a digestible way.
Of course, this isn’t an easy task. The complex machinery that underlies each of these test scores is not for the faint of heart. An army of experts in statistics, sampling, content and policy performs highly sophisticated work to administer, analyze and report PISA results. And each component of that process is a monumental task. But the consequences hitched to PISA results also have the potential to enact monumental reform, which is likely not warranted on the basis of a single statistic. And simply reporting results, in daring headline fashion, without caution or caveat, is a dangerous practice. Although cautious reporting isn’t nearly as sensational as crying “Sputnik!” every time a new cycle of PISA results is released, it is the responsible thing to do.
We know there is error associated with international educational assessment. In many cases we can and do account for this error, allowing the results to serve as an important yardstick for policy makers. When used carefully, PISA and other international assessment results provide an important informational resource for researchers; however, when a glaring structural issue, like the omission of 25 percent of the population, is ignored or downplayed, international assessment critics have every right to be skeptical.
As Mr. Schleicher notes, when China does well, we suspect cheating or that we don’t know the whole truth. And when suspicions of less-than-forthcoming behavior are validated, the United States feels justified in ignoring international test results, preferring the “U.S. is doing fine” narrative. But is it? We believe that some parts of the U.S. education system are humming along just fine and that other parts are in dire need of help. We know this from our own national assessments. And, when comparisons are transparent, we know it from international tests like PISA. Unfortunately, when opaque technical problems take months to come to light, such as the reported sampling issues in Shanghai, Americans have a justifiable right to ignore a comparison with one of the most affluent cities in China.