Man vs. Computer: Who Wins the Essay-Scoring Challenge?

Would you rather have an actual person score your carefully crafted essay, or an automated software program designed for that purpose?

I'd still take the flawed human being any day, assuming, of course, the proper expertise and a good night's sleep. But a new study suggests there is little, if any, difference in reliability or accuracy between human scoring and the computer approach.

And this may be good news for those who believe essays are an essential component of state testing systems, since the cost savings may well encourage more states to embrace such test items to balance out multiple-choice questions.

"The demonstration showed conclusively that automated essay-scoring systems are fast, accurate, and cost-effective," said Tom Vander Ark, the chief executive officer of Open Education Solutions, and a co-director of the study, in a press release. (Vander Ark is also a former top education official at the Bill & Melinda Gates Foundation.)

The study is described in the news release as the "first comprehensive, multivendor trial to test" claims by companies that provide automated essay-scoring software. Nine companies took part, pitting their scoring systems against one another. Six participating states supplied more than 16,000 essays, with each set varying in length, type, and grading protocols. The essays had already been hand-scored, and the challenge was for the companies to approximate those established scores with their software.
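The article does not say exactly how "approximating established scores" was measured. As a minimal sketch, the snippet below assumes an ordinal agreement statistic such as quadratic weighted kappa, a metric commonly used to compare machine-assigned essay scores with human ratings; the function name and the toy score vectors are illustrative and are not drawn from the study itself.

    import numpy as np

    def quadratic_weighted_kappa(human, machine, min_score, max_score):
        """Agreement between two integer score vectors on the same scale."""
        n = max_score - min_score + 1
        human = np.asarray(human) - min_score
        machine = np.asarray(machine) - min_score

        # Observed rating matrix: how often each (human, machine) score pair occurs.
        observed = np.zeros((n, n))
        for h, m in zip(human, machine):
            observed[h, m] += 1

        # Expected matrix under chance agreement (outer product of the marginals),
        # rescaled so it sums to the same total as the observed matrix.
        expected = np.outer(np.bincount(human, minlength=n),
                            np.bincount(machine, minlength=n)).astype(float)
        expected *= observed.sum() / expected.sum()

        # Quadratic penalty: disagreements count more the farther apart the scores are.
        idx = np.arange(n)
        weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

        return 1.0 - (weights * observed).sum() / (weights * expected).sum()

    # Toy check: identical score vectors give 1.0 (perfect agreement).
    print(quadratic_weighted_kappa([2, 3, 4, 4], [2, 3, 4, 4], 1, 4))

On this kind of scale, a value near 1 means the software essentially reproduces the human raters' judgments, while values near 0 indicate agreement no better than chance.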

The study was funded by the William and Flora Hewlett Foundation, which also provides financial support for Education Week coverage.

It grew out of a contest Hewlett is sponsoring called the Automated Student Assessment Prize, to evaluate the current state of automated testing and to encourage further developments in the field.

The study comes as two state testing consortia are working to develop new assessment systems pegged to the Common Core State Standards in reading and mathematics. In fact, the two consortia are supporting the Hewlett effort, and three PARCC states and three SMARTER Balanced states supplied student essays for the current study.

"The results demonstrated that overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre," says the study, co-authored by Mark Shermis, the dean of the University of Akron's college of education, and Ben Hammer of Kaggle, a private firm that provides a platform for predictive modeling and analytics competitions.

Barbara Chow, the education program director at Hewlett, said in the press release that she believes the results will encourage states to include a greater dose of writing in their state assessments.

And she believes this is good for education.

"The more we can use essays to assess what students have learned," she said, "the greater likelihood they'll master important academic content, critical thinking, and effective communication."
