Could a computer program grade a student essay just as a teacher would? An Australian start-up has joined forces with a slew of independent programmers and education companies to answer the question.
The start-up, called Kaggle, uses crowd-sourcing to solve complex science problems. It has built a network of programmers and scientists who compete to answer puzzles that involve analysing huge amounts of data.
For its latest project, Kaggle collected thousands of student essays from old standardised exams. It has given those papers to some of the largest US testing companies and to the independent programmers in its network, who are now competing to design an algorithm that can read the essays and award them the same grade that the original human readers did. The competition closes to the public later this month. The prize for the independent programmer who gets the best results is $60,000.
"These engines can score grammar, spelling, syntax, and they can score some elements of style," says Jaison Morgan, a consultant who helped design the competition.
How well a computer can judge writing is of serious interest to education companies that develop and score standardised tests. Demand for exams is growing globally as measuring and quantifying performance becomes more important. In the US, for example, federal policies require that schools demonstrate their students' progress to maintain some of their funding.
The US's testing and educational consulting industry had $15.4bn in revenue last year, with annual growth of more than 5 per cent expected over the next five years, according to IBISWorld, a market research company.
Grading tests, particularly written responses, requires labour that publicly funded school systems have to pay for out of tightening budgets. That pinch creates a potential opening for a cheaper but reliable way to grade essay exams. "We think that machine scoring may be an alternative," says Mr Morgan.
Pearson, owner of the Financial Times and one of the education industry's largest players, is participating in Kaggle's project.
Companies such as US-based ETS, which develops and marks exams, see other opportunities to sell the technology directly to schools for use in the classroom. One program ETS developed gives students preliminary feedback on flaws in their writing, helping teachers figure out faster where students need help.
Some law firms, faced with reviewing millions of documents in advance of big trials, use software that skims through the files and picks out those most relevant. Siri, a voice-controlled application for Apple's latest iPhone, relies on similar technological principles to understand what people say to it.
Automated scoring also supplements human graders on the GRE exam, part of most US graduate school applications, and the TOEFL exam for English competency, both developed by ETS.
While the algorithms have some gaps, such as an inability to tell whether the content of an essay is factually correct, in some respects the computers are more consistent than human graders. People can get tired and miss details, says David Williamson, a research scientist at ETS, and have a bias towards giving higher scores to longer essays.
Barbara Chow, director of education programmes at the Hewlett Foundation, a non-profit organisation that is sponsoring Kaggle's competition, says the use of the technology in education is still embryonic. "Part of it is the question of whether people can trust these essay-scoring machines, and that's exactly . . . the purpose of our competition in the first place, to see whether or not teachers and states can trust these algorithms," she says.
Although the competition is still open, a French actuary and Polish programmer behind one of the leading independent algorithms agreed to see what their software had to say about this article.
The outcome: decent. Matched against a pool of high-school students' essays about computers, this article scored 11 out of 12 possible points, placing it third among the 1,000 essays in the data sample.