Machine learning and AI may be deployed on such grand tasks as finding exoplanets and creating photorealistic people, but the same techniques also have some surprising applications in academia: DeepMind has created an AI system that helps scholars understand and recreate fragmentary ancient Greek texts on broken stone tablets.
These clay, stone or metal tablets, inscribed as much as 2,700 years ago, are invaluable primary sources for history, literature and anthropology. They’re covered in letters, naturally, but often the millennia have not been kind and there are not just cracks and chips but entire missing pieces that may comprise many symbols.
Such gaps, or lacunae, are sometimes easy to complete: If I wrote “the sp_der caught the fl_,” anyone could tell you that it’s actually “the spider caught the fly.” But what if it were missing many more letters, and in a dead language, to boot? Not so easy to fill in the gaps.
Doing so is a science (and art) called epigraphy, and it involves both an intuitive understanding of these texts and a knowledge of others to add context; one can make an educated guess at what was once written based on what has survived elsewhere. But it’s painstaking and difficult work, which is why we give it to grad students, the poor things.
Coming to their rescue is a new system created by DeepMind researchers that they call Pythia, after the oracle at Delphi who translated the divine word of Apollo for the benefit of mortals.
The team first created a “nontrivial” pipeline to convert the world’s largest digital collection of ancient Greek inscriptions into text that a machine learning system could understand. From there it was just a matter of creating an algorithm that accurately guesses sequences of letters — just like you did for the spider and the fly.
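To make the guessing game concrete, here is a toy sketch in Python. It is not Pythia’s method: it simply looks up words from a small reference text that fit the gaps and ranks them by how often they occur. The function name, the underscore-for-a-missing-letter convention and the sample corpus are all made up for illustration.

```python
import re
from collections import Counter


def rank_fillings(gapped, corpus, top_k=20):
    """Rank corpus words that fit a gapped word, most frequent first.

    `gapped` uses "_" for each missing letter, e.g. "sp_der".
    This is a hypothetical helper for illustration, not part of Pythia.
    """
    # Count how often each word occurs in the reference corpus.
    counts = Counter(re.findall(r"[a-z]+", corpus.lower()))
    # Build a regex in which "_" matches exactly one letter.
    pattern = re.compile(
        "".join("[a-z]" if ch == "_" else re.escape(ch) for ch in gapped.lower())
    )
    matches = [(word, n) for word, n in counts.items() if pattern.fullmatch(word)]
    # Sort by descending frequency, then alphabetically for a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:top_k]]


# Made-up reference text standing in for "what has survived elsewhere".
corpus = "the spider caught the fly and the fly saw the spider spin; a flu spread"

print(rank_fillings("sp_der", corpus))  # ['spider']
print(rank_fillings("fl_", corpus))     # ['fly', 'flu']
```

Even this crude lookup returns a ranked list rather than a single answer, which is the shape of output that matters later on: a human can scan the candidates and pick the one that fits the context.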
PhD students and Pythia were both given ground-truth texts with artificially excised portions. The students had a character error rate of about 57%, which isn’t bad, as restoration of texts is a long and iterative process. Pythia’s error rate was about 30%.
What’s more, the correct answer was in its top 20 answers 73% of the time. Admittedly that might not sound so impressive, but you try it and see if you can get it in 20.
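The “top 20” figure is what is usually called a top-k hit rate: a gap counts as a success if the true text appears anywhere in the first k guesses. A minimal sketch of that calculation follows, assuming each gap comes with a ranked list of guesses; the function name and the sample data are invented for illustration and are not taken from the paper.

```python
def top_k_accuracy(ranked_guesses, truths, k):
    """Fraction of gaps whose true text appears in the first k guesses."""
    if len(ranked_guesses) != len(truths):
        raise ValueError("need one ranked guess list per ground-truth answer")
    if not truths:
        return 0.0
    # A gap is a hit if its ground truth is among its first k guesses.
    hits = sum(truth in guesses[:k] for guesses, truth in zip(ranked_guesses, truths))
    return hits / len(truths)


# Made-up ranked guesses for four gaps, best guess first.
ranked = [
    ["fly", "flu", "flo"],
    ["spader", "spider"],
    ["cat", "cot"],
    ["ran", "run"],
]
truths = ["fly", "spider", "cut", "run"]

print(top_k_accuracy(ranked, truths, 1))  # 0.25
print(top_k_accuracy(ranked, truths, 2))  # 0.75
```

In this toy data the first guess is right only a quarter of the time, yet the right answer is within the first two guesses three times out of four, which is why a shortlist can be useful even when the top guess is often wrong.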
The truth is the system isn’t good enough to do this work on its own, but it doesn’t need to be. It’s based on the efforts of humans (how else could it be trained on what’s in those gaps?) and it will augment them, not replace them.
Pythia’s suggestions may not be perfectly right on the first try very often, but it could easily help someone struggling with a tricky lacuna by giving them some options to work from. Taking a bit of the cognitive load off these folks may lead to increases in speed and accuracy in taking on remaining unrestored texts.
The paper describing Pythia is available to read here, and some of the software they developed to create it is in this GitHub repository.