‘Greek and Latin corpora’

Convenors: Olivier Delouis, Christophe Gaillac, Tristan Alonge

The study of the Greek and Latin languages is increasingly concerned with corpus analysis. Collections of texts have grown dramatically over the last twenty years. Almost all ancient and medieval Greek literature, from Homer to the Fall of Constantinople in 1453, is now digitized in the “Thesaurus Linguae Graecae” or TLG (by the University of California, founded 1972), while on the Latin side the “Brepolis’ Library of Latin Texts” offers an immense array of texts from the beginnings of Latin literature to the present day (by Brepols Publishers, founded 1991). Inventories of these classical corpora, including growing open-access collections, are compiled regularly and support studies that generally remain limited to individual cases (see for instance Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution, ed. Monica Berti, 2019).

Many methods and tools are now available for the analysis of modern languages. Yet Natural Language Processing (NLP), the branch of artificial intelligence that helps computers to understand human languages, remains underdeveloped for classical languages. Many techniques used in modern corpus analysis, such as deep learning-based approaches, convolutional and recurrent neural networks, contextual language models or, more recently, bidirectional encoder representations from transformers (BERT), are still far from being widely used in the classical humanities.

In this seminar, we aim to present the work of scholars who engage in cross-disciplinary collaboration between the study of classical literature and NLP.

Speakers:

Thibault Clérice (École nationale des chartes, PSL, Paris) – “Detecting sexual isotopies in Latin corpora: setting up an experiment and first results”
Marianne Reboul (IHRIM UMR 5318 & ENS Lyon) – “Homer and Machine Learning: translations alignment on Iliad and Odyssey”
Thea Sommerschield (Marie Curie Fellow, Ca’ Foscari University of Venice) – “Greek epigraphic data for Machine Learning applications: the Ithaca project”