3:00 PM - 3:20 PM
[1N4-OS-18-01] Building a Machine Translation Dataset to Support Coptic Language Education and Revitalization Movement
Keywords: Machine Translation, Coptic, Low-Resource Language, Digital Humanities, Language Revitalization
The objective of this study was to create a comprehensive machine translation dataset for the Bohairic dialect of Coptic, supporting both liturgical use within the Coptic Orthodox Church and the broader language revitalization movement. By digitizing a wide range of Bohairic texts, we assembled a dataset of 27,900 Bohairic Coptic-English translation pairs containing over 400,000 Bohairic Coptic tokens. The dataset was designed specifically to train models based on the OPUS-MT framework, which are integrated into the Coptic Translator platform to provide accurate and accessible translations. This project not only demonstrates the application of digital humanities to linguistic preservation but also provides a valuable resource for computational linguistics, contributing to ongoing efforts to revitalize and maintain the Bohairic dialect of Coptic.