
Presentation information

Organized Session

[1N4-OS-18] OS-18

Tue. May 28, 2024 3:00 PM - 4:40 PM Room N (Room 54)

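Organizers: Ikki Ohmukai (The University of Tokyo), Tetsuro Kamura (Tokyo University of the Arts), Akihiro Kameda (National Museum of Japanese History), Satoru Nakamura (The University of Tokyo)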

3:00 PM - 3:20 PM

[1N4-OS-18-01] Building a Machine Translation Dataset to Support Coptic Language Education and Revitalization Movement

〇So Miyagawa (National Institute for Japanese Language and Linguistics)

Keywords: Machine Translation, Coptic, Low-Resource Language, Digital Humanities, Language Revitalization

This study set out to build a comprehensive machine translation dataset for the Bohairic dialect of Coptic, in support of both liturgical use within the Coptic Orthodox Church and the broader language revitalization movement. By digitizing a wide range of Bohairic texts, we assembled a dataset of more than 400,000 Bohairic Coptic tokens distributed across 27,900 Bohairic Coptic-English translation pairs. The dataset was designed for training models based on the OPUS-MT framework; these models are integrated into the Coptic Translator platform to provide accurate and accessible translations. The project demonstrates an application of digital humanities to language preservation and provides a resource for computational linguistics, contributing to ongoing efforts to revitalize and maintain the Bohairic dialect of Coptic.
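The reported figures (27,900 pairs and over 400,000 Bohairic tokens, or roughly 14 source tokens per pair on average) are the kind of summary one can compute from any parallel corpus. The following is a minimal sketch of such a count, not the author's pipeline: the function name `corpus_stats`, the in-memory list of pairs, and whitespace tokenization are all assumptions made for illustration, and the sample strings are placeholders rather than real Coptic data.

```python
from typing import Dict, Iterable, Tuple


def corpus_stats(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Count translation pairs and whitespace-delimited source-side tokens.

    Hypothetical helper: the abstract does not say how tokens were counted,
    so plain whitespace splitting is an assumption.
    """
    n_pairs = 0
    n_src_tokens = 0
    for src, _tgt in pairs:
        n_pairs += 1
        n_src_tokens += len(src.split())
    # Guard against an empty corpus to avoid division by zero.
    avg = n_src_tokens / n_pairs if n_pairs else 0.0
    return {
        "pairs": n_pairs,
        "source_tokens": n_src_tokens,
        "avg_source_tokens_per_pair": avg,
    }


# Placeholder (source, target) pairs standing in for Coptic-English data.
sample = [
    ("w1 w2 w3", "one two three"),
    ("w4 w5", "four five"),
]
stats = corpus_stats(sample)
print(stats)
```

Applied to the full dataset, the same count would be expected to give an average near 400,000 / 27,900, if the published totals were obtained with a comparable tokenization.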

