9:00 AM - 10:30 AM
[MIS22-P04] Opendata produced by "Minna de Honkoku"
Keywords: Minna de Honkoku, Citizen science, Open data
"Minna de Honkoku" (https://honkoku.org/) is a crowdsourced, online collaborative project to transcribe historical materials written in old Japanese. It was launched as an online citizen science project to transcribe earthquake-related historical materials from the Earthquake Research Institute Library, the University of Tokyo. In July 2019, the system was upgraded to support IIIF (International Image Interoperability Framework), so a broader range of manuscripts from digital archives that adopt IIIF can now be registered for transcription. The scope of the project was extended beyond earthquake-related materials to cover a wide variety of historical materials. AI-assisted transcription was also implemented. More than 3,500 documents are registered in the system, and the total number of characters transcribed is about 28 million.
The transcribed text data is shared under a Creative Commons license (CC BY-SA). The data is used, for example, to edit bibliographic information at libraries, museums, and similar institutions. The text is also used to publish e-books that translate classical literature. An experiment on OCR text conversion of digitized materials at the National Diet Library, Japan (NDL) Lab utilized the transcribed text of "Minna de Honkoku." The OCR training dataset is also published by NDL Lab.