6:00 PM - 6:20 PM
[1P5-OS-16b-04] A Large-scale Labeled English Text Datasets for Machine Learning: Case of Issue-based Information System
Keywords: annotation, text dataset, machine learning, deep learning, natural language processing
Textual data is one of the fastest-growing data types on the internet. This growth has driven significant advances in Natural Language Processing (NLP) in recent years, largely through Deep Learning (DL) and Machine Learning (ML) techniques. These methods require large amounts of labeled text data, structured in a specific format according to a dialogue-mapping scheme, for model training. For instance, the node and link extractor models in D-Agree were trained on text data annotated in Issue-based Information System (IBIS) notation. However, training such models in English has been difficult because preparing labeled English IBIS datasets is laborious. In this study, we present a process for annotating and releasing large quantities of IBIS-based training data for machine learning, giving researchers a free environment in which to train their opinion extractor models in English.
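To make the idea of IBIS-labeled training data concrete, the following is a minimal sketch of how one annotated discussion might be represented and checked. It assumes the classic IBIS element types (issue, position, argument). The link labels, the `Node` and `Link` classes, and the `validate` helper are hypothetical names chosen for illustration, not the schema of the released dataset or of D-Agree.

```python
from dataclasses import dataclass

# Classic IBIS element types; the released dataset's label set may differ.
NODE_TYPES = {"issue", "position", "argument"}
# Hypothetical link labels, chosen for illustration only.
LINK_TYPES = {"responds_to", "supports", "objects_to"}


@dataclass(frozen=True)
class Node:
    node_id: int
    node_type: str  # label a node extractor would predict
    text: str       # the annotated text span


@dataclass(frozen=True)
class Link:
    source: int     # node_id of the replying node
    target: int     # node_id of the node being addressed
    link_type: str  # label a link extractor would predict


def validate(nodes, links):
    """Return a list of problems found in one annotated discussion."""
    problems = []
    ids = {n.node_id for n in nodes}
    for n in nodes:
        if n.node_type not in NODE_TYPES:
            problems.append(f"node {n.node_id}: unknown type {n.node_type!r}")
    for link in links:
        if link.link_type not in LINK_TYPES:
            problems.append(f"link {link.source}->{link.target}: unknown type {link.link_type!r}")
        if link.source not in ids or link.target not in ids:
            problems.append(f"link {link.source}->{link.target}: missing endpoint")
    return problems


# A made-up three-node discussion used only to exercise the sketch.
nodes = [
    Node(1, "issue", "How should the city reduce traffic congestion?"),
    Node(2, "position", "Introduce a congestion charge downtown."),
    Node(3, "argument", "Charges reduced traffic in other cities."),
]
links = [
    Link(2, 1, "responds_to"),
    Link(3, 2, "supports"),
]
```

Calling `validate(nodes, links)` on this example returns an empty list. A link pointing at a node that does not exist, or a label outside the assumed sets, would be reported as a problem string.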