[3Rin4-08] Classification of Issue discussions in Open Source Software Projects using BERT or Automated ML
Keywords: BERT, Automated ML, OSS, Natural Language Processing
Abstract: (1) Purpose: Discovering and retrieving relevant information from lengthy documents, such as product defect reports, call-center chat histories, and meeting minutes, is a challenging task. It is therefore important to construct a technique that identifies the information type of each sentence in a document. We investigated which types of feature engineering are effective for this task, and whether the BERT model is effective. As the corpus, we used issue discussions from open source software projects such as TensorFlow and scikit-learn. (2) Results: We trained models with AutoML and calculated global feature importance using SHAP; the length of sentences, their position in the document, and the time between comments were found to be important features. Limited fine-tuning of BERT, in which only the parameters of the final layer are trained, showed no significant difference in performance from ordinary logistic regression.
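To make the reported features concrete, the following is a minimal sketch of how sentence length, position in the document, and time between comments might be computed for one issue thread, together with the mean absolute SHAP value that is commonly used as a global importance score. The abstract does not give the exact feature definitions, so everything here is an assumption: the `Comment` record, the naive regex sentence splitter, whitespace token counts as "length", relative position in [0, 1], and seconds since the previous comment as the time gap. The SHAP matrix is assumed to come from elsewhere (e.g. an explainer run on a trained model); this is not the authors' pipeline.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Sequence

# Naive sentence splitter (assumption): split after '.', '!' or '?' followed by whitespace.
_SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")


@dataclass
class Comment:
    """Hypothetical record for one comment in an issue discussion."""
    created_at: datetime
    body: str


def sentence_features(comments: Sequence[Comment]) -> List[Dict[str, float]]:
    """Illustrative per-sentence features for one issue thread (comments in time order)."""
    rows: List[Dict[str, float]] = []
    prev_time = None
    for comment in comments:
        # Time since the previous comment; 0 for the opening comment.
        gap = 0.0 if prev_time is None else (comment.created_at - prev_time).total_seconds()
        prev_time = comment.created_at
        for sent in _SENT_SPLIT.split(comment.body.strip()):
            if sent:
                rows.append({
                    "length": float(len(sent.split())),  # whitespace token count
                    "gap_seconds": gap,                  # shared by all sentences of a comment
                })
    n = len(rows)
    for i, row in enumerate(rows):
        # Relative position of the sentence in the whole thread: 0 = first, 1 = last.
        row["position"] = i / (n - 1) if n > 1 else 0.0
    return rows


def global_importance(shap_values: Sequence[Sequence[float]],
                      names: Sequence[str]) -> Dict[str, float]:
    """Mean absolute SHAP value per feature over all samples (rows)."""
    n = len(shap_values)
    return {name: sum(abs(row[j]) for row in shap_values) / n
            for j, name in enumerate(names)}
```

For example, a two-comment thread whose second comment arrives 90 minutes after the first yields three sentence rows; the last one gets `gap_seconds == 5400.0` and `position == 1.0`. Averaging absolute SHAP values, rather than signed ones, keeps positive and negative contributions from cancelling out when ranking features.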