[4Xin1-51] Variable Description Prediction Method Based on Nomenclature and Formulas in Chemical Engineering Domain Papers
Keywords: Variable extraction, Information extraction, Mathematical language processing
Physical models are crucial for realizing digital twins in the manufacturing industry. However, building a highly accurate physical model requires considerable manual effort. We aim to develop AutoPMoB, an artificial intelligence system that automatically builds physical models from literature databases. In this study, as a step toward AutoPMoB, we tackled the task of judging whether a given pairing of a variable symbol and a variable definition is correct. We focused on the nomenclature sections and mathematical formulas of chemical engineering papers and proposed a BERT-based judgment method. We fine-tuned 12 BERT models with different input formats and compared their performance. Our best model, which takes variable symbols in XML format and mathematical formulas in Unicode format, achieved an F1-score of 0.834 and an accuracy of 0.827. The results also indicated that including mathematical formulas that contain the target variable symbol improved the accuracy of variable definition extraction. Further improvement can be expected from input formats such as formula tree structures.
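The input construction and evaluation described above can be illustrated with a minimal sketch. The abstract does not give the exact input layout, so everything here is an assumption made for illustration: the `Candidate` fields, the `[SEP]` separator, the field order, and the plain substring test used to select formulas that contain the target symbol. The sketch covers only assembling the text for a symbol-definition pair and computing accuracy and F1-score for binary correct/incorrect judgments, not the fine-tuning itself.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical symbol-definition pair to be judged correct or incorrect."""
    symbol_xml: str      # variable symbol as an XML string (assumed MathML-like)
    symbol_unicode: str  # same symbol in Unicode, used only to match formulas
    definition: str      # candidate definition taken from the nomenclature
    label: int           # 1 if the pair is correct, 0 otherwise


def build_input(cand, formulas, sep=" [SEP] "):
    """Join the symbol, the definition, and the formulas mentioning the symbol.

    The separator and field order are assumptions, and the substring test is
    a naive stand-in for real symbol matching.
    """
    relevant = [f for f in formulas if cand.symbol_unicode in f]
    return sep.join([cand.symbol_xml, cand.definition] + relevant)


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy formulas in Unicode format; they are illustrative, not from the paper.
    formulas = ["m = ρV", "Q = UAΔT"]
    cand = Candidate("<mi>ρ</mi>", "ρ", "density", 1)
    print(build_input(cand, formulas))
    print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a real pipeline, the string returned by `build_input` would be tokenized and passed to a fine-tuned sequence classifier, and the metrics would be computed on a held-out set of labeled pairs.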