[AP2-E2-4-02] OuBioBERT: An Enhanced Pre-Trained Language Model for Biomedical Text With/Without Whole Word Masking
Deep Learning, Natural Language Processing, Data Mining
With the development of contextual embeddings introduced by transformer-based language models such as Bidirectional Encoder Representations from Transformers (BERT), the performance of information extraction from free text has improved significantly. Some time after releasing the original pre-trained BERT models, their authors also released models pre-trained with whole word masking (WWM), which may perform better. Meanwhile, many studies, such as BioBERT and clinicalBERT, have shown that pre-training BERT on a large biomedical text corpus yields strong performance in biomedical natural language processing (BioNLP), but there are not yet any WWM models for BioNLP. With this in mind, we pre-trained a biomedical WWM model and evaluated its performance on the Biomedical Language Understanding Evaluation (BLUE) benchmark.
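To illustrate the difference from standard subword masking, the following is a minimal sketch of whole word masking over WordPiece tokens, where continuation pieces carry the "##" prefix. The example tokens and the helper function are illustrative only and are not taken from the study.

```python
# Minimal sketch of whole word masking (WWM): all WordPiece pieces of a
# selected word are masked together, rather than masking pieces independently.
import random

def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Return a copy of `tokens` with whole words masked at rate `mask_prob`."""
    rng = random.Random(seed)

    # Group subword indices into whole words: a piece starting with "##"
    # continues the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])

    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:  # WWM: every piece of the chosen word is masked
                masked[i] = mask_token
    return masked

tokens = ["the", "patient", "has", "hyper", "##tension"]
print(whole_word_mask(tokens, mask_prob=0.5))
# With WWM, "hyper" and "##tension" are always masked (or kept) together.
```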
We have already released an enhanced biomedical BERT model, ouBioBERT; our new model was initialized from it and pre-trained on the same corpus using our method with WWM. We then evaluated both models on the BLUE benchmark, which consists of five BioNLP tasks across ten datasets.
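As a rough illustration of this setup, the sketch below continues masked-language-model pre-training from an existing checkpoint with whole word masking using Hugging Face Transformers. The checkpoint path, corpus file, and training hyperparameters are placeholders; the study's actual pre-training method, corpus, and schedule are not reproduced here.

```python
# Hedged sketch: continue MLM pre-training from an existing BERT checkpoint
# with whole word masking. Paths and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForWholeWordMask,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("path/to/oubiobert")
model = BertForMaskedLM.from_pretrained("path/to/oubiobert")  # init from ouBioBERT

# Placeholder corpus file; the study uses its own biomedical pre-training corpus.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# DataCollatorForWholeWordMask masks every WordPiece of a chosen word together.
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="oubiobert-wwm", per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```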
The total BLUE score of ouBioBERT with WWM was 0.1 points higher than that of the original ouBioBERT, although the difference was not statistically significant (p=0.47). This result suggests that WWM may also be effective in the biomedical domain.