[3Xin4-65] Verifying the Effectiveness of Sentence Embedding Learning in Japanese based on Contrastive Learning with Non-linguistic Data
Keywords: NLP, Sentence Embedding, Contrastive Learning
Sentence embeddings learned from text are widely used for tasks such as semantic textual similarity and the automatic evaluation of text generation. SimCSE, a sentence embedding learning method based on contrastive learning, achieves high accuracy on the semantic textual similarity task. VisualCSE and AudioCSE, which are derivatives of SimCSE, add training on image and audio data to the text-based training and have been shown to further improve accuracy in English. However, these methods, which rely on non-linguistic data, have not been validated in Japanese. This study examines the effectiveness of VisualCSE in Japanese. In our experiments, VisualCSE in Japanese did not show the significant improvement in accuracy observed in the English experiment. We also analyze how sentence embedding learning is affected when noise data is used in place of image data.
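For readers unfamiliar with the contrastive objective behind SimCSE-style training, the following is a minimal sketch in NumPy, not the implementation used in this study. It computes an InfoNCE-style loss over a batch in which row i of one view and row i of the other view form a positive pair and all other rows act as negatives. The function name `info_nce_loss`, the temperature of 0.05, the random vectors standing in for encoder outputs, and the `aux_weight` used to add a non-linguistic term to the text term are all illustrative assumptions; the exact form of the image loss in VisualCSE is not described in this abstract.

```python
import numpy as np

def info_nce_loss(view_a, view_b, temperature=0.05):
    """Mean InfoNCE loss; row i of view_a and row i of view_b are a positive pair."""
    # L2-normalise so that the dot product equals cosine similarity.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row-wise maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for each row sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two encoder passes over the same sentences.
text_a = rng.normal(size=(4, 8))
text_b = text_a + 0.01 * rng.normal(size=(4, 8))

# Hypothetical stand-ins for embeddings of non-linguistic inputs (images or noise).
aux_a = rng.normal(size=(4, 8))
aux_b = aux_a + 0.01 * rng.normal(size=(4, 8))

aux_weight = 1.0  # assumed weighting of the non-linguistic term
text_loss = info_nce_loss(text_a, text_b)
aux_loss = info_nce_loss(aux_a, aux_b)
total_loss = text_loss + aux_weight * aux_loss
print(text_loss, aux_loss, total_loss)
```

In this sketch, replacing the image embeddings with noise only changes what is fed into the second loss term, which is the kind of substitution the noise-data analysis mentioned above relies on.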