JSAI2023

Presentation information

General Session

[1E4-GS-6] Language media processing

Tue. Jun 6, 2023 3:00 PM - 4:40 PM Room E (A2)

Chair: 長谷川 拓 (NTT) [on-site]

3:00 PM - 3:20 PM

[1E4-GS-6-01] Multimodal Inference for Numerals with Model Checking and Knowledge Injection

〇Nobuyuki Iokawa¹, Hitomi Yanaka¹ (1. Univ. of Tokyo)

Keywords: Natural Language Inference, Multimodal Inference, Numerical Understanding, Model Checking

Inference between different modalities has been actively studied in recent years. We focus on Visual-textual Entailment (VTE), one of the most critical tasks for multimodal inference. A variety of deep learning-based approaches have been proposed for the VTE task, but they have difficulty in accurately handling numerals. In contrast, approaches based on logical inference can successfully deal with numerals. However, since the previous logic-based approaches use automated theorem provers, their computational cost significantly increases for problems involving many entities. In this paper, we propose a logic-based VTE system with model checking and knowledge injection. We create a dataset for the VTE task containing numerals and negation to evaluate the extent to which VTE systems correctly understand those phenomena. Using this dataset, we show that our system solves the VTE task with numerals and negation more robustly than the previous approaches.
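
To illustrate the general idea of model checking for numerals and negation, here is a minimal Python sketch. It is not the authors' system: the scene, the entity IDs, the predicate names, and the helper functions (count, at_least, exactly, no) are all hypothetical and chosen only for illustration. It assumes an image has already been converted into a finite model, and it does not cover knowledge injection. The point is that a hypothesis with a numeral can be evaluated directly against the model by counting entities, rather than by searching for a proof.

```python
# Hypothetical finite model of one image: each predicate maps to the set of
# entity IDs it holds for. All names here are illustrative assumptions.
ENTITIES = {"e1", "e2", "e3"}
SCENE = {
    "dog": {"e1", "e2"},
    "cat": {"e3"},
    "black": {"e1", "e3"},
}


def count(entities, model, *preds):
    """Count the entities that satisfy every predicate in preds."""
    return sum(
        1 for e in entities
        if all(e in model.get(p, set()) for p in preds)
    )


def at_least(n, entities, model, *preds):
    """True if at least n entities satisfy all of preds."""
    return count(entities, model, *preds) >= n


def exactly(n, entities, model, *preds):
    """True if exactly n entities satisfy all of preds."""
    return count(entities, model, *preds) == n


def no(entities, model, *preds):
    """Negation: true if no entity satisfies all of preds."""
    return count(entities, model, *preds) == 0


# "There are at least two dogs."
print(at_least(2, ENTITIES, SCENE, "dog"))          # True
# "There is exactly one black dog."
print(exactly(1, ENTITIES, SCENE, "dog", "black"))  # True
# "There is no black cat."
print(no(ENTITIES, SCENE, "cat", "black"))          # False
```

Each check is a single pass over the entities in the model, so in this toy setting the cost grows linearly with the number of entities.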
