JSAI2023

Presentation information

General Session

[1E4-GS-6] Language media processing

Tue. Jun 6, 2023 3:00 PM - 4:40 PM Room E (A2)

Chair: 長谷川 拓 (NTT) [on-site]

4:20 PM - 4:40 PM

[1E4-GS-6-05] Logical Inference with Phrasal Knowledge Injection using Vision-and-Language Model

〇Akiyoshi Tomihari¹, Hitomi Yanaka¹ (1. Univ. of Tokyo)

Keywords: Recognizing Textual Entailment, Vision-and-Language, Natural Language Processing, Inference System

Recognizing Textual Entailment (RTE) is an important task with applications in question answering and machine translation. One of the main challenges for logic-based approaches to this task is the lack of background knowledge. This study proposes a logical inference system that injects phrasal knowledge by comparing the visual representations of phrases, based on the intuition that visual representations help humans judge entailment relations. First, candidate phrase pairs for phrasal knowledge are obtained from the logical inference process. Second, a Vision-and-Language model is used to acquire visual representations of these phrases in the form of images or embedding vectors. Finally, the visual representations are compared to decide whether to inject the knowledge corresponding to each candidate. In addition to simple similarity between phrases, asymmetric relations are considered when comparing visual representations. Our logical inference system achieved higher accuracy on the SICK dataset than a previous logical inference system, SPSA.
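
The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which similarity or asymmetric measure is used, so the cosine similarity, the inclusion-style asymmetric score, the function names, and the thresholds below are all assumptions chosen for illustration. The phrase embeddings are taken as plain lists of floats that some Vision-and-Language model would have produced.

```python
import math


def cosine(u, v):
    """Symmetric similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inclusion(u, v):
    """Asymmetric score (assumed measure, not from the paper):
    the fraction of u's non-negative feature mass that is also present in v.
    In general inclusion(u, v) != inclusion(v, u)."""
    total = sum(max(a, 0.0) for a in u)
    if total == 0.0:
        return 0.0
    covered = sum(min(max(a, 0.0), max(b, 0.0)) for a, b in zip(u, v))
    return covered / total


def should_inject(premise_vec, hypothesis_vec,
                  sim_threshold=0.8, incl_threshold=0.8):
    """Decide whether to inject the candidate knowledge 'premise phrase ->
    hypothesis phrase'. Both thresholds are hypothetical values."""
    similar = cosine(premise_vec, hypothesis_vec) >= sim_threshold
    directed = inclusion(premise_vec, hypothesis_vec) >= incl_threshold
    return similar and directed


# Toy vectors standing in for visual embeddings of two phrases.
narrow = [1.0, 0.0, 1.0]
broad = [1.0, 1.0, 1.0]
print(should_inject(narrow, broad))
print(should_inject(broad, narrow))
```

With these toy vectors the two directions give different decisions, which is the point of adding an asymmetric check on top of plain similarity: a symmetric score alone cannot distinguish "A entails B" from "B entails A".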
