JSAI2020

Presentation information

General Session

[2D1-GS-9] Natural language processing, information retrieval: Support technology

Wed. Jun 10, 2020 9:00 AM - 10:40 AM Room D (jsai2020online-4)

Chair: Kugatsu Sadamitsu (Future Corporation)

10:20 AM - 10:40 AM

[2D1-GS-9-05] News Image Caption Generation

〇Zhishen Yang1, Naoaki Okazaki1 (1. Tokyo Institute of Technology)

Keywords: vision and language, image captioning, multimodality

Vision and language, a vibrant research field within multimodal machine learning, aims to create models that comprehend information across the vision and language modalities. In this work, we use a multimodal Transformer model with a joint text-vision representation to approach one vision-and-language task: news image caption generation. The multimodal Transformer leverages context from the article, together with the scene in the associated image, to generate a caption. Experimental results demonstrate that the multimodal Transformer significantly improves the quality of the generated news image captions.
