Cross-modal Description Generation for Future Events in Daily Tasks

Motonari Kambara

9:20 AM - 9:40 AM

[2O1-GS-7-02] Cross-modal Description Generation for Future Events in Daily Tasks

〇Motonari Kambara¹, Komei Sugiura¹ (1. Keio University)

Keywords: Video captioning, Future captioning, Cross-modal, Relational Self-Attention

This paper addresses the future captioning task: generating a caption that describes a future event. We propose the Relational Future Captioning Model (RFCM), a cross-modal language generation model for this task. RFCM uses a Relational Self-Attention Encoder, which extracts the relationships between events more effectively than the conventional self-attention in transformers. In comparison experiments, RFCM outperformed a baseline method on two datasets.
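
For orientation, the conventional self-attention that the abstract uses as its point of comparison can be sketched as follows. This is a minimal illustration of standard scaled dot-product self-attention only. It is not the Relational Self-Attention Encoder of RFCM, whose details are not given in this abstract. The function names, the identity query/key/value projections, and the toy event vectors are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Conventional scaled dot-product self-attention.

    Assumption for brevity: queries, keys, and values are the input
    features themselves (identity projections), not learned ones.
    """
    dim = len(features[0])
    outputs, weights = [], []
    for query in features:
        # Similarity of this event to every event, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is the attention-weighted average of all events.
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, features))
             for j in range(dim)]
        )
    return outputs, weights


# Three hypothetical event feature vectors (toy values, not from the paper).
events = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention = self_attention(events)
```

Each row of `attention` sums to 1 and indicates how strongly one event attends to the others. According to the abstract, the relational variant is intended to capture such relationships between events more effectively.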

Presentation information

[2O1-GS-7] Vision, speech media processing: generation
