9:20 AM - 9:40 AM
[2O1-GS-7-02] Cross-modal Description Generation for Future Events in Daily Tasks
Keywords:Video captioning, Future captioning, Cross-modal, Relational Self-Attention
In this paper, our aim is to generate a caption about a future event. We propose the Relational Future Captioning Model (RFCM), a crossmodal language generation model for the future captioning task. The RFCM has the Relational Self-Attention Encoder to extract the relationships between events more effectively than the conventional self-attention in transformers. We conducted comparison experiments, and the results show the RFCM outperforms a baseline method on two datasets.
Authentication for paper PDF access
A password is required to view paper PDFs. If you are a registered participant, please log on the site from Participant Log In.
You could view the PDF with entering the PDF viewing password bellow.