9:00 AM - 9:20 AM
[4I1-GS-7b-01] Building a Video-and-Language Dataset with Human Actions for Multimodal Inference
Keywords: Multimodal Inference, Visual Textual Entailment, Video Dataset