[3Yin2-31] Open-Domain Live Commentary Generation
Keywords: Natural language processing, Multimodal
In live commentary, a commentator makes objective statements or subjective comments about the events in a video in real time.
Research on automatically generating such live commentary has conventionally targeted specific domains, such as sports, so generation methods commonly rely on domain-specific information.
Our study targets live commentary generation for open-domain videos, a difficult setting because domain-specific features cannot be used.
We first construct a dataset of videos from various domains paired with live commentaries collected by crowdsourcing, and then train a live commentary generation model that takes both the video and the context into account.
Experiments show that a multimodal Transformer that considers video and contextual text performs well.