A Proposal of Video Key-frame Captioning Task and its Dataset Construction

Kotaro Kitayama

4:20 PM - 4:40 PM

[4I4-GS-7e-03] A Proposal of Video Key-frame Captioning Task and its Dataset Construction

〇Kotaro Kitayama¹, Jun Suzuki¹,², Nobuyuki Shimizu³ (1. Tohoku University, 2. RIKEN, 3. Yahoo Japan Corporation)

Keywords: CV, NLP

Automatic video summarization is a crucial technology for reducing the cost that developers and end users incur when checking the contents of videos. It can also provide cues for video retrieval, helping to find only the required videos among an enormous number of consumer-generated videos. This paper focuses on a specific video summarization task, which we call video key-frame captioning. The task requires a system to extract a predefined number of key-frames and, at the same time, generate a description of the extracted key-frame sequence, so that the frames and text together summarize the given video well. We introduce a formal definition of this new task and discuss procedures for creating a dataset for evaluating key-frame captioning.
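
To make the input and output of the task concrete, the following is a minimal Python sketch of its interface, not the authors' method or formal definition. It assumes one caption per extracted key-frame, which the abstract does not specify. The names KeyFrameCaption, uniform_keyframe_indices, caption_keyframes, and the describe callback are all hypothetical, and evenly spaced sampling stands in for a real key-frame selector.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class KeyFrameCaption:
    frame_index: int  # position of the key-frame in the input video
    caption: str      # text describing that key-frame


def uniform_keyframe_indices(num_frames: int, k: int) -> List[int]:
    """Pick k evenly spaced frame indices (placeholder selector, an assumption)."""
    if not 0 < k <= num_frames:
        raise ValueError("k must satisfy 0 < k <= num_frames")
    # Take the centre of each of k equal-width segments of the video.
    return [(2 * i + 1) * num_frames // (2 * k) for i in range(k)]


def caption_keyframes(
    frames: Sequence[object],
    k: int,
    describe: Callable[[object], str],
) -> List[KeyFrameCaption]:
    """Return k (key-frame index, caption) pairs for the given video.

    `describe` is a hypothetical captioning function supplied by the caller.
    """
    indices = uniform_keyframe_indices(len(frames), k)
    return [KeyFrameCaption(i, describe(frames[i])) for i in indices]
```

A system for the task would replace both the selector and the describe callback with learned components; the point of the sketch is only that the number of key-frames k is fixed in advance and that frames and text are produced together.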

Presentation information

[4I4-GS-7e] Image and Audio Media Processing: Applications
