Presentation information

General Session: GS-7 Vision, speech media processing

[4I4-GS-7e] Image and Speech Media Processing: Applications

Fri. Jun 11, 2021 3:40 PM - 5:20 PM Room I (GS room 4)

Chair: Yusuke Iwasawa (The University of Tokyo)

4:20 PM - 4:40 PM

[4I4-GS-7e-03] A Proposal of Video Key-frame Captioning Task and its Dataset Construction

〇Kotaro Kitayama (1), Jun Suzuki (1, 2), Nobuyuki Shimizu (3) (1. Tohoku University, 2. RIKEN, 3. Yahoo Japan Corporation)

Keywords: CV, NLP

Automatic video summarization is a crucial technology for reducing the cost that developers and end users incur when checking the contents of videos. It can also provide clues for video retrieval, helping to pick out only the required videos from an extremely large number of consumer-generated videos. This paper focuses on a specific video summarization task, which we call video key-frame captioning. The task requires systems to extract a predefined number of key-frames and simultaneously generate a description of the series of extracted key-frames that summarizes the given video well. We introduce a formal definition of this new task and discuss procedures for creating a dataset for evaluating key-frame captioning.
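
To make the input and output of the task concrete, the following is a minimal Python sketch of what a key-frame captioning system consumes and returns. It is not the formal definition or any method from the paper. The names `KeyFrameCaption`, `uniform_keyframe_baseline`, and `describe` are hypothetical, the sketch assumes one caption per selected key-frame (the abstract does not spell this out), and evenly spaced frame selection is only a stand-in for a real key-frame extractor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class KeyFrameCaption:
    """One selected key-frame and its description (hypothetical output format)."""
    frame_index: int  # position of the key-frame within the video
    caption: str      # text describing that key-frame


def uniform_keyframe_baseline(
    num_frames: int,
    num_keyframes: int,
    describe: Callable[[int], str],
) -> List[KeyFrameCaption]:
    """Pick `num_keyframes` evenly spaced frames and caption each one.

    `describe` is a placeholder for any captioning model: it maps a frame
    index to a caption string. Uniform spacing is an assumed stand-in for
    key-frame selection, not the approach proposed in the paper.
    """
    if not 0 < num_keyframes <= num_frames:
        raise ValueError("num_keyframes must be between 1 and num_frames")
    # Take the centre frame of each of `num_keyframes` equal-width segments.
    indices = [
        (2 * i + 1) * num_frames // (2 * num_keyframes)
        for i in range(num_keyframes)
    ]
    return [KeyFrameCaption(idx, describe(idx)) for idx in indices]


if __name__ == "__main__":
    # Dummy captioner used only to show the shape of the output.
    summary = uniform_keyframe_baseline(100, 4, lambda i: f"caption for frame {i}")
    for item in summary:
        print(item.frame_index, item.caption)
```

The predefined number of key-frames appears here as the `num_keyframes` argument, and the returned list plays the role of the video summary: an ordered series of key-frames, each paired with text.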

