JSAI2025

Presentation information

General Session

[3N1-GS-7] Vision, speech media processing

Thu. May 29, 2025 9:00 AM - 10:40 AM Room N (Room 1009)

Chair: 田崎 豪 (Meijo University) [Online]

9:40 AM - 10:00 AM

[3N1-GS-7-03] Few-shot Video Summarization Utilizing Large Language Models

〇Tomoya Sugihara¹, Shuntaro Masuda¹, Ling Xiao¹, Toshihiko Yamasaki¹ (1. Univ. of Tokyo)

Keywords: Video summarization, Large language models, Few-shot reasoning

Conventional supervised video summarization methods aggregate annotations from multiple annotators to create ground-truth labels for model training. However, because multiple labels are associated with a single video, this aggregation often introduces noisy ground-truth labels, potentially degrading model performance. In addition, the small size of the available datasets increases the risk of overfitting to specific categories. In contrast, large language models (LLMs) have recently demonstrated remarkable few-shot reasoning capabilities, which allow them to adapt to a task from only a few examples provided in the prompt. Building on this, we propose a novel few-shot video summarization method that leverages these capabilities to learn annotator-specific summarization tendencies from limited labeled data. Specifically, we use a pre-trained image captioning model to transform videos into textual data. The generated captions are paired with the corresponding annotated labels to construct few-shot prompts. Using these prompts, the LLM performs frame-level scoring without any parameter updates. Experimental evaluations on the SumMe and TVSum datasets show that the proposed method outperforms a random scoring baseline in F-score. These results highlight the effectiveness of our method in few-shot video summarization tasks.
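
The prompt-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Frame` class, the function names, the prompt wording, the 1-to-5 score scale, and the reply format are all assumptions made for illustration, and the captions are assumed to have been produced already by some image captioning model. No LLM is called here; `parse_scores` only shows how a text reply might be turned back into frame-level scores.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    caption: str                 # caption text, assumed to come from a captioning model
    score: Optional[int] = None  # annotator's importance score; None for unlabeled frames


def format_video(frames: List[Frame], with_scores: bool) -> str:
    """Render one video as numbered caption lines, optionally with scores."""
    lines = []
    for i, frame in enumerate(frames, start=1):
        if with_scores:
            lines.append(f"Frame {i}: {frame.caption} -> score: {frame.score}")
        else:
            lines.append(f"Frame {i}: {frame.caption}")
    return "\n".join(lines)


def build_few_shot_prompt(examples: List[List[Frame]], query: List[Frame]) -> str:
    """Pair captions with one annotator's labels as examples, then append the query video."""
    # Hypothetical instruction text and score scale; the abstract does not specify either.
    parts = ["Rate the importance of each frame for a video summary on a scale of 1 to 5."]
    for n, example in enumerate(examples, start=1):
        parts.append(f"Example {n}:\n" + format_video(example, with_scores=True))
    parts.append("Now score this video:\n" + format_video(query, with_scores=False))
    return "\n\n".join(parts)


def parse_scores(reply: str, num_frames: int, default: int = 1) -> List[int]:
    """Extract 'Frame i: ... score: s' pairs from a reply; missing frames get `default`."""
    found = {int(i): int(s) for i, s in re.findall(r"Frame (\d+):.*?score:\s*(\d+)", reply)}
    return [found.get(i, default) for i in range(1, num_frames + 1)]
```

Because the examples in a prompt would all come from a single annotator, the same query video can be scored once per annotator by swapping the example set, which is one way to read "annotator-specific summarization tendencies". The LLM's weights are never updated; only the prompt changes.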
