JSAI2024

Presentation information

General Session » GS-7 Vision, speech media processing

[2C6-GS-7] Language media processing:

Wed. May 29, 2024 5:30 PM - 7:10 PM Room C (Temporary room 1)

Chair: Naoyuki Terashita (寺下直行), Hitachi, Ltd.

5:50 PM - 6:10 PM

[2C6-GS-7-02] Improving Saliency Map Prediction via Changing Backbone for Advertising Videos

Taro Watanabe1, 〇Kazuhiro Onishi1 (1. Hakuhodo Technologies Inc.)

Keywords: saliency map, action recognition model, advertising videos

During the creation phase of an advertisement, feedback on which parts of the video mainly attract viewers' attention is important for making the production of advertising videos more efficient. We aim to improve saliency map prediction by replacing the encoder backbone of a UNet-like encoder-decoder structure with a better-performing action recognition model. We selected six action recognition models (S3D, Slow, X3D, SlowFast, MoViNet, and UniFormer) and evaluated their estimation accuracy on three benchmarks. No correlation was found between the classification accuracy of the action recognition models and their saliency prediction accuracy. We observed improvement for small areas and low-contrast regions, but little improvement in cases that still required predicting object motion.
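
The abstract does not say how the relationship between classification accuracy and saliency prediction accuracy was measured. As an illustration only, one common way to check for such a relationship across a handful of models is Spearman's rank correlation. The sketch below is a minimal standard-library version; the function names (`average_ranks`, `pearson`, `spearman_rho`) are hypothetical and are not taken from the paper, and no values from the paper are used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data.

    Assumed usage (not from the paper): xs holds one classification
    accuracy per backbone, ys the matching saliency prediction score.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))
```

Under this assumed setup, each backbone contributes one pair of scores, and a coefficient close to zero would be consistent with the finding reported above. With only six models the estimate is coarse, so it is best read as a rough indicator rather than a significance test.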
