JSAI2024

Presentation information

General Session » GS-7 Vision, speech media processing

[4I1-GS-7] Language media processing

Fri. May 31, 2024 9:00 AM - 10:40 AM Room I (Room 41)

Chair: 石川 開 (NEC Corporation) [Online]

9:20 AM - 9:40 AM

[4I1-GS-7-02] Pretraining of Multi-Aspect Ratio Vision Transformer and Its Application to Advertising Effects Prediction

〇Naoto Tanji¹, Toshihiko Yamasaki² (1. Septeni Japan, Inc., 2. The University of Tokyo)

Keywords: deep learning, computer vision, online advertisements, Transformer

To create effective online advertisements, it is beneficial to predict their impact before distribution. Display advertisements on the web have diverse aspect ratios, and altering these ratios can affect the impression the images make on viewers. Therefore, it is important to retain aspect ratio information for accurate prediction of advertisement effectiveness. In this study, we developed an image recognition model specifically for advertisement images by pretraining a Vision Transformer model capable of handling images of any aspect ratio, using the Masked Autoencoder method. By utilizing Rotary Position Embedding and Flash Attention techniques, we obtained a model with high flexibility regarding input image sizes. We also present the results of applying this pretrained model to an advertising effects prediction task using real-world advertisement distribution data.
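
As an illustration of why Rotary Position Embedding suits inputs of varying aspect ratio, the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how the rotary embedding is extended to two dimensions, so the axial scheme here (half of the channels rotated by patch row, half by patch column), the patch size of 16, and all function names are assumptions made for illustration. Masked Autoencoder pretraining and Flash Attention are not shown.

```python
import math


def patch_grid(height, width, patch=16):
    """List (row, col) coordinates of the patches of an image.

    The grid keeps the image's own aspect ratio instead of resizing to a
    square. Assumption: edge pixels that do not fill a patch are dropped.
    """
    rows, cols = height // patch, width // patch
    return [(r, c) for r in range(rows) for c in range(cols)]


def rotate_pairs(vec, pos, base=10000.0):
    """Standard 1D rotary embedding: rotate consecutive channel pairs
    of `vec` (even length) by angles proportional to `pos`."""
    dim = len(vec)
    out = []
    for i in range(dim // 2):
        theta = pos * base ** (-2.0 * i / dim)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def rope_2d(vec, row, col):
    """Axial 2D variant (an assumption): the first half of the channels
    encodes the row, the second half the column. len(vec) must be a
    multiple of 4."""
    half = len(vec) // 2
    return rotate_pairs(vec[:half], row) + rotate_pairs(vec[half:], col)


def dot(a, b):
    """Plain dot product, standing in for one attention logit."""
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical banner sizes produce differently shaped grids,
# with no learned position table tied to a fixed grid size.
wide = patch_grid(160, 320)   # 10 rows x 20 columns
tall = patch_grid(320, 160)   # 20 rows x 10 columns

# The attention logit between a rotated query and key depends only on
# the row and column offsets between the two patches.
q = [0.1 * i for i in range(8)]
k = [1.0 - 0.05 * i for i in range(8)]
near_origin = dot(rope_2d(q, 1, 2), rope_2d(k, 3, 5))
shifted = dot(rope_2d(q, 11, 22), rope_2d(k, 13, 25))
```

In this sketch `near_origin` and `shifted` agree up to floating-point error, because both pairs of patches are separated by the same offset (2 rows, 3 columns). Position enters only through relative offsets, so the same weights can in principle be applied to any grid shape.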
