Organized Session » OS-3

[3J4-OS-3b] AutoML (Automated Machine Learning) (2/2)

Thu. Jun 16, 2022, 3:30 PM - 5:10 PM, Room J

Organizers: Masaki Onishi (National Institute of Advanced Industrial Science and Technology (AIST)) [on-site], Hideitsu Hino (The Institute of Statistical Mathematics / RIKEN)

4:30 PM - 4:50 PM

[3J4-OS-3b-04] Exploring Token-Mixing Structure for Transformer

〇Takuya Asakura1, Kuniaki Uto1, Koichi Shinoda1 (1. Tokyo Institute of Technology)

Keywords: Neural Architecture Search, Transformer, Multi-Layer Perceptron

The Transformer model, which applies Channel-Mixing and Token-Mixing alternately to the input data, was developed for time-series data such as text and speech. Recent studies have shown that this model can also perform well on images. Various improved Transformer models have been proposed for image processing, many of which refine the structure of the fully connected layers, especially for Token-Mixing. However, these structures must be designed manually, which requires advanced knowledge of the characteristics of the target data. In this paper, we propose a method that automatically acquires Token-Mixing structures by learning the relationships between Tokens. In our experiments on image classification tasks, the structure obtained by the proposed method achieves higher accuracy with fewer parameters than other Token-Mixing methods. We also visualized the Token-Mixing structures obtained by the proposed method and observed that it tends to focus on spatially close Tokens.
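The alternating Token-Mixing/Channel-Mixing structure the abstract refers to can be sketched with a minimal, MLP-Mixer-style block in plain NumPy. This is an illustrative assumption about the baseline structure being discussed, not the authors' proposed method; all names and sizes here are hypothetical.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron; ReLU stands in for the usual GELU for simplicity.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def mixer_block(x, tok_params, ch_params):
    """One Mixer-style block applied to x of shape (tokens, channels)."""
    # Token-Mixing: transpose so the MLP acts along the token axis,
    # mixing information across spatial positions for each channel.
    x = x + mlp(x.T, *tok_params).T
    # Channel-Mixing: the MLP acts along the channel axis of each token.
    x = x + mlp(x, *ch_params)
    return x

rng = np.random.default_rng(0)
T, C, H = 16, 8, 32  # tokens, channels, hidden width (arbitrary example sizes)
x = rng.normal(size=(T, C))
tok_params = (rng.normal(size=(T, H)) * 0.1, np.zeros(H),
              rng.normal(size=(H, T)) * 0.1, np.zeros(T))
ch_params = (rng.normal(size=(C, H)) * 0.1, np.zeros(H),
             rng.normal(size=(H, C)) * 0.1, np.zeros(C))
y = mixer_block(x, tok_params, ch_params)
print(y.shape)  # (16, 8)
```

Note that the Token-Mixing weights (`tok_params`) are shared across channels and fixed by hand-designed shapes; the paper's goal is to learn this cross-token structure automatically rather than specify it manually.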
