JSAI2025

Presentation information


General Session » GS-2 Machine learning

[1S4-GS-2] Machine learning:

Tue. May 27, 2025 3:40 PM - 5:20 PM Room S (Room 701-2)

Chair: 高橋 大志 (NTT)

3:40 PM - 4:00 PM

[1S4-GS-2-01] TELU: A Faster Alternative to GELU and Swish

〇Shohei Taniguchi¹, Yutaka Matsuo¹ (1. The University of Tokyo)

Keywords: activation function

In recent deep learning models, smooth activation functions such as GELU and Swish are widely used in place of ReLU. These functions are known to offer advantages over ReLU, such as robustness to noise, but they are slower to compute because they involve transcendental functions such as the Gaussian error function and the sigmoid function. In this study, we propose a faster smooth activation function, the T Error Linear Unit (TELU), which can be computed using only algebraic functions and is therefore faster than GELU and related functions while retaining smoothness. Experimental results show that TELU can replace GELU in the pre-training of GPT-2 and that it runs faster than GELU while maintaining high performance.
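The abstract does not give the TELU formula, so the sketch below is only illustrative. It writes out the standard GELU and Swish definitions, whose erf and exp calls are the transcendental operations the abstract refers to, and contrasts them with a hypothetical smooth gate built purely from algebraic operations (multiplication, division, square root). The function names gelu, swish, and algebraic_gate, the beta parameter, and the algebraic gate itself are assumptions for illustration only and are not the authors' TELU.

import math

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF,
    # computed via the Gaussian error function (a transcendental function).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def swish(x, beta=1.0):
    # Swish / SiLU: x * sigmoid(beta * x); the sigmoid requires exp(),
    # another transcendental function.
    return x / (1.0 + math.exp(-beta * x))

def algebraic_gate(x):
    # Hypothetical illustration only (NOT the TELU proposed in the paper,
    # whose formula is not given in this abstract): a smooth, everywhere
    # differentiable gate x * (0.5 + 0.5 * x / sqrt(1 + x^2)) that uses
    # only algebraic operations and avoids erf/exp entirely.
    return x * (0.5 + 0.5 * x / math.sqrt(1.0 + x * x))

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  "
              f"swish={swish(x):+.4f}  algebraic={algebraic_gate(x):+.4f}")

The intuition behind the speed claim is that additions, multiplications, divisions, and square roots map to fast hardware instructions, whereas erf and exp are typically evaluated through special-function routines or polynomial approximations, which cost more per element.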
