JSAI2025

Presentation information

General Session


[3S6-GS-2] Machine learning:

Thu. May 29, 2025 5:40 PM - 7:20 PM Room S (Room 701-2)

Chair: Chihiro Watanabe (NTT)

6:40 PM - 7:00 PM

[3S6-GS-2-04] ∞-MoE: Generalizing Mixture of Experts to Infinite Experts

〇Shota Takashiro1, Takeshi Kojima1, Shohei Taniguchi1, Yusuke Iwasawa1, Yutaka Matsuo1 (1. University of Tokyo)

Keywords: Large Language Model, Mixture of Experts, Pruning

We propose a novel framework called Infinite Mixture of Experts ($\infty$-MoE) that generalizes Mixture of Experts (MoE) from a finite number of experts to a (theoretically) infinite continuum. Traditional MoE architectures have achieved significant expressiveness by combining multiple discrete experts, but they face inherent limitations as the number and structure of experts are predetermined. To address this, our $\infty$-MoE represents experts in a continuous space, allowing the model to dynamically sample and activate an unbounded set of experts for each input. Experimental results on GPT-2 Small/Medium models demonstrate that $\infty$-MoE outperforms Dense, Switch Transformer, and standard MoE (Top-2 gating). We discuss the potential of $\infty$-MoE for more expressive model architectures and outline possible extensions to larger-scale models and multimodal tasks.
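The abstract does not specify how the continuous expert space is parameterized, so the following is only a minimal sketch of one plausible instantiation, assuming a hypernetwork that maps a continuous expert code to feed-forward weights and a Gaussian router from which a few codes are sampled per token. All module names, dimensions, and the sampling scheme are illustrative assumptions, not the authors' actual method.

```python
# Minimal sketch of a "continuous expert space" layer (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContinuousMoELayer(nn.Module):
    """Experts are indexed by a continuous code z instead of a discrete id.

    For each token, the router predicts a Gaussian over expert codes, a few
    codes are sampled (reparameterization trick), a hypernetwork maps each
    code to expert weights, and the sampled expert outputs are averaged.
    """

    def __init__(self, d_model: int, d_ff: int, d_code: int = 16, n_samples: int = 2):
        super().__init__()
        self.d_model, self.d_ff, self.n_samples = d_model, d_ff, n_samples
        # Router: token -> mean and log-variance of a distribution over expert codes.
        self.router = nn.Linear(d_model, 2 * d_code)
        # Hypernetwork: expert code -> flattened weights of a small FFN expert.
        self.hyper_w1 = nn.Linear(d_code, d_model * d_ff)
        self.hyper_w2 = nn.Linear(d_code, d_ff * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        mu, log_var = self.router(x).chunk(2, dim=-1)
        out = torch.zeros_like(x)
        for _ in range(self.n_samples):
            # Sample one expert code per token from the predicted distribution.
            z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
            # Generate that expert's weights on the fly.
            w1 = self.hyper_w1(z).view(*x.shape[:-1], self.d_model, self.d_ff)
            w2 = self.hyper_w2(z).view(*x.shape[:-1], self.d_ff, self.d_model)
            # Apply the sampled expert: a two-layer FFN per token.
            h = F.gelu(torch.einsum("...d,...df->...f", x, w1))
            out = out + torch.einsum("...f,...fd->...d", h, w2)
        return out / self.n_samples


if __name__ == "__main__":
    layer = ContinuousMoELayer(d_model=64, d_ff=128)
    tokens = torch.randn(2, 10, 64)
    print(layer(tokens).shape)  # torch.Size([2, 10, 64])
```

In this reading, "infinite experts" arises because any point in the code space defines a valid expert, and routing amounts to choosing a distribution over that space rather than a discrete gate; increasing n_samples trades compute for a finer approximation of the continuous mixture.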
