JSAI2025

Presentation information

General Session


[3S6-GS-2] Machine learning:

Thu. May 29, 2025 5:40 PM - 7:20 PM Room S (Room 701-2)

Chair: Chihiro Watanabe (NTT)

6:40 PM - 7:00 PM

[3S6-GS-2-04] ∞-MoE: Generalizing Mixture of Experts to Infinite Experts

〇Shota Takashiro1, Takeshi Kojima1, Shohei Taniguchi1, Yusuke Iwasawa1, Yutaka Matsuo1 (1. University of Tokyo)

Keywords: Large Language Model, Mixture of Experts, Pruning

We propose a novel framework called Infinite Mixture of Experts ($\infty$-MoE) that generalizes Mixture of Experts (MoE) from a finite number of experts to a (theoretically) infinite continuum. Traditional MoE architectures have achieved significant expressiveness by combining multiple discrete experts, but they face inherent limitations as the number and structure of experts are predetermined. To address this, our $\infty$-MoE represents experts in a continuous space, allowing the model to dynamically sample and activate an unbounded set of experts for each input. Experimental results on GPT-2 Small/Medium models demonstrate that $\infty$-MoE outperforms Dense, Switch Transformer, and standard MoE (Top-2 gating). We discuss the potential of $\infty$-MoE for more expressive model architectures and outline possible extensions to larger-scale models and multimodal tasks.
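The abstract does not specify how the continuous expert space is parameterized, so the following is only a minimal sketch of one plausible instantiation, assuming a hypernetwork that maps a continuous expert code to feed-forward weights and a Gaussian router from which a few codes are sampled per token. All module names, dimensions, and the sampling scheme are illustrative assumptions, not the authors' actual method.

```python
# Minimal sketch of a "continuous expert space" layer (illustrative assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContinuousMoELayer(nn.Module):
    """Experts are indexed by a continuous code z instead of a discrete id.

    For each token, the router predicts a Gaussian over expert codes, a few
    codes are sampled (reparameterization trick), a hypernetwork maps each
    code to expert weights, and the sampled expert outputs are averaged.
    """

    def __init__(self, d_model: int, d_ff: int, d_code: int = 16, n_samples: int = 2):
        super().__init__()
        self.d_model, self.d_ff, self.n_samples = d_model, d_ff, n_samples
        # Router: token -> mean and log-variance of a distribution over expert codes.
        self.router = nn.Linear(d_model, 2 * d_code)
        # Hypernetwork: expert code -> flattened weights of a small FFN expert.
        self.hyper_w1 = nn.Linear(d_code, d_model * d_ff)
        self.hyper_w2 = nn.Linear(d_code, d_ff * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        mu, log_var = self.router(x).chunk(2, dim=-1)
        out = torch.zeros_like(x)
        for _ in range(self.n_samples):
            # Sample one expert code per token from the predicted distribution.
            z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
            # Generate that expert's weights on the fly.
            w1 = self.hyper_w1(z).view(*x.shape[:-1], self.d_model, self.d_ff)
            w2 = self.hyper_w2(z).view(*x.shape[:-1], self.d_ff, self.d_model)
            # Apply the sampled expert: a two-layer FFN per token.
            h = F.gelu(torch.einsum("...d,...df->...f", x, w1))
            out = out + torch.einsum("...f,...fd->...d", h, w2)
        return out / self.n_samples


if __name__ == "__main__":
    layer = ContinuousMoELayer(d_model=64, d_ff=128)
    tokens = torch.randn(2, 10, 64)
    print(layer(tokens).shape)  # torch.Size([2, 10, 64])
```

In this reading, "infinite experts" arises because any point in the code space defines a valid expert, and routing amounts to choosing a distribution over that space rather than a discrete gate; increasing n_samples trades compute for a finer approximation of the continuous mixture.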
