Analysis of Sparse Autoencoder features focusing on topics in sentences

Yume Kato; Ichiro Kobayashi

[1Win4-100] Analysis of Sparse Autoencoder features focusing on topics in sentences

〇Yume Kato¹, Ichiro Kobayashi¹ (1. Ochanomizu University)

Keywords: Sparse Autoencoder, Large Language Model, Topic Analysis

With the development of large language models (LLMs), their interpretability has become an important research topic. In particular, methods that use a sparse autoencoder (SAE) to decompose the compressed representations in the intermediate layers of an LLM into a higher-dimensional, interpretable form have attracted attention. This study analyzes how LLMs internally represent differences in the topics of texts, using features obtained from SAEs trained on various layers of the model.
We clustered the features by their co-occurrence using spectral clustering and found that the resulting clusters form cohesive groups when the features are projected onto a two-dimensional plane with UMAP.
However, the correspondence between these co-occurrence-based clusters and topic labels turned out to be limited.
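
As an illustration of the clustering step, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes that SAE feature activations are available as a sentence-by-feature matrix, that two features "co-occur" when both are active on the same sentence, and that a two-way split by the Fiedler vector is an acceptable stand-in for full k-way spectral clustering. The function names, the activation threshold, and the toy data are all hypothetical, and the UMAP projection is omitted.

```python
import numpy as np

def cooccurrence(acts, threshold=0.0):
    """Count, for each pair of features, the sentences on which both are active.

    acts: (n_sentences, n_features) array of SAE activations (assumed layout).
    """
    active = (acts > threshold).astype(float)
    co = active.T @ active          # (n_features, n_features) pair counts
    np.fill_diagonal(co, 0.0)       # ignore self co-occurrence
    return co

def spectral_bipartition(affinity):
    """Two-way spectral clustering via the symmetric normalized Laplacian."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    lap = np.eye(len(affinity)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)          # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt      # second-smallest eigenvector, rescaled
    return (fiedler > 0).astype(int)          # cluster id by sign

# Toy data (hypothetical): features 0-2 fire together, features 3-5 fire
# together, and the last sentence weakly links the two groups.
acts = np.array([
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0],
], dtype=float)

labels = spectral_bipartition(cooccurrence(acts))
print(labels)  # features 0-2 share one label, features 3-5 the other
```

Which group receives label 0 or 1 depends on the sign of the eigenvector returned by the solver, so only the grouping itself is meaningful. Comparing such cluster labels with per-sentence topic labels would be a separate step that this sketch does not cover.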

Presentation information

[1Win4] Poster session 1
