[1Win4-100] Analysis of Sparse Autoencoder features focusing on topics in sentences
Keywords: Sparse Autoencoder, Large Language Model, Topic Analysis
With the development of large language models (LLMs), their interpretability has become an important research topic. In particular, methods that use a sparse autoencoder (SAE) to decompose the compressed representations in the middle layers of an LLM into a higher-dimensional, interpretable form have attracted attention. This study analyzes how LLMs internally represent differences in topic across texts, using features obtained from SAEs trained on various layers of the model.
We performed spectral clustering on feature co-occurrence information and found that the resulting clusters form cohesive groups when the features are projected onto a two-dimensional plane with UMAP.
However, we found that the correspondence between these co-occurrence-based clusters and topic labels is limited.
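The clustering step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the SAE activations here are synthetic, the cluster count and thresholds are placeholder choices, and a real analysis would project the features with `umap-learn` afterwards for visualization.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)

# Synthetic binary SAE activations (n_samples x n_features): two blocks of
# features that tend to fire on the same inputs, plus background noise.
# A real analysis would use activations of SAE features over a text corpus.
n_samples, n_features = 200, 40
acts = np.zeros((n_samples, n_features), dtype=int)
acts[:100, :20] = rng.random((100, 20)) < 0.6
acts[100:, 20:] = rng.random((100, 20)) < 0.6
acts |= rng.random((n_samples, n_features)) < 0.05

# Feature-feature co-occurrence matrix: entry (i, j) counts how often
# features i and j are active on the same input.
cooc = acts.T @ acts
np.fill_diagonal(cooc, 0)

# Spectral clustering with the co-occurrence matrix as a precomputed affinity.
labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(cooc.astype(float))

print(labels)  # one cluster label per SAE feature
```

With the block structure above, the two feature groups should largely land in different clusters; for visualization, the same features could then be embedded in 2D with `umap.UMAP(metric="cosine").fit_transform(acts.T)` from the `umap-learn` package.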