9:40 AM - 10:00 AM
[4I1-GS-11-03] Redistribution of Token-wise Relevance in Transformer Models
Model Interpretability Based on Layer-wise Relevance Propagation
Keywords: Explainability, Transformer, XAI, Layer-wise Relevance Propagation
The attention mechanism in Transformer models processes information by integrating intermediate features, using the relevance between tokens as weights. Consequently, the attention weights of the final layer are often used as an explanation of the model's behavior. However, this approach fails to properly account for how token representations are transformed across layers, so it cannot accurately evaluate the contributions of the input tokens, which limits interpretability. In this study, we propose a method that extends the concept of Layer-wise Relevance Propagation (LRP) to redistribute the relevance associated with these pattern transformations in the attention mechanism among the tokens and to propagate it back to the input tokens. Unlike the direct observation of attention weights, this method quantifies the influence of the input tokens before the pattern transformations take place. By providing a framework for accurately capturing the impact of information derived from the input and for intuitively understanding the model's internal operations, the proposed approach enhances the interpretability and transparency of Transformer models and is expected to contribute to their reliability in practical applications.
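To make the redistribution idea concrete, the following is a minimal sketch of an LRP-epsilon-style backward pass through a single attention layer. It is not the authors' exact propagation rule (the abstract does not specify one); the function name, the choice of using attention weights scaled by value-vector norms as contributions, and the epsilon stabilizer are all illustrative assumptions.

```python
import numpy as np

def lrp_attention_redistribute(A, V, R_out, eps=1e-9):
    """Redistribute token-wise relevance backward through one
    attention layer, in the style of the LRP-epsilon rule.

    A     : (T, T) attention weights; row t mixes the value vectors.
    V     : (T, d) value vectors of the source tokens.
    R_out : (T,)   relevance assigned to each output token.

    Returns R_in : (T,) relevance redistributed to the input tokens,
    proportional to each source token's contribution A[t, s] * ||v_s||.
    (This contribution measure is an illustrative assumption.)
    """
    # Contribution of source token s to output token t.
    contrib = A * np.linalg.norm(V, axis=1)[None, :]   # (T, T)
    # Normalize per output token; epsilon keeps the division stable.
    denom = contrib.sum(axis=1, keepdims=True)
    denom = denom + eps * np.sign(denom)
    ratio = contrib / denom                             # rows sum to ~1
    # Each output token's relevance flows back to its source tokens.
    R_in = ratio.T @ R_out                              # (T,)
    return R_in
```

Applied layer by layer from the output down to the embeddings, a rule of this kind conserves total relevance (sum(R_in) is approximately sum(R_out)), which is what lets the final scores be read as input-token contributions rather than as raw final-layer attention weights.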