Internal Representations of Familiarity Judgments in Language Models

Kai Sato; Ryosuke Takahashi; Benjamin Heinzerling; Kenshiro Tanaka; Yufeng Zhao; Yoshihiro Sakai; Naoya Inoue; Inui Kentaro

[1Win4-18] Internal Representations of Familiarity Judgments in Language Models

〇Kai Sato¹, Ryosuke Takahashi¹, Benjamin Heinzerling²,¹, Kenshiro Tanaka³, Yufeng Zhao³, Yoshihiro Sakai³, Naoya Inoue³,², Inui Kentaro⁴,¹,² (1. Tohoku University, 2. Institute of Physical and Chemical Research, 3. Japan Advanced Institute of Science and Technology, 4. MBZUAI)

Keywords: language models, knowledge representation, familiarity judgment

The knowledge acquisition capabilities of language models (LMs) have been extensively studied; however, the mechanisms by which LMs judge the familiarity of acquired knowledge remain insufficiently understood. In this study, we analyze the internal states of an LM during familiarity judgment. Our findings reveal that (1) the information required to judge familiarity is embedded in the internal representations at the time the knowledge is learned, and (2) the LM exhibits different activation patterns when it predicts knowledge to be familiar versus unfamiliar. These findings provide insights into the mechanisms underlying familiarity judgment in language models.

Presentation information

[1Win4] Poster session 1
