[1Win4-18] Internal Representations of Familiarity Judgments in Language Models
Keywords: language models, knowledge representation, familiarity judgment
The knowledge acquisition capabilities of language models (LMs) have been studied extensively; however, the mechanisms by which LMs judge the familiarity of acquired knowledge remain insufficiently understood. In this study, we analyze the internal states of an LM during familiarity judgment. Our findings reveal that (1) the information required to judge familiarity is embedded within the internal representations at the time the knowledge is learned, and (2) the model exhibits different activation patterns when it predicts knowledge to be familiar versus unfamiliar. These findings provide insights into the mechanisms underlying familiarity judgment in language models.
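The abstract does not state which analysis technique was used. As an illustration only, a common way to test whether a property such as familiarity is decodable from internal representations is a linear probe: a simple classifier trained on hidden-state vectors to predict the label. The sketch below is an assumption-laden example, not the authors' method. It uses synthetic vectors in place of real LM hidden states, and every name in it (`make_synthetic_states`, `train_linear_probe`, `probe_accuracy`) is hypothetical.

```python
import numpy as np


def make_synthetic_states(n=400, dim=32, seed=0):
    """Generate stand-in 'hidden states' (ASSUMPTION: synthetic, not real LM data).

    Label 1 = familiar, label 0 = unfamiliar. The two classes are shifted
    in opposite directions along one random unit vector, plus Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    signs = 2 * labels - 1  # map {0, 1} to {-1, +1}
    states = rng.normal(size=(n, dim)) + 3.0 * np.outer(signs, direction)
    return states, labels


def train_linear_probe(states, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(states.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = states @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the log loss w.r.t. the logits
        w -= lr * (states.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(w, b, states, labels):
    """Fraction of examples whose label the probe predicts correctly."""
    preds = (states @ w + b > 0).astype(int)
    return float((preds == labels).mean())


# Train on the first 300 examples, evaluate on the held-out 100.
states, labels = make_synthetic_states()
w, b = train_linear_probe(states[:300], labels[:300])
accuracy = probe_accuracy(w, b, states[300:], labels[300:])
print(f"held-out probe accuracy: {accuracy:.2f}")
```

With real data, `states` would be replaced by hidden-state vectors extracted from a chosen layer of the LM, and held-out accuracy well above chance would suggest that familiarity information is linearly decodable at that layer.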