[1Win4-18] Internal Representations of Familiarity Judgments in Language Models
Keywords: language models, knowledge representation, familiarity judgment
The knowledge acquisition capabilities of language models (LMs) have been studied extensively; however, the mechanisms by which LMs judge the familiarity of acquired knowledge remain poorly understood. In this study, we analyze the internal states of an LM during familiarity judgment. Our findings reveal that (1) the information required to judge familiarity is embedded in the internal representations at the time the knowledge is learned, and (2) the LM exhibits different activation patterns when it predicts knowledge to be familiar versus unfamiliar. These findings provide insight into the mechanisms underlying familiarity judgment in language models.