A Study on Information Security Management Techniques for Extracting Personally Identifiable Information in Internal Documents Using Small Language Models

Tomoyuki Yamaguchi; Tomu Noguchi

[1Win4-91] A Study on Information Security Management Techniques for Extracting Personally Identifiable Information in Internal Documents Using Small Language Models

〇Tomoyuki Yamaguchi¹, Tomu Noguchi¹ (1.Murata Manufacturing Co., Ltd.)

Keywords: NER, NLP

With the rapid development and adoption of Large Language Models (LLMs), there is a growing expectation that corporate internal documents can be used for text generation and information retrieval to improve efficiency. However, many high-performing LLMs are accessed through provider APIs, so input data may be exposed outside the organization, which raises concerns about privacy regulations and corporate compliance. Deploying LLMs safely in an enterprise therefore requires appropriate handling of highly confidential input data. This research aims to develop a security management method that extracts and masks personally identifiable information (PII) in the input before it is submitted to an API-based LLM. We fine-tuned a locally running Small Language Model (SLM) with Parameter-Efficient Fine-Tuning (PEFT) on a PII dataset built within the organization and evaluated its extraction performance. Our results show that SLM-based PII extraction and masking can achieve sufficient accuracy for secure corporate use.
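The masking step described in the abstract can be illustrated with a minimal sketch. It assumes that an extractor returns character-offset spans with entity labels, and it replaces each span with a placeholder before the text would be sent to an external API. The names `Span`, `extract_pii`, and `mask_pii` are hypothetical and do not come from the paper; the regex-based `extract_pii` is only a runnable stand-in for the locally running fine-tuned SLM, not the authors' model.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # entity type, e.g. "EMAIL"


# ASSUMPTION: stand-in for the fine-tuned SLM. In the described method a local
# NER model would produce the spans; regexes are used here only so that the
# sketch runs without any model weights.
_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{2,4}-\d{2,4}-\d{4}\b"),
}


def extract_pii(text: str) -> List[Span]:
    """Return PII spans found in `text`, sorted by start offset."""
    spans = [
        Span(m.start(), m.end(), label)
        for label, pattern in _PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans)


def mask_pii(text: str, spans: List[Span]) -> str:
    """Replace each span with a `[LABEL]` placeholder.

    Spans that overlap an already-masked region are skipped.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans):
        if span.start < cursor:
            continue  # overlaps the previous span; keep the earlier one
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    prompt = "Contact taro@example.co.jp or 03-1234-5678."
    print(mask_pii(prompt, extract_pii(prompt)))
    # prints: Contact [EMAIL] or [PHONE].
```

Keeping extraction and masking as separate functions means the regex stand-in could be swapped for a model-backed extractor without changing the masking logic, as long as the extractor returns the same span structure.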


Presentation information

[1Win4] Poster session 1
