JSAI2025

Presentation information

General Session

[3G1-GS-6] Language media processing

Thu. May 29, 2025 9:00 AM - 10:40 AM Room G (Room 1002)

Chair: Hiroya Takamura (National Institute of Advanced Industrial Science and Technology)

10:00 AM - 10:20 AM

[3G1-GS-6-04] Development of a Large Language Model Emphasizing Japanese Dialogue and Text Generation Performance

Report on “Tanuki” LLM Development Project Through Public Recruitment and Open Collaboration

○Katsuhiko Nishizawa1, Kan Hatakeyama2, Takao Mori3, Minami Someya4, Yasushi Nishijima, Kazutaka Nishimae5, Susumu Ota6, Keno Harada7, Yohei Kobashi7, Takeshi Kojima7, Yusuke Iwasawa7, Yutaka Matsuo7 (1. Panasonic Holdings Corporation, 2. Institute of Science Tokyo, 3. Denso Corporation, 4. INSTITUTE of INFORMATION SECURITY, 5. Cross-Industrial Data Science Laboratories, 6. Tokyo University of Technology, 7. The University of Tokyo)

Keywords: LLM, Synthetic data, GENIAC

In recent years, large language models (LLMs) have advanced rapidly worldwide, underscoring the growing importance of cultivating LLM development capabilities within Japan. This paper presents an LLM development project led by the Matsuo and Iwasawa Lab as part of the GENIAC project, whose primary goal is to foster domestic expertise and reinforce national development capacity. Volunteers recruited from the public worked with the lab to build 8B and 8×8B models from scratch. When we began our research in April 2024, domestically developed models still faced challenges in dialogue and text generation. Our approach therefore focused on improving dialogue and composition performance through synthetic data. Evaluations on the widely recognized “Japanese MT-Bench” indicated that our 8B model surpassed existing 10B-class models, while our 8×8B model performed on par with GPT-3.5, placing it at the forefront of domestically developed LLMs. Both models and their training code have been released under the Apache License 2.0, contributing to academic research and industrial applications of Japanese LLMs.
