6:20 PM - 6:40 PM
[3S6-GS-2-03] Continuous Japanese Pre-Training for Qwen2.5-32B/7B
Keywords: AI, LLM, Generative AI
In this study, we conducted Japanese-focused continuous pre-training on the Qwen models developed by Alibaba Cloud, namely “Qwen2.5-32B-Instruct” and “Qwen2.5-7B-Instruct,” and evaluated their effectiveness on Japanese tasks. To balance high performance with parameter sizes that remain feasible for real-world applications, we performed continuous pre-training on a mixed Japanese–English dataset of roughly 100 billion tokens. We further applied a merging approach based on ChatVector to enhance instruction-following capabilities. In evaluations with MT-Bench-Japanese and ELYZA-tasks-100, the 32B model achieved scores of 8.294 and 4.37, respectively, demonstrating performance comparable to closed large language models. The combined benchmark results even surpassed those of Qwen2.5-72B-Instruct, confirming the benefits of Japanese-focused continuous pre-training. On the other hand, some outputs still contain Chinese text, suggesting influence from the ChatVector or from the base model's training data. Future work includes removing mixed-language data and applying both domain- and task-specific post-training to further improve performance and address these issues.
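For illustration, the following is a minimal sketch of a ChatVector-style merge under the assumption of Hugging Face Transformers checkpoints; the model identifiers, the path to the continually pre-trained model, and the scaling factor alpha are illustrative assumptions, not the authors' exact configuration.

import torch
from transformers import AutoModelForCausalLM

# Assumed checkpoints: official Qwen2.5 base/instruct models plus a
# hypothetical Japanese continually pre-trained (CPT) model.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", torch_dtype=torch.bfloat16)
instruct = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16)
cpt = AutoModelForCausalLM.from_pretrained("path/to/japanese-cpt-qwen2.5-7b", torch_dtype=torch.bfloat16)

alpha = 1.0  # assumed scaling factor for the chat vector

base_sd = base.state_dict()
instruct_sd = instruct.state_dict()
cpt_sd = cpt.state_dict()

# Chat vector = (instruct - base); adding it to the continually pre-trained
# weights transfers instruction-following behaviour to the merged model.
merged_sd = {}
for name, weight in cpt_sd.items():
    if name in base_sd and name in instruct_sd and weight.shape == base_sd[name].shape:
        merged_sd[name] = weight + alpha * (instruct_sd[name] - base_sd[name])
    else:
        # Parameters with mismatched shapes (e.g. resized embeddings) are kept as-is.
        merged_sd[name] = weight

cpt.load_state_dict(merged_sd)
cpt.save_pretrained("merged-japanese-qwen2.5-7b-instruct")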