JSAI2025

Presentation information

General Session

[3G1-GS-6] Language media processing:

Thu. May 29, 2025 9:00 AM - 10:40 AM Room G (Room 1002)

Chair: Hiroya Takamura (National Institute of Advanced Industrial Science and Technology)

9:20 AM - 9:40 AM

[3G1-GS-6-02] Learning Task Vector Weight Coefficients Based on Small-Scale Data in Large Language Model Merging

〇Seong Cheol Jeong1, Masahiro Suzuki1, Yutaka Matsuo1 (1. The University of Tokyo)

Keywords: Model merging, Task Vector, Weight Coefficient Optimization

This research proposes LLM-AdaMerge, a method that extends AdaMerging to efficiently merge multiple specialized Large Language Models (LLMs). Existing LLM merging methods typically avoid using training data because of its computational cost, which leads to suboptimal task interactions; our approach enables data-driven optimization with low computational overhead. We introduce a language modeling loss function that directly optimizes the weight coefficients used to combine task-specific parameter differences (task vectors), requiring only 4 samples per task for effective training.
Experiments with three specialized Mistral-7B-based models (mathematics, code generation, and Japanese language) show that our method improves average accuracy by up to 12.95 points over baselines. It outperforms both non-data-driven methods and Bayesian optimization approaches, and it remains computationally efficient because only the weight coefficients are updated. Our method provides a practical solution for combining multiple specialized LLMs, though scaling to larger numbers of tasks remains challenging.
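
The core idea in the abstract, learning one weight coefficient per task vector by minimizing a language modeling loss on a handful of samples, can be illustrated with a toy sketch. This is not the authors' implementation: a four-token unigram "model" whose parameters are its logits stands in for an LLM, and the function names, learning rate, step count, and initial coefficient value are all assumptions made for illustration. Only the coefficients are updated; the base parameters and task vectors stay fixed.

```python
import math

def merge(base, task_vectors, coeffs):
    """Merged parameters: base + sum_k coeffs[k] * task_vectors[k]."""
    merged = list(base)
    for lam, tau in zip(coeffs, task_vectors):
        for i, t in enumerate(tau):
            merged[i] += lam * t
    return merged

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def lm_loss_and_grad(logits, tokens):
    """Mean cross-entropy of a toy unigram model and its gradient w.r.t. the logits."""
    probs = softmax(logits)
    loss = -sum(math.log(probs[t]) for t in tokens) / len(tokens)
    grad = list(probs)
    for t in tokens:
        grad[t] -= 1.0 / len(tokens)
    return loss, grad

def learn_coefficients(base, task_vectors, tokens, steps=200, lr=0.5, init=0.3):
    """Gradient descent on the coefficients only (hypothetical hyperparameters)."""
    coeffs = [init] * len(task_vectors)
    for _ in range(steps):
        theta = merge(base, task_vectors, coeffs)
        _, grad_theta = lm_loss_and_grad(theta, tokens)
        # Chain rule: dL/dlambda_k is the inner product of dL/dtheta with tau_k.
        grads = [sum(g * t for g, t in zip(grad_theta, tau)) for tau in task_vectors]
        coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
    return coeffs

# Toy setup (all values are made up): two task vectors and four sample tokens.
base = [0.0, 0.0, 0.0, 0.0]
task_vectors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
tokens = [0, 0, 1, 3]

initial_loss, _ = lm_loss_and_grad(merge(base, task_vectors, [0.3, 0.3]), tokens)
coeffs = learn_coefficients(base, task_vectors, tokens)
final_loss, _ = lm_loss_and_grad(merge(base, task_vectors, coeffs), tokens)
print(coeffs, initial_loss, final_loss)
```

In a real setting the gradient with respect to the merged parameters would come from backpropagation through the LLM, but the coefficient update follows the same chain-rule structure.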
