3:20 PM - 3:40 PM
[4J3-GS-5-05] Developing a Process Reward Model for Agent Tasks and Evaluating Its Effectiveness in Exploration Methods
Keywords: LLM, Agent, Reward Model
In recent years, improvements in large language models (LLMs) have propelled their use as agents that interact with environments via external tools, creating demand for ever higher performance. To address this need, we apply a Process Reward Model (PRM), which assigns a reward at each reasoning step, to the WebShop agent task. By integrating the PRM with beam search, we demonstrate improved task-solving accuracy. Moreover, compared with a Majority Voting baseline at the same computational cost, the PRM-based method consistently delivers higher accuracy and stability, underscoring the effectiveness of PRM-guided exploration in agent tasks. These findings suggest a promising direction for further performance gains.
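The abstract does not include implementation details, but the core idea of PRM-guided beam search can be sketched as follows. This is a minimal illustration, not the authors' code: `expand` and `prm_score` are hypothetical stand-ins for the agent's action proposer and the trained PRM, and the toy environment simply rewards a fixed "search → click → buy" action order.

```python
import heapq
from typing import Callable, List, Tuple

def prm_beam_search(
    initial: Tuple[str, ...],
    expand: Callable[[Tuple[str, ...]], List[str]],
    prm_score: Callable[[Tuple[str, ...]], float],
    beam_width: int = 3,
    max_steps: int = 4,
) -> Tuple[str, ...]:
    """Beam search where each extended trajectory is scored step-by-step
    by a process reward model, and only the top beam_width survive."""
    beam = [(0.0, initial)]
    for _ in range(max_steps):
        candidates = []
        for score, traj in beam:
            actions = expand(traj)
            if not actions:  # terminal trajectory: carry it forward unchanged
                candidates.append((score, traj))
                continue
            for a in actions:
                new_traj = traj + (a,)
                # Add the per-step reward from the PRM (cumulative scoring).
                candidates.append((score + prm_score(new_traj), new_traj))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]

# Toy stand-ins (hypothetical; a real PRM would be a trained scoring model).
TARGET = ("search", "click", "buy")

def expand(traj: Tuple[str, ...]) -> List[str]:
    return ["search", "click", "buy"] if len(traj) < len(TARGET) else []

def prm_score(traj: Tuple[str, ...]) -> float:
    # Reward a step only if it matches the expected action at that position.
    return 1.0 if traj[-1] == TARGET[len(traj) - 1] else 0.0

best = prm_beam_search((), expand, prm_score, beam_width=2, max_steps=3)
print(best)  # the highest-cumulative-reward trajectory
```

The contrast drawn in the abstract is that Majority Voting samples full trajectories independently and picks the most common outcome, whereas the PRM prunes weak partial trajectories early at each step, which is where the reported accuracy and stability gains come from.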