Basic Information - Uncertainty-Aware Step-wise Verification with Generative Reward Models

Name: Uncertainty-Aware Step-wise Verification with Generative Reward Models

Home page: https://yiyibooks.cn/arxiv/2502.11250v1/index.html

Original URL: https://arxiv.org/abs/2502.11250

Description

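Complex multi-step reasoning tasks, such as solving mathematical problems, remain challenging for large language models (LLMs). Although outcome supervision is commonly used, process supervision via process reward models (PRMs) provides intermediate rewards that verify step-wise correctness along a solution trajectory. However, as proxies for human judgment, PRMs suffer from reliability issues, including susceptibility to reward hacking ...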
