arXiv Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

Name
Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models
Home page
https://yiyibooks.cn/arxiv/2508.15202v1/index.html
Original URL
https://arxiv.org/abs/2508.15202
Description
Process Reward Models (PRMs) have emerged as a promising framework for supervising intermediate reasoning in large language models (LLMs), yet existing PRMs are primarily trained on general or Science, Technology, Engineering, and Mathematics (STEM) domains and fall short in domain-specific contexts such as finance, where reasoning is more structured, symbolic, and sensitive to factual and regulatory correctness. We introduce Fin-PRM, a domain-specialized, trajectory-aware PRM tailored to evaluate intermediate reasoning steps in financial tasks. Fin-PRM integrates step-level and trajectory-level reward supervision, enabling fine-grained evaluation of reasoning traces aligned with financial logic ...