Basic Information - Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Name: Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model

Homepage: https://yiyibooks.cn/arxiv/2503.24290v1/index.html

Original URL: https://arxiv.org/abs/2503.24290

Description

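We introduce Open-Reasoner-Zero, the first open-source implementation of large-scale reasoning-oriented RL training, with a focus on scalability, simplicity, and accessibility. Through extensive experiments, we show that a minimalist approach, vanilla PPO with GAE ($\lambda = 1$, $\gamma = 1$) and straightforward rule-based rewards, without any KL regularization, is sufficient to scale up both response length and benchmark performance, similar to the phenomenon observed in DeepSeek-R1-Zero. Using the same base model as DeepSeek-R1-Zero-Qwen-32B, our implementation achieves superior performance on the AIME2024, MATH500, and GPQA Diamond benchmarks while demonstrating remarkable efficiency: it requires only a tenth of the training steps compared to the DeepSeek-R1-Zero pipeline ...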
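As a rough illustration of what the GAE setting $\lambda = 1$, $\gamma = 1$ implies, the sketch below computes Generalized Advantage Estimation for a single episode. It is not taken from the Open-Reasoner-Zero code; the function name `gae_advantages`, the list-based inputs, and the assumption that the value after the final step is zero are all choices made here for illustration. With both parameters set to 1, the estimate reduces to the undiscounted return from each step minus the critic's value at that step.

```python
def gae_advantages(rewards, values, gamma=1.0, lam=1.0):
    """Generalized Advantage Estimation over one finished episode.

    rewards[t] is the reward received at step t and values[t] is the
    critic's estimate V(s_t). The value after the last step is assumed
    to be 0 because the episode has ended (an assumption of this sketch).
    """
    advantages = [0.0] * len(rewards)
    next_value = 0.0  # V(s_T) = 0 at episode end
    running = 0.0     # accumulated (gamma * lam)-weighted sum of TD errors
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at step t.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


# Hypothetical episode: a single rule-based reward of 1.0 at the final step.
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.2, 0.4, 0.6, 0.8]
advantages = gae_advantages(rewards, values)  # gamma = lam = 1
```

In this toy episode the total return is 1.0 from every step, so each advantage is simply 1.0 minus the corresponding value estimate. A sparse terminal reward of this kind is one plausible shape for a rule-based reward, but the exact reward rules are not given in the description above.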
