arXiv: Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't

Name
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't
Home page
https://yiyibooks.cn/arxiv/2503.16219v1/index.html
Original URL
https://arxiv.org/pdf/2503.16219
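Description
Enhancing the reasoning capabilities of large language models (LLMs) typically relies on extensive computational resources and large datasets, which limits accessibility in resource-constrained settings. Our study investigates the potential of reinforcement learning (RL) to improve reasoning in small LLMs, focusing on a 1.5-billion-parameter model, DeepSeek-R1-Distill-Qwen-1 ...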