Tag: Exploration

All the articles with the tag "Exploration".

Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs

Published: 30 May, 2025 at 11:16 AM

87.61 🤔

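This paper proposes dynamic sampling budget allocation and a temperature scheduling mechanism. By reallocating resources according to problem difficulty and preserving exploration through maintained policy entropy, it significantly improves the efficiency and performance of reinforcement learning for large language models on math tasks. On the AIME 2024 benchmark in particular, pass@1 and pass@16 improve by 5.31% and 3.33%, respectively.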