Tag: Reasoning
All the articles with the tag "Reasoning".
-
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
本文通过探索离线强化学习方法(LD-DPO),在DeepDistill-32B模型上实现了平均3.3%的推理性能提升,尤其在Arena-Hard基准上提升10.1%,并强调了推理长度与语义丰富性平衡的重要性。
-
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
This paper introduces ModelSwitch, a multi-LLM repeated sampling strategy that leverages answer consistency to dynamically switch models, achieving superior performance and 34% sample efficiency over single-LLM self-consistency across diverse datasets.
-
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
ZEROSEARCH introduces a reinforcement learning framework that enhances LLMs' search capabilities by simulating search engines with fine-tuned LLMs, achieving performance comparable to or better than real search engines without API costs through a curriculum-based rollout strategy.
-
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models
本文综述了DeepSeek-R1发布后100天内推理语言模型的复制研究,系统总结了监督微调和基于可验证奖励的强化学习方法在数据构建和算法设计上的进展,并探讨了推理能力提升的多方向应用。
-
RM-R1: Reward Modeling as Reasoning
本文提出RM-R1,一种通过将奖励建模转化为推理任务并结合蒸馏和强化学习训练的推理奖励模型(REASRMS),在多个基准测试上取得了最先进性能,同时显著提升了可解释性。