Tag: Reasoning
All the articles with the tag "Reasoning".
-
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
ZEROSEARCH introduces a reinforcement learning framework that strengthens LLMs' search capabilities by using fine-tuned LLMs to simulate a search engine. With a curriculum-based rollout strategy, it matches or exceeds the performance of real search engines while incurring no search API costs.
-
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models
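This paper proposes two algorithms for scaling test-time compute, one knockout-style and one league-style. Both generate multiple candidate solutions and compare them pairwise. The paper proves that the failure probability decays exponentially or as a power law as compute increases, and validates the performance gains on multiple datasets and models.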
-
Thinker: Learning to Think Fast and Slow
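This paper proposes the Thinker task, which decomposes question answering into four stages (fast thinking, verification, slow thinking, and summarization) and uses reinforcement learning to train a large language model's intuition and reasoning abilities in a targeted way, yielding significant performance gains on mathematical reasoning benchmarks.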
-
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
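This paper proposes S-GRPO, which regulates the intermediate reasoning process of large language models through serial group generation and a decaying reward strategy. Across multiple benchmark datasets it reduces reasoning length by 35.4% to 61.1% and improves accuracy by 0.72% to 6.08%, significantly improving reasoning efficiency.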
-
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
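This position paper argues that reinforcement fine-tuning (RFT) significantly improves the reasoning capability of multimodal large language models (MLLMs) through reinforcement learning algorithms. It summarizes the community's progress across modalities, tasks, and domains and proposes five directions for future research, but offers no concrete methodological innovation or experimental validation.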