Tag: Reasoning

All the articles with the tag "Reasoning".

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Published: 8 May, 2025 at 06:16 PM

86.09 🤔

ZEROSEARCH introduces a reinforcement learning framework that enhances LLMs' search capabilities by simulating search engines with fine-tuned LLMs, achieving performance comparable to or better than real search engines without API costs through a curriculum-based rollout strategy.

Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models

Published: 21 May, 2025 at 11:29 AM

86.09 🤔

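This paper proposes two test-time compute scaling algorithms (knockout-style and league-style) that generate multiple candidate solutions and compare them pairwise. It proves theoretically that their failure probability decays exponentially or as a power law as compute increases, and validates the performance gains across multiple datasets and models.

Thinker: Learning to Think Fast and Slow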

Published: 31 May, 2025 at 11:16 AM

86.01 🤔

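This paper proposes the Thinker task, which decomposes question answering into four stages (fast thinking, verification, slow thinking, and summarization) and uses reinforcement learning to train the intuition and reasoning abilities of large language models in a targeted way, achieving significant performance gains on mathematical reasoning benchmarks.

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models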

Published: 20 May, 2025 at 11:10 AM

85.99 🤔

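This paper proposes S-GRPO, which regulates the intermediate reasoning process of large language models through serial group generation and a decaying reward strategy. Across multiple benchmark datasets it reduces reasoning length by 35.4% to 61.1% and improves accuracy by 0.72% to 6.08%, significantly improving reasoning efficiency.

Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models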

Published: 2 Jun, 2025 at 11:33 AM

85.98 🤔

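This position paper argues that reinforcement fine-tuning (RFT) significantly improves the reasoning capability of multimodal large language models (MLLMs) through reinforcement learning algorithms. It summarizes the community's progress across modalities, tasks, and domains and proposes five directions for future research, but it lacks concrete methodological innovation and experimental validation.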
