Tag: Reasoning

All the articles with the tag "Reasoning".

First Finish Search: Efficient Test-Time Scaling in Large Language Models

Published: 1 Jun, 2025 at 11:52 AM

87.92 🤔

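This paper proposes First Finish Search (FFS), a training-free test-time scaling strategy that decodes in parallel and selects the reasoning trace that finishes first. It significantly improves the accuracy of large language models on reasoning tasks (for example, DeepSeek-R1 reaches 82.23% on the AIME datasets) while reducing token usage by up to 45%.

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning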

Published: 30 May, 2025 at 11:16 AM

87.82 🤔

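This paper experimentally verifies a positive correlation between long-context ability and reasoning performance, proposes a training strategy that strengthens long-context ability before supervised fine-tuning, and significantly improves model performance on mathematical reasoning benchmarks.

Thought calibration: Efficient and confident test-time scaling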

Published: 28 May, 2025 at 11:22 AM

87.79 🤔

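This paper proposes "thought calibration", a method that uses a reasoning-tree abstraction and lightweight probes to decide dynamically when a language model should stop reasoning. It reduces thinking tokens by up to 60% on in-distribution data while maintaining performance, and achieves a 20% reduction on out-of-distribution data.

Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection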

Published: 28 May, 2025 at 11:21 AM

87.78 🤔

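This paper proposes the Route-To-Reason (RTR) framework, which uses a dynamic routing mechanism to jointly select the best model and reasoning strategy. Across multiple reasoning tasks it achieves higher accuracy and cuts token usage by more than 60%, significantly improving the trade-off between performance and cost.

Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs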

Published: 21 May, 2025 at 11:14 AM

87.75 🤔

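This paper proposes SASR, a dynamic, adaptive hybrid training framework that combines SFT and RL through an adjustment mechanism based on gradient norm and KL divergence. It significantly improves large language model performance on mathematical and logical reasoning tasks, outperforming conventional SFT, RL, and static hybrid methods.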
