Tag: Reasoning

All the articles with the tag "Reasoning".

Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning

Published: 4 Jun, 2025 at 11:28 AM

85.25 🤔

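This paper proposes the Reinforcement Distillation (REDI) framework, which uses two-stage training to exploit both positive and negative reasoning traces and substantially improves the mathematical reasoning of small language models. Qwen-REDI-1.5B reaches state-of-the-art performance among 1.5B models trained on public data.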
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing

Published: 30 May, 2025 at 11:13 AM

85.25 🤔

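This paper proposes R2R, a token-level neural routing method that selectively calls an LLM to correct divergent tokens in the reasoning path of an SLM. With an average of 5.6B activated parameters, it outperforms R1-14B and achieves a 2.8× wall-clock speedup over R1-32B.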
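To make the idea of token-level routing concrete, here is a minimal sketch, not the paper's implementation. It assumes the small model proposes each next token and a router decides whether that token should instead come from the large model. R2R's router is a trained neural network; here a simple confidence threshold stands in for it, and `route_generate`, the toy models, and the `0.5` threshold are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: given the tokens so far, return (next_token, confidence).
Model = Callable[[List[str]], Tuple[str, float]]
# Hypothetical router interface: decide whether to defer the proposed token to the LLM.
Router = Callable[[List[str], str, float], bool]


def route_generate(prefix: List[str], slm: Model, llm: Model,
                   router: Router, max_tokens: int) -> Tuple[List[str], int]:
    """Generate with the SLM, deferring only router-flagged tokens to the LLM."""
    out = list(prefix)
    llm_calls = 0
    for _ in range(max_tokens):
        token, conf = slm(out)            # the cheap model proposes every token
        if router(out, token, conf):      # flagged as a likely divergent token
            token, _ = llm(out)           # the expensive model supplies it instead
            llm_calls += 1
        out.append(token)
        if token == "<eos>":
            break
    return out, llm_calls


# Toy deterministic models: the SLM makes one low-confidence mistake at position 4.
TARGET = ["2", "+", "2", "=", "4", "<eos>"]
SLM_GUESS = ["2", "+", "2", "=", "5", "<eos>"]
SLM_CONF = [0.99, 0.98, 0.97, 0.95, 0.40, 0.99]


def toy_slm(tokens: List[str]) -> Tuple[str, float]:
    i = len(tokens)
    return SLM_GUESS[i], SLM_CONF[i]


def toy_llm(tokens: List[str]) -> Tuple[str, float]:
    return TARGET[len(tokens)], 1.0


def threshold_router(tokens: List[str], token: str, conf: float) -> bool:
    # Stand-in for a learned router: defer when the SLM is unsure.
    return conf < 0.5


if __name__ == "__main__":
    seq, calls = route_generate([], toy_slm, toy_llm, threshold_router, max_tokens=10)
    print(seq, calls)
```

Because the LLM is invoked only for the flagged token, most positions are served by the small model, which is the source of the reduced average activated parameters described above.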
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model

Published: 3 Jun, 2025 at 11:42 AM

85.21 🤔

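This paper uses the ComPABench benchmark to evaluate the compositional reasoning ability of vision-language models (VLMs). It finds that reinforcement learning (RL) outperforms supervised fine-tuning (SFT) in cross-task and out-of-distribution generalization, and proposes RL-Ground, which significantly improves multimodal compositional reasoning.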
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

Published: 2 Jun, 2025 at 11:31 AM

85.18 🤔

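This paper proposes the PURE framework, which uses process rewards to improve the reasoning of large language models through min-form credit assignment. Experiments show that it performs on par with verifiable-reward methods on mathematical reasoning tasks, and that adding a small amount of ground-truth signal further raises accuracy to 53.3%.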
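The contrast in the title between summation and min-form credit assignment can be illustrated with a short sketch. It assumes each reasoning step has a scalar process reward and compares the usual sum-form return (sum of future rewards) with a min-form return (minimum of future rewards). The function names and example rewards are hypothetical, and the paper's exact reward transformation may differ.

```python
from typing import List


def sum_form_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Standard RL return: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def min_form_returns(rewards: List[float]) -> List[float]:
    """Min-form return: G_t = min(r_t, r_{t+1}, ..., r_T)."""
    returns = [0.0] * len(rewards)
    running = float("inf")
    for t in reversed(range(len(rewards))):
        running = min(running, rewards[t])
        returns[t] = running
    return returns


if __name__ == "__main__":
    step_rewards = [0.9, 0.8, -0.5, 0.7]  # hypothetical PRM scores; step 3 is flawed
    print(sum_form_returns(step_rewards))
    print(min_form_returns(step_rewards))
```

One intuition for the design: under summation, several high-scoring steps can outweigh a single flawed step, so the early steps still look valuable. Under the min form, every step before the flawed one inherits its low score.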
Scalable Complexity Control Facilitates Reasoning Ability of LLMs

Published: 3 Jun, 2025 at 11:29 AM

85.16 🤔

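This paper controls the complexity of large language models by adjusting the initialization rate and the weight decay coefficient. This significantly improves reasoning ability, especially on mathematical tasks, and yields better scaling-law behavior.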
