Tag: Supervised Learning

All the articles with the tag "Supervised Learning".

How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning

Published: 3 Jun, 2025 at 11:29 AM

87.38 🤔

本文通过控制实验研究SFT和RL在增强LLM推理能力中的相互作用，发现短CoT预热对RL有中等贡献，回溯次数需与任务难度匹配，且RL对SFT数据正确性依赖较小而对结构一致性敏感。
Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Published: 30 May, 2025 at 11:13 AM

95.59 🤔

本文通过仅使用920个蒸馏样本对Qwen2.5-32B基础模型进行监督微调，显著超越了资源密集的Zero-RL方法，并揭示了蒸馏模型通过拟人化语言和高级认知行为实现更灵活推理的机制。
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

Published: 3 Jun, 2025 at 11:30 AM

86.44 🤔

本文通过质疑‘aha moment’模式与推理能力提升的相关性，提出了一种结合监督微调（SFT）和强化学习（RL）的两阶段方法，在3B和7B规模的多模态大语言模型上显著提升了多模态推理性能，达到开源模型中的最优水平。
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model

Published: 3 Jun, 2025 at 11:42 AM

85.21 🤔

本文通过ComPABench基准评估视觉-语言模型（VLMs）的组合推理能力，发现强化学习（RL）优于监督微调（SFT）在跨任务和分布外泛化中的表现，并提出RL-Ground方法显著提升多模态组合推理性能。
RaCT: Ranking-aware Chain-of-Thought Optimization for LLMs

Published: 1 Jun, 2025 at 11:51 AM

89.43 🤔

RaCT通过链式思维（CoT）提示和排序偏好优化（RPO）的两阶段训练框架，显著提升了大型语言模型在文本重排序任务中的性能，同时保留了其通用语言建模能力，在多个基准上超越基线模型。

Tag: Supervised Learning

How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model

RaCT: Ranking-aware Chain-of-Thought Optimization for LLMs