Tag: Reasoning

All the articles with the tag "Reasoning".

Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study

Published: 7 May, 2025 at 08:41 AM

86.49 🤔

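This paper explores an offline reinforcement learning method (LD-DPO) and achieves an average reasoning performance improvement of 3.3% on the DeepDistill-32B model, including a 10.1% gain on the Arena-Hard benchmark, and emphasizes the importance of balancing reasoning length with semantic richness.
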
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster

Published: 1 Jun, 2025 at 11:53 AM

86.49 🤔

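This paper proposes chunk-wise training (CWT) and skip-thinking training (STT), which split the reasoning process into chunks and skip non-core chunks, significantly improving the reasoning accuracy and speed of small language models in chain-of-thought distillation.
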
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?

Published: 26 May, 2025 at 11:22 AM

86.44 🤔

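This paper uses the KVFundaBench benchmark to systematically evaluate how KV cache compression affects the fundamental abilities of large language models, reveals task-dependent performance degradation, and proposes ShotKV, which applies separate compression strategies to the prefill and decoding phases and significantly improves performance on long-context generation tasks.
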
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start

Published: 3 Jun, 2025 at 11:30 AM

86.44 🤔

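By questioning the correlation between the 'aha moment' pattern and improvements in reasoning ability, this paper proposes a two-stage approach combining supervised fine-tuning (SFT) and reinforcement learning (RL) that significantly improves multimodal reasoning on 3B- and 7B-scale multimodal large language models, reaching state-of-the-art performance among open-source models.
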
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

Published: 10 May, 2025 at 10:59 AM

86.42 🤔

This paper introduces ModelSwitch, a multi-LLM repeated sampling strategy that leverages answer consistency to dynamically switch models, achieving superior performance and 34% sample efficiency over single-LLM self-consistency across diverse datasets.
