Tag: Reasoning
All the articles with the tag "Reasoning".
-
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study
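This paper explores an offline reinforcement learning method (LD-DPO) and achieves an average reasoning performance gain of 3.3% on the DeepDistill-32B model, including a 10.1% gain on the Arena-Hard benchmark, and emphasizes the importance of balancing reasoning length with semantic richness.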
-
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
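This paper proposes chunk-wise training (CWT) and skip-thinking training (STT), which split the reasoning process into chunks and skip non-core chunks, significantly improving the reasoning accuracy and speed of small language models in chain-of-thought distillation.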
-
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
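This paper uses the KVFundaBench benchmark to systematically evaluate how KV cache compression affects the fundamental abilities of large language models, reveals task-dependent performance degradation, and proposes ShotKV, which applies separate compression strategies to the prefill and decoding phases and significantly improves performance on long-context generation tasks.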
-
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
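This paper questions whether the 'aha moment' pattern correlates with improved reasoning ability and proposes a two-stage approach combining supervised fine-tuning (SFT) and reinforcement learning (RL), which significantly improves multimodal reasoning in multimodal large language models at the 3B and 7B scales and reaches state-of-the-art performance among open-source models.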
-
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
This paper introduces ModelSwitch, a multi-LLM repeated sampling strategy that leverages answer consistency to dynamically switch models, achieving superior performance and 34% sample efficiency over single-LLM self-consistency across diverse datasets.