Posts
All the articles I've posted.
-
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster
本文提出分块训练(CWT)和跳跃思维训练(STT),通过将推理过程分块并跳过非核心块,显著提升小型语言模型在链式思维蒸馏中的推理准确性和速度。
-
Superposition Yields Robust Neural Scaling
本文通过玩具模型和实际LLMs分析,揭示了超位置作为神经扩展律的重要机制,在强超位置下损失与模型维度成反比,与特征频率分布无关,从而解释了损失随模型规模幂律下降的现象。
-
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
本文通过KVFundaBench基准系统评估KV缓存压缩对大型语言模型基本能力的影响,揭示任务依赖性性能降解,并提出ShotKV方法,通过区分预填充和解码阶段压缩策略,在长上下文生成任务上显著提升性能。
-
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
本文通过质疑‘aha moment’模式与推理能力提升的相关性,提出了一种结合监督微调(SFT)和强化学习(RL)的两阶段方法,在3B和7B规模的多模态大语言模型上显著提升了多模态推理性能,达到开源模型中的最优水平。
-
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute
This paper introduces ModelSwitch, a multi-LLM repeated sampling strategy that leverages answer consistency to dynamically switch models, achieving superior performance and 34% sample efficiency over single-LLM self-consistency across diverse datasets.