Posts

All the articles I've posted.

Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster

Published: 1 Jun, 2025 at 11:53 AM

86.49 🤔

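This paper proposes chunk-wise training (CWT) and skip-thinking training (STT), which split the reasoning process into chunks and skip non-core chunks, significantly improving the reasoning accuracy and speed of small language models in chain-of-thought distillation.

Superposition Yields Robust Neural Scaling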

Published: 17 May, 2025 at 11:17 PM

86.47 🤔

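Through analysis of toy models and real LLMs, this paper shows that superposition is an important mechanism behind neural scaling laws: under strong superposition, loss is inversely proportional to model dimension regardless of the feature frequency distribution, which explains why loss falls as a power law with model size.

Can LLMs Maintain Fundamental Abilities under KV Cache Compression?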

Published: 26 May, 2025 at 11:22 AM

86.44 🤔

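This paper uses the KVFundaBench benchmark to systematically evaluate how KV cache compression affects the fundamental abilities of large language models, revealing task-dependent performance degradation. It also proposes ShotKV, which applies separate compression strategies to the prefill and decoding phases and significantly improves performance on long-context generation tasks.

Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start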

Published: 3 Jun, 2025 at 11:30 AM

86.44 🤔

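By questioning the correlation between the 'aha moment' pattern and improvements in reasoning ability, this paper proposes a two-stage approach combining supervised fine-tuning (SFT) and reinforcement learning (RL). The approach significantly improves multimodal reasoning performance on 3B and 7B multimodal large language models, reaching state-of-the-art results among open-source models.

Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute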

Published: 10 May, 2025 at 10:59 AM

86.42 🤔

This paper introduces ModelSwitch, a multi-LLM repeated sampling strategy that leverages answer consistency to dynamically switch models, achieving superior performance and 34% sample efficiency over single-LLM self-consistency across diverse datasets.