Tag: Efficiency

All the articles with the tag "Efficiency".

Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation

Published: 21 May, 2025 at 11:09 AM

86.58 🤔

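Using Gaussian-mixture simulations and large-scale language model experiments, this paper shows that knowledge distillation in generative models works by letting the teacher model's entropy control the student model's precision-recall trade-off, which improves sample quality.
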
ThinkSwitcher: When to Think Hard, When to Think Fast

Published: 24 May, 2025 at 11:12 AM

86.56 🤔

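ThinkSwitcher is a lightweight adaptive framework that lets a single large reasoning model switch dynamically between long and short chain-of-thought reasoning modes according to task complexity, cutting computational cost by 20-30% on math reasoning benchmarks while maintaining high accuracy on complex tasks.
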
Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study

Published: 7 May, 2025 at 08:41 AM

86.49 🤔

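This paper explores an offline reinforcement learning method (LD-DPO) and achieves an average 3.3% improvement in reasoning performance on the DeepDistill-32B model, including a 10.1% gain on the Arena-Hard benchmark, and highlights the importance of balancing reasoning length against semantic richness.
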
Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster

Published: 1 Jun, 2025 at 11:53 AM

86.49 🤔

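This paper proposes chunk-wise training (CWT) and skip-thinking training (STT), which split the reasoning process into chunks and skip non-core chunks, significantly improving the reasoning accuracy and speed of small language models in chain-of-thought distillation.
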
Superposition Yields Robust Neural Scaling

Published: 17 May, 2025 at 11:17 PM

86.47 🤔

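Through analysis of toy models and real LLMs, this paper identifies superposition as an important mechanism behind neural scaling laws: under strong superposition, loss is inversely proportional to model dimension regardless of the feature frequency distribution, which explains why loss falls as a power law with model size.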
