Posts

All the articles I've posted.

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Published: 4 May, 2025 at 04:29 PM

85.10 👍

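Through large-scale experiments, the paper analyzes the efficiency-accuracy trade-off of sparse attention in Transformer LLMs. It shows that for long sequences, larger and highly sparse models are preferable, and it establishes scaling laws that generalize across settings.
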
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Published: 4 May, 2025 at 04:33 PM

83.56 👍

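This paper proposes a multi-stage training recipe consisting of large-scale distillation, roll-out preference optimization, and reinforcement learning with verifiable rewards. The recipe substantially improves small language models on mathematical reasoning, allowing the 3.8B-parameter Phi-4-Mini-Reasoning model to outperform open-source baselines with nearly twice as many parameters.
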
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding

Published: 4 May, 2025 at 04:27 PM

83.39 👍

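This paper systematically shows that massive values in self-attention modules play a key role in how LLMs understand contextual knowledge. Experiments trace these values to rotary positional encoding (RoPE), offering new insights for model optimization and quantization strategies.
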
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models

Published: 4 May, 2025 at 04:31 PM

82.99 👍

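This paper proposes the Think, Prune, Train framework, which uses iterative supervised fine-tuning and correctness-based data pruning to improve a model's reasoning ability without increasing its size, while avoiding model collapse.
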
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability

Published: 4 May, 2025 at 04:29 PM

82.91 👍

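This paper introduces critique-revision prompting and compares multi-task training, counterfactual training, and their combination to systematically evaluate how knowledge distillation affects the performance and explainability of language models. It finds that multi-task training delivers strong performance, while methods that incorporate critique-revision prompting markedly improve explainability.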