Posts
All the articles I've posted.
-   Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
    Token Recycling proposes a training-free speculative decoding method that recycles candidate tokens and builds a draft tree from an adjacency matrix, accelerating large language model inference by roughly 2x, an improvement of more than 30% over other training-free methods. (A simplified sketch of the adjacency-matrix draft tree follows this list.)
-   Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
    This paper proposes Mixup Model Merge (M³), which performs randomized linear interpolation in parameter space with contribution ratios sampled from a Beta distribution, markedly improving the performance, out-of-distribution robustness, and adversarial robustness of merged large language models. (A minimal interpolation sketch follows this list.)
-   LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
    This paper proposes the LoRE-Merging framework, which uses low-rank estimation to construct an approximate base model and task vectors, enabling model merging without access to the original base model and outperforming conventional methods on several benchmark datasets. (A rough structural sketch follows this list.)
-   Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
    This paper introduces Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long and short CoT datasets to fine-tune non-reasoning LLMs, achieving a 2.3% average accuracy improvement and a 47.61% reduction in response length on reasoning benchmarks.
-   ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
    This paper proposes ProRL, which applies prolonged reinforcement learning with a KL-divergence penalty and reference-policy resets to train Nemotron-Research-Reasoning-Qwen-1.5B on diverse tasks, substantially expanding the reasoning boundaries of large language models, with especially strong gains in domains where the base model is weak and on out-of-distribution tasks. (A schematic of the KL penalty and reset follows this list.)
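The draft-tree construction in Token Recycling can be pictured with a short sketch. This is a simplified illustration rather than the paper's implementation: the vocabulary size, top-k width, tree depth, and all function names here are assumptions, and the actual method expands a fixed, pruned tree template rather than the full breadth-first expansion shown.

```python
import numpy as np

VOCAB, K, DEPTH = 32000, 4, 3  # hypothetical sizes, not the paper's settings

# Adjacency matrix: row t holds the top-K candidate successors of token t,
# "recycled" from the model's own top-K outputs at earlier decoding steps.
adj = np.zeros((VOCAB, K), dtype=np.int64)

def recycle_candidates(token_ids, topk_candidates):
    """Store the top-K candidates the model produced for each decoded token."""
    for t, cands in zip(token_ids, topk_candidates):
        adj[t] = cands

def build_draft_tree(root_token):
    """Expand a draft tree from the last accepted token using the matrix."""
    tree, frontier = [root_token], [root_token]
    for _ in range(DEPTH):
        frontier = [c for t in frontier for c in adj[t].tolist()]
        tree.extend(frontier)
    return tree  # draft tokens to verify in a single forward pass
```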
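For Mixup Model Merge, the mechanism reduces to a single Beta-sampled coefficient applied across all parameters. A minimal PyTorch sketch, assuming two state dicts with identical keys; the function name and `alpha` value are illustrative, not the paper's API:

```python
import torch

def mixup_merge(state_a, state_b, alpha=2.0):
    """Merge two models by randomized linear interpolation (M3-style)."""
    # One contribution ratio per merge, sampled from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return {k: lam * state_a[k] + (1.0 - lam) * state_b[k] for k in state_a}
```

Sampling one ratio per merge, rather than per tensor, keeps the result on the line segment between the two parent models in parameter space, which is what makes the interpolation a coherent merge.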
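LoRE-Merging estimates an approximate base model together with low-rank task vectors directly from the fine-tuned checkpoints. The paper formulates this as an optimization problem; the sketch below substitutes a crude one-shot approximation (coordinate-wise mean plus truncated SVD) purely to make the structure concrete, so the rank and every name here are assumptions:

```python
import torch

def low_rank(delta, rank):
    """Truncated SVD: keep the top-`rank` singular directions of a matrix."""
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]

def lore_style_merge(weights, rank=8):
    """weights: fine-tuned weight matrices for one layer, all the same shape."""
    base = torch.stack(weights).mean(dim=0)  # stand-in for the base estimate
    deltas = [low_rank(w - base, rank) for w in weights]  # low-rank task vectors
    return base + torch.stack(deltas).mean(dim=0)
```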
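Finally, two of ProRL's stabilizers, the KL penalty against a reference policy and the periodic reference reset, can be sketched schematically. The coefficient, threshold, and function names are assumptions, and real implementations estimate the KL over sampled trajectories rather than the full per-token distributions used here:

```python
import torch
import torch.nn.functional as F

def kl_penalized_loss(pg_loss, logits, ref_logits, beta=0.01):
    """Add a KL(pi || pi_ref) penalty to a policy-gradient loss."""
    logp = F.log_softmax(logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)
    kl = (logp.exp() * (logp - ref_logp)).sum(dim=-1).mean()
    return pg_loss + beta * kl, kl

def maybe_reset_reference(policy, ref_policy, kl_value, threshold=0.5):
    """Hard reset: copy the current policy into the reference when KL grows."""
    if kl_value > threshold:
        ref_policy.load_state_dict(policy.state_dict())
```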