Posts
All the articles I've posted.
-   Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
    Token Recycling proposes a training-free speculative decoding method that recycles candidate tokens and builds a draft tree from an adjacency matrix, accelerating large language model inference by roughly 2x, an improvement of more than 30% over other training-free methods. (A simplified sketch of the adjacency-matrix draft tree follows this list.)
-   Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation
    This paper proposes Mixup Model Merge (M³), which performs randomized linear interpolation in parameter space with contribution ratios sampled from a Beta distribution, markedly improving the performance, out-of-distribution robustness, and adversarial robustness of merged large language models. (A minimal interpolation sketch follows this list.)
-   LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
    This paper proposes the LoRE-Merging framework, which uses low-rank estimation to construct an approximate base model and task vectors, enabling model merging without access to the original base model and outperforming conventional methods on several benchmark datasets. (A rough structural sketch follows this list.)
-   Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
    This paper introduces Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long and short CoT datasets to fine-tune non-reasoning LLMs, achieving a 2.3% average accuracy improvement and a 47.61% reduction in response length on reasoning benchmarks.
-   ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
    This paper proposes ProRL, which applies prolonged reinforcement learning with a KL-divergence penalty and reference-policy resets to train Nemotron-Research-Reasoning-Qwen-1.5B on diverse tasks, substantially expanding the reasoning boundaries of large language models, with especially strong gains in domains where the base model is weak and on out-of-distribution tasks. (A schematic of the KL penalty and reset follows this list.)
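The draft-tree construction in Token Recycling can be pictured with a short sketch. This is a simplified illustration rather than the paper's implementation: the vocabulary size, top-k width, tree depth, and all function names here are assumptions, and the actual method expands a fixed, pruned tree template rather than the full breadth-first expansion shown.

```python
import numpy as np

VOCAB, K, DEPTH = 32000, 4, 3  # hypothetical sizes, not the paper's settings

# Adjacency matrix: row t holds the top-K candidate successors of token t,
# "recycled" from the model's own top-K outputs at earlier decoding steps.
adj = np.zeros((VOCAB, K), dtype=np.int64)

def recycle_candidates(token_ids, topk_candidates):
    """Store the top-K candidates the model produced for each decoded token."""
    for t, cands in zip(token_ids, topk_candidates):
        adj[t] = cands

def build_draft_tree(root_token):
    """Expand a draft tree from the last accepted token using the matrix."""
    tree, frontier = [root_token], [root_token]
    for _ in range(DEPTH):
        frontier = [c for t in frontier for c in adj[t].tolist()]
        tree.extend(frontier)
    return tree  # draft tokens to verify in a single forward pass
```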
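For Mixup Model Merge, the mechanism reduces to a single Beta-sampled coefficient applied across all parameters. A minimal PyTorch sketch, assuming two state dicts with identical keys; the function name and `alpha` value are illustrative, not the paper's API:

```python
import torch

def mixup_merge(state_a, state_b, alpha=2.0):
    """Merge two models by randomized linear interpolation (M3-style)."""
    # One contribution ratio per merge, sampled from Beta(alpha, alpha).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    return {k: lam * state_a[k] + (1.0 - lam) * state_b[k] for k in state_a}
```

Sampling one ratio per merge, rather than per tensor, keeps the result on the line segment between the two parent models in parameter space, which is what makes the interpolation a coherent merge.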
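LoRE-Merging estimates an approximate base model together with low-rank task vectors directly from the fine-tuned checkpoints. The paper formulates this as an optimization problem; the sketch below substitutes a crude one-shot approximation (coordinate-wise mean plus truncated SVD) purely to make the structure concrete, so the rank and every name here are assumptions:

```python
import torch

def low_rank(delta, rank):
    """Truncated SVD: keep the top-`rank` singular directions of a matrix."""
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]

def lore_style_merge(weights, rank=8):
    """weights: fine-tuned weight matrices for one layer, all the same shape."""
    base = torch.stack(weights).mean(dim=0)  # stand-in for the base estimate
    deltas = [low_rank(w - base, rank) for w in weights]  # low-rank task vectors
    return base + torch.stack(deltas).mean(dim=0)
```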
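Finally, two of ProRL's stabilizers, the KL penalty against a reference policy and the periodic reference reset, can be sketched schematically. The coefficient, threshold, and function names are assumptions, and real implementations estimate the KL over sampled trajectories rather than the full per-token distributions used here:

```python
import torch
import torch.nn.functional as F

def kl_penalized_loss(pg_loss, logits, ref_logits, beta=0.01):
    """Add a KL(pi || pi_ref) penalty to a policy-gradient loss."""
    logp = F.log_softmax(logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)
    kl = (logp.exp() * (logp - ref_logp)).sum(dim=-1).mean()
    return pg_loss + beta * kl, kl

def maybe_reset_reference(policy, ref_policy, kl_value, threshold=0.5):
    """Hard reset: copy the current policy into the reference when KL grows."""
    if kl_value > threshold:
        ref_policy.load_state_dict(policy.state_dict())
```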