Tag: Large Language Model

All the articles with the tag "Large Language Model".

Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning

Published: 28 May, 2025 at 11:23 AM

89.33 🤔

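This paper systematically studies how teacher model choice, granularity, and format in CoT distillation affect the reasoning ability of small language models (SLMs). It finds that stronger models benefit from high-granularity CoT while weaker models prefer moderate granularity, that format has limited impact, and that teacher model capability is not the only factor determining student performance.

UFT: Unifying Supervised and Reinforcement Fine-Tuning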

Published: 25 May, 2025 at 11:47 AM

89.30 🤔

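This paper proposes the Unified Fine-Tuning (UFT) framework, which integrates supervised fine-tuning and reinforcement fine-tuning through hint-guided exploration and a hybrid objective function. It performs well across model scales and reasoning tasks, and the paper proves theoretically an exponential improvement in sample complexity.

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More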

Published: 23 May, 2025 at 11:13 AM

89.28 🤔

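This paper proposes the MEAP training paradigm, which introduces a random masking strategy into next-token prediction. It significantly improves large language models on key information retrieval and long-context reasoning tasks while preserving computational efficiency and architectural compatibility.

Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning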

Published: 26 May, 2025 at 11:24 AM

89.27 🤔

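Through experiments and theoretical analysis, this paper shows that RLVR improves the accuracy of large language models but not their capability because it is biased toward optimizing easy problems, whereas distillation improves capability only when it introduces new knowledge and otherwise behaves like RLVR.

LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning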

Published: 4 Jun, 2025 at 12:00 PM

89.25 🤔

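This paper proposes LIFT, a low-rank-guided sparse fine-tuning method that selects the principal weights after low-rank approximation and fine-tunes only those. On reasoning tasks it significantly outperforms full-parameter fine-tuning and methods such as LoRA while remaining memory-efficient.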
