Tag: Knowledge Distillation
All the articles with the tag "Knowledge Distillation".
-
Scalable Model Merging with Progressive Layer-wise Distillation
This paper proposes ProDistill, an algorithm that efficiently merges large pre-trained models through progressive layer-wise teacher-student distillation. It theoretically establishes the necessity of domain-specific data and achieves significant performance gains (6.14%-6.61%) on vision and language tasks, with superior memory and compute efficiency.
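A minimal sketch of the layer-wise idea, under my own assumptions rather than the ProDistill release: each merged layer is fitted so that it reproduces every domain teacher's outputs on a small amount of that domain's data; in the progressive variant, inputs for layer k+1 are recomputed through the layers already merged.

```python
# Illustrative only: hypothetical helper, not the authors' code.
import torch
import torch.nn.functional as F

def merge_layer(merged_layer, teacher_layers, domain_inputs, steps=200, lr=1e-4):
    """Fit one merged layer so it reproduces each domain teacher's layer outputs.

    teacher_layers[i] and domain_inputs[i] (hidden states) belong to domain i.
    """
    opt = torch.optim.Adam(merged_layer.parameters(), lr=lr)
    for _ in range(steps):
        loss = sum(
            F.mse_loss(merged_layer(x), teacher(x).detach())
            for teacher, x in zip(teacher_layers, domain_inputs)
        )
        opt.zero_grad()
        loss.backward()
        opt.step()
    return merged_layer
```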
-
LongReD: Mitigating Short-Text Degradation of Long-Context Large Language Models via Restoration Distillation
This paper proposes LongReD, a multi-objective training strategy that combines long-text training, short-text distillation, and short-to-long distillation. It effectively mitigates the short-text performance degradation of long-context large language models while preserving or improving their long-text capabilities.
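A rough sketch of how such a multi-objective step could be combined, assuming HuggingFace-style models that expose `.loss` and `.logits`; this is my illustration of a long-text LM term plus a short-text restoration (distillation) term, not the LongReD implementation.

```python
# Hypothetical sketch; function and argument names are assumptions.
import torch
import torch.nn.functional as F

def longred_style_loss(student, original, long_batch, short_batch, alpha=0.5, tau=1.0):
    # Long-text objective: ordinary next-token loss on long sequences.
    lm_loss = student(**long_batch).loss
    # Short-text restoration: match the original short-context model's
    # token distributions on short texts (a standard KD term).
    s_logits = student(**short_batch).logits
    with torch.no_grad():
        t_logits = original(**short_batch).logits
    kd_loss = F.kl_div(
        F.log_softmax(s_logits / tau, dim=-1),
        F.softmax(t_logits / tau, dim=-1),
        reduction="batchmean",
    ) * (tau * tau)
    return lm_loss + alpha * kd_loss
```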
-
The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation
This paper introduces the DC-CoT benchmark, which systematically evaluates how data augmentation, selection, and mixing strategies affect chain-of-thought (CoT) distillation. It shows that data augmentation (e.g., reverse thinking) markedly improves small student models' reasoning ability and offers practical guidance for developing efficient reasoning models.
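A toy sketch of the kind of data-centric knobs such a benchmark varies: augmenting a CoT pool and mixing sources at a fixed ratio. This is purely hypothetical and not part of DC-CoT itself.

```python
# Hypothetical illustration of source mixing for a CoT training set.
import random

def mix_cot_data(original, augmented, mix_ratio=0.5, size=1000, seed=0):
    """Sample a training set in which `mix_ratio` of examples are augmented."""
    rng = random.Random(seed)
    n_aug = int(size * mix_ratio)
    return rng.sample(augmented, n_aug) + rng.sample(original, size - n_aug)
```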
-
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
This paper proposes Low-Rank Clone (LRC), which achieves efficient knowledge distillation from large to small language models via low-rank projection matrices and activation cloning. Trained on only 10-20B tokens, the student matches or surpasses models trained on trillions of tokens, a substantial improvement in training efficiency.
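An illustrative sketch of the two ingredients named in the abstract, under my own assumptions rather than the LRC code: a learned low-rank map between teacher and student dimensions, used both to compress a teacher weight matrix and to define an activation-cloning loss.

```python
# Hypothetical module; shapes and names are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankCloneSketch(nn.Module):
    def __init__(self, d_teacher, d_student, rank=64):
        super().__init__()
        # Low-rank factors A @ B map teacher dimensions to student dimensions.
        self.A = nn.Parameter(torch.randn(d_student, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(rank, d_teacher) * 0.02)

    def project_weight(self, w_teacher):
        # Compress a (d_teacher, d_teacher) teacher weight toward student shape.
        return self.A @ self.B @ w_teacher @ self.B.t() @ self.A.t()

    def activation_clone_loss(self, h_student, h_teacher):
        # Clone teacher activations after projecting them into student space.
        return F.mse_loss(h_student, h_teacher @ self.B.t() @ self.A.t())
```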
-
Towards Complementary Knowledge Distillation for Efficient Dense Image Prediction
This paper introduces Boundary and Context Distillation (BCD) for efficient dense image prediction. Through targeted knowledge transfer, BCD improves compact models' boundary completeness and region connectivity, achieving higher accuracy across multiple tasks and datasets without increasing inference cost.
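A hedged sketch of boundary- and context-style distillation terms for dense prediction; this is my illustration under assumptions (boundary pixels from label edges, context as pooled pairwise similarity), not the paper's exact BCD loss.

```python
# Hypothetical loss terms; not the authors' implementation.
import torch
import torch.nn.functional as F

def boundary_mask(labels):
    # Mark pixels whose left or top neighbour has a different class label.
    pad = F.pad(labels.float().unsqueeze(1), (1, 1, 1, 1), mode="replicate")
    diff = (pad[:, :, 1:-1, 1:-1] != pad[:, :, 1:-1, :-2]) | \
           (pad[:, :, 1:-1, 1:-1] != pad[:, :, :-2, 1:-1])
    return diff.float()                                   # (B, 1, H, W)

def bcd_style_loss(student_logits, teacher_logits, labels, tau=2.0):
    # Boundary term: distill teacher predictions only at boundary pixels.
    mask = boundary_mask(labels)
    kd = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  F.softmax(teacher_logits / tau, dim=1),
                  reduction="none").sum(1, keepdim=True)
    boundary_term = (kd * mask).sum() / mask.sum().clamp(min=1)
    # Context term: match pairwise similarity of pooled responses,
    # a proxy for region connectivity.
    s = F.normalize(F.adaptive_avg_pool2d(student_logits, 8).flatten(2), dim=1)
    t = F.normalize(F.adaptive_avg_pool2d(teacher_logits, 8).flatten(2), dim=1)
    context_term = F.mse_loss(s.transpose(1, 2) @ s, t.transpose(1, 2) @ t)
    return boundary_term + context_term
```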