Tag: Fine-tuning
All articles tagged "Fine-tuning".
-
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
This paper introduces Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long and short CoT datasets to fine-tune non-reasoning LLMs, achieving a 2.3% average accuracy improvement and 47.61% response length reduction on reasoning benchmarks.
-
EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning
This paper proposes the EMORL framework, which uses ensemble learning to train single-objective models separately and aggregate them at the hidden-state level, with hierarchical grid search to optimize the aggregation weights; on a counselor reflection generation task it matches the performance of conventional methods while significantly improving training efficiency, scalability, and interpretability.
-
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
RADLADS introduces a cost-effective three-step distillation protocol to convert softmax attention transformers into linear attention models using only 350-700M tokens, achieving near-teacher performance on benchmarks and setting a new state-of-the-art for pure RNNs with models up to 72B parameters.
-
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
This paper proposes Recall with Reasoning (RwR), a method that enhances Mamba's long-context memory and extrapolation by distilling chain-of-thought summarization from a teacher model, achieving significant performance improvements on the LongMemEval and HELMET benchmarks while preserving short-context capabilities.
-
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
LENSLLM introduces a Hessian-based PAC-Bayes framework and NTK-based scaling model for LLM selection, achieving up to 91.1% accuracy and 88.5% computational cost reduction by modeling fine-tuning dynamics across diverse tasks.