Tag: Fine-tuning

All the articles with the tag "Fine-tuning".

Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

Published: 17 May, 2025 at 11:02 AM

91.74 🤔

This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant accuracy and token efficiency gains on math reasoning benchmarks.

Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation

Published: 3 Jun, 2025 at 11:27 AM

91.67 🤔

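This paper proposes Mixup Model Merge (M³), which performs randomized linear interpolation in parameter space with contribution ratios sampled from a Beta distribution, significantly improving the performance, out-of-distribution robustness, and adversarial robustness of merged large language models.

LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging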

Published: 28 May, 2025 at 11:22 AM

91.54 🤔

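This paper proposes the LoRE-Merging framework, which uses low-rank estimation to construct an approximate base model and task vectors, enabling model merging without access to the original base model and outperforming traditional methods on multiple benchmark datasets.

Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models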

Published: 8 May, 2025 at 06:12 PM

91.54 🤔

This paper introduces Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long and short CoT datasets to fine-tune non-reasoning LLMs, achieving a 2.3% average accuracy improvement and 47.61% response length reduction on reasoning benchmarks.

Activation-Guided Consensus Merging for Large Language Models

Published: 22 May, 2025 at 11:19 AM

90.71 🤔

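This paper proposes Activation-Guided Consensus Merging (ACM), which adjusts layer-wise weight coefficients based on the mutual information (MI) of activations to efficiently merge large language models for Long-to-Short reasoning tasks, significantly reducing output redundancy and improving reasoning accuracy, with especially pronounced gains on smaller models.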
