Tag: Fine-tuning
All the articles with the tag "Fine-tuning".
-
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant accuracy and token efficiency gains on math reasoning benchmarks.
-
LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
This paper proposes the LoRE-Merging framework, which constructs an approximate base model and task vectors via low-rank estimation, enabling model merging without access to the original base models, and demonstrates performance superior to conventional methods on multiple benchmark datasets.
-
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models
This paper introduces Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long and short CoT datasets to fine-tune non-reasoning LLMs, achieving a 2.3% average accuracy improvement and 47.61% response length reduction on reasoning benchmarks.
-
Incentivizing Strong Reasoning from Weak Supervision
This paper proposes the Weak-to-Strong Reasoning (W2SR) paradigm, which fine-tunes a strong student model on structured chain-of-thought trajectories generated by significantly weaker teacher models, substantially improving its reasoning ability at low cost and approaching or even surpassing the results of expensive reinforcement learning.
-
Self-Interpretability: LLMs Can Describe Complex Internal Processes that Drive Their Decisions, and Improve with Training
By fine-tuning GPT-4o and GPT-4o-mini, this paper shows that large language models can quantitatively report the internal decision processes that drive their choices (e.g., attribute weights), that introspection training significantly improves the accuracy of these reports, and that the ability generalizes to native preferences, offering a new path toward AI interpretability and safety.