Tag: Supervised Learning

All the articles with the tag "Supervised Learning".

UFT: Unifying Supervised and Reinforcement Fine-Tuning

Published: 25 May, 2025 at 11:47 AM

89.30 🤔

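This paper proposes the Unified Fine-Tuning (UFT) framework, which integrates supervised and reinforcement fine-tuning through hint-guided exploration and a hybrid objective function. It performs well across model sizes and reasoning tasks, and the authors prove theoretically an exponential improvement in sample complexity.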
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning

Published: 26 May, 2025 at 11:24 AM

89.27 🤔

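Through experiments and theoretical analysis, this paper shows that RLVR improves the accuracy of large language models but not their capability because it is biased toward optimizing easier problems, whereas distillation improves capability only when it introduces new knowledge and otherwise behaves much like RLVR.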
LIFT the Veil for the Truth: Principal Weights Emerge after Rank Reduction for Reasoning-Focused Supervised Fine-Tuning

Published: 4 Jun, 2025 at 12:00 PM

89.25 🤔

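This paper proposes LIFT, a low-rank-guided sparse fine-tuning method that selects principal weights for fine-tuning after low-rank approximation. On reasoning tasks it significantly outperforms full-parameter fine-tuning, LoRA, and similar methods while remaining memory-efficient.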
Always Skip Attention

Published: 8 May, 2025 at 11:06 AM

89.20 🤔

This paper theoretically demonstrates the ill-conditioning of Self-Attention Blocks in Vision Transformers without skip connections, highlights their role as regularizers, and proposes Token Graying (SVD and DCT) to improve input token conditioning, achieving modest performance gains in supervised and self-supervised tasks.
SLearnLLM: A Self-Learning Framework for Efficient Domain-Specific Adaptation of Large Language Models

Published: 28 May, 2025 at 11:24 AM

89.09 🤔

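SLearnLLM proposes a self-learning framework in which a large language model evaluates its own answers and is fine-tuned only on the QA pairs it answered incorrectly. In the agriculture and medical domains it achieves performance gains comparable to fine-tuning on the full dataset while significantly reducing training time.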
