Tag: Supervised Learning

All the articles with the tag "Supervised Learning".

Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model

Published: 3 Jun, 2025 at 11:42 AM

85.21 🤔

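This paper uses the ComPABench benchmark to evaluate the compositional reasoning ability of vision-language models (VLMs), finds that reinforcement learning (RL) outperforms supervised fine-tuning (SFT) in cross-task and out-of-distribution generalization, and proposes RL-Ground, a method that significantly improves multimodal compositional reasoning performance.
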
Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods

Published: 4 Jun, 2025 at 11:59 AM

85.17 🤔

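Through theoretical and empirical analysis, this paper shows that model ensembling effectively mitigates overadaptation in supervised fine-tuning by balancing the bias-variance trade-off, improving downstream task performance and reducing forgetting of pre-trained knowledge.
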
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

Published: 1 Jun, 2025 at 11:45 AM

85.12 🤔

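This paper proposes the Residual Alignment Model (RAM), which uses importance sampling to detach the alignment module, enabling efficient sequence-level training and token-level decoding. It significantly improves performance across multiple alignment tasks while lowering resource costs.
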
Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data

Published: 19 May, 2025 at 11:18 AM

81.58 🤔

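This paper proposes the Discriminative Fine-Tuning (DFT) framework, which optimizes a large language model's output probabilities through a discriminative probabilistic model without requiring human preference data or reward models. On mathematical reasoning and general language tasks it significantly outperforms SFT and is comparable to SFT→PO methods.
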
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models

Published: 4 May, 2025 at 04:26 PM

76.52 🤔

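This paper presents the first systematic survey of progress in efficient reasoning for large language models. It groups approaches into model-based, output-based, and prompt-based methods, and discusses strategies for reducing the "overthinking" phenomenon in order to improve computational efficiency while preserving reasoning ability.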
