Tag: Supervised Learning

All the articles with the tag "Supervised Learning".

Thinking Out Loud: Do Reasoning Models Know When They're Right?

Published: 25 May, 2025 at 11:51 AM

90.51 🤔

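By comparing large reasoning models trained with instruction tuning, supervised fine-tuning, and reinforcement learning, this paper finds that reasoning-oriented training significantly improves accuracy and calibration on reasoning tasks, but may weaken smaller models' awareness of their knowledge boundaries on factual tasks.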
Cyber Security Data Science: Machine Learning Methods and their Performance on Imbalanced Datasets

Published: 15 May, 2025 at 11:06 AM

90.25 🤔

This paper systematically evaluates machine learning classifiers and imbalance learning techniques on two cybersecurity datasets. It finds that XGBoost (XGB) and Random Forest (RF) perform robustly, while the effects of sampling and ensembling vary, which underscores the need for dataset-specific method selection.
Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective

Published: 28 May, 2025 at 11:20 AM

87.20 🤔

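This paper proposes the RaML framework, which takes a meta-learning perspective and treats LLM reasoning trajectories as pseudo-gradient updates. It validates the connection between reasoning and optimization through theoretical analysis and experiments, and explores how training strategies and trajectory characteristics could improve reasoning ability.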
UFT: Unifying Supervised and Reinforcement Fine-Tuning

Published: 25 May, 2025 at 11:47 AM

89.30 🤔

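This paper proposes the Unified Fine-Tuning (UFT) framework, which integrates supervised fine-tuning and reinforcement fine-tuning through hint-guided exploration and a hybrid objective function. UFT performs well across model sizes and reasoning tasks, and the paper proves theoretically an exponential improvement in sample complexity.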
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning

Published: 26 May, 2025 at 11:24 AM

89.27 🤔

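Through experiments and theoretical analysis, this paper shows that RLVR improves the accuracy of large language models but not their capability because it is biased toward optimizing easier problems, whereas distillation improves capability only when it introduces new knowledge and otherwise behaves similarly to RLVR.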
