Tag: Supervised Learning

All the articles with the tag "Supervised Learning".

Why Distillation can Outperform Zero-RL: The Role of Flexible Reasoning

Published: 30 May, 2025 at 11:13 AM

95.59 🤔

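This paper shows that supervised fine-tuning of the Qwen2.5-32B base model on only 920 distilled samples significantly outperforms resource-intensive Zero-RL methods, and it reveals how distilled models achieve more flexible reasoning through anthropomorphic language and advanced cognitive behaviors.

ASURA-FDPS-ML: Star-by-star Galaxy Simulations Accelerated by Surrogate Modeling for Supernova Feedback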

Published: 15 May, 2025 at 11:07 AM

95.02 🤔

This paper introduces ASURA-FDPS-ML, a framework that accelerates high-resolution galaxy simulations by using a machine learning surrogate model for supernova feedback in dense regions, achieving a fourfold speedup while maintaining comparable morphological and outflow characteristics to direct simulations, despite some discrepancies in momentum at higher altitudes.

Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings

Published: 22 May, 2025 at 11:17 AM

93.37 🤔

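This paper proposes a two-stage training framework that first warms up on the domain-agnostic Knights & Knaves logic game to activate general reasoning ability, then applies RLVR training on a small amount of target-domain data, significantly improving the reasoning performance and cross-domain generalization of large language models in resource-constrained settings.

Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning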

Published: 28 May, 2025 at 11:25 AM

92.52 🤔

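Through theoretical analysis and the Re-distillation technique, this paper reveals the efficiency bottleneck of small-scale SFT in R1-style RL and approaches RL performance on the K&K and MATH datasets with very few samples (<1K), significantly improving data efficiency.

Temporal Sampling for Forgotten Reasoning in LLMs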

Published: 28 May, 2025 at 11:20 AM

92.01 🤔

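This paper reveals the 'Temporal Forgetting' phenomenon in the fine-tuning of large language models and proposes 'Temporal Sampling', which samples answers from multiple training checkpoints to significantly improve reasoning performance (Pass@k gains of 4 to 19 percentage points) and uses LoRA adaptation to reduce storage cost.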
