Tag: Supervised Learning
All the articles with the tag "Supervised Learning".
-
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
This paper introduces Temperature Scaling (TS) and Trace Length Control for Dynamic Reasoning (TLDR) to enhance token efficiency in small language models, achieving up to 50% reduction in response length with minimal accuracy loss across multiple reasoning benchmarks.
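For orientation, temperature scaling in its generic form simply divides the pre-softmax logits by a scalar temperature before sampling. The sketch below illustrates that generic operation only; it is not the paper's TS intervention, and the function name and values are hypothetical.

```python
import numpy as np

def temperature_scaled_probs(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Generic temperature scaling: softmax over logits / T.

    T < 1 sharpens the next-token distribution, T > 1 flattens it.
    Illustrative sketch only; not the paper's TS intervention.
    """
    scaled = logits / temperature
    scaled -= scaled.max()              # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# A lower temperature concentrates probability mass on the top token.
logits = np.array([2.0, 1.0, 0.5])
print(temperature_scaled_probs(logits, temperature=0.5))
print(temperature_scaled_probs(logits, temperature=1.5))
```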
-
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
This paper evaluates the instruction-following ability of large reasoning models on mathematical tasks using the MathIF benchmark, revealing a trade-off in which stronger reasoning comes at the cost of weaker instruction following, and experimentally showing how training strategy and reasoning-chain length affect this trade-off.
-
Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs
This paper proposes SASR, a dynamically adaptive hybrid training framework that combines SFT and RL through an adjustment mechanism driven by gradient norms and KL divergence, significantly improving LLM performance on mathematical and logical reasoning tasks and outperforming plain SFT, plain RL, and static hybrid schedules.
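To make the idea of a gradient-norm/KL-driven schedule concrete, here is a minimal sketch of one plausible mixing rule. The rule, thresholds, and names are illustrative assumptions, not SASR's published criterion.

```python
def adaptive_mix_weight(sft_grad_norm: float, kl_to_sft: float,
                        grad_ref: float = 1.0, kl_ref: float = 0.05) -> float:
    """Hypothetical mixing weight alpha for a combined loss
        L = alpha * L_SFT + (1 - alpha) * L_RL.

    Assumed rule (not SASR's actual criterion): lean on SFT while its
    gradient norm is large relative to a reference, and pull back toward
    SFT if the policy's KL divergence from the SFT distribution grows.
    """
    grad_signal = min(sft_grad_norm / grad_ref, 1.0)  # strong SFT gradients -> more SFT
    drift_signal = min(kl_to_sft / kl_ref, 1.0)       # large drift -> pull back to SFT
    return max(grad_signal, drift_signal)

# Weak SFT gradients and little drift hand most of the weight to the RL term.
alpha = adaptive_mix_weight(sft_grad_norm=0.2, kl_to_sft=0.01)
print(f"L = {alpha:.2f} * L_SFT + {1 - alpha:.2f} * L_RL")
```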
-
Sparse-Group Boosting with Balanced Selection Frequencies: A Simulation-Based Approach and R Implementation
This paper introduces sparse-group boosting and a simulation-based group balancing algorithm within the 'sgboost' R package to mitigate variable selection bias in high-dimensional grouped data, demonstrating improved fairness and interpretability through simulations and ecological data analysis.
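As background on the underlying mechanism, a componentwise boosting step compares single-feature and whole-group base learners against the current residuals and takes a small step along the best fit. The Python sketch below illustrates this generic selection step; it is not the 'sgboost' R implementation, and it omits the simulation-based balancing that the paper uses to equalize selection frequencies across group sizes.

```python
import numpy as np

def componentwise_boosting_step(X, residuals, groups, nu=0.1):
    """One generic sparse-group componentwise boosting step (illustrative).

    Candidates are (a) each single feature and (b) each whole group, fit to
    the current residuals by least squares; the best-fitting candidate is
    added with step length nu. Without a balancing correction, larger
    groups tend to win more often, which is the bias the paper targets.
    """
    best = None
    for j in range(X.shape[1]):                      # single-feature base learners
        xj = X[:, [j]]
        beta, *_ = np.linalg.lstsq(xj, residuals, rcond=None)
        rss = np.sum((residuals - xj @ beta) ** 2)
        if best is None or rss < best[0]:
            best = (rss, [j], beta)
    for g in groups:                                 # whole-group base learners
        Xg = X[:, g]
        beta, *_ = np.linalg.lstsq(Xg, residuals, rcond=None)
        rss = np.sum((residuals - Xg @ beta) ** 2)
        if rss < best[0]:
            best = (rss, g, beta)
    _, idx, beta = best
    update = np.zeros(X.shape[1])
    update[idx] = nu * beta.ravel()                  # shrunken update to coefficients
    return update

# Toy usage: 6 features in two groups of 3; the signal sits on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 1.5 * X[:, 0] + rng.normal(scale=0.1, size=50)
coef = np.zeros(6)
for _ in range(20):
    coef += componentwise_boosting_step(X, y - X @ coef, groups=[[0, 1, 2], [3, 4, 5]])
print(np.round(coef, 2))
```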
-
Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?
This paper trains LLMs of various scales with RL and SFT, finding that RL elicits explicit ToM reasoning in larger models but causes reasoning collapse in smaller ones, while SFT unexpectedly achieves high performance, suggesting that current ToM benchmarks can be solved without explicit human-like reasoning.