Tag: Supervised Learning

All the articles with the tag "Supervised Learning".

Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL

Published: 6 May, 2025 at 11:18 PM

87.33 🤔

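This paper combines supervised fine-tuning (SFT), reinforcement learning (RL), and fine-grained reward functions (such as QATCH) to significantly improve the reasoning ability and performance of small LLMs on Text2SQL tasks; the Think2SQL-7B model outperforms models with 400B+ parameters on the BIRD dataset.

Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective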

Published: 28 May, 2025 at 11:20 AM

87.20 🤔

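This paper proposes the RaML framework, which takes a meta-learning perspective and treats LLM reasoning trajectories as pseudo-gradient updates. It validates the connection between reasoning and optimization through theoretical analysis and experiments, and explores how training strategies and trajectory characteristics could improve reasoning ability.

Boltzmann Classifier: A Thermodynamic-Inspired Approach to Supervised Learning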

Published: 14 May, 2025 at 11:08 AM

86.86 🤔

The Boltzmann Classifier introduces a thermodynamically inspired supervised learning approach that uses an energy-based model derived from the Boltzmann distribution to estimate class probabilities, achieving competitive accuracy on benchmark datasets while offering interpretability and computational efficiency.

Sparsity May Be All You Need: Sparse Random Parameter Adaptation

Published: 25 May, 2025 at 11:51 AM

85.87 🤔

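This paper proposes SpaRTA, a method that achieves parameter efficiency by fine-tuning a small, randomly selected subset of a pretrained model's parameters, and shows performance comparable to LoRA on natural language understanding tasks along with significant memory savings.

100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models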

Published: 7 May, 2025 at 08:42 AM

85.65 🤔

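This paper surveys replication studies of reasoning language models in the 100 days following the release of DeepSeek-R1, systematically summarizes advances in data construction and algorithm design for supervised fine-tuning and reinforcement learning from verifiable rewards, and discusses applications of improved reasoning ability in multiple directions.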
