Tag: Reinforcement Learning
All the articles with the tag "Reinforcement Learning".
-
Llama-Nemotron: Efficient Reasoning Models
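NVIDIA has released the Llama-Nemotron family of open models. By combining neural architecture search, knowledge distillation, continued pretraining, multi-stage supervised fine-tuning on high-quality synthetic data, and large-scale reinforcement learning, it builds a heterogeneous model family that reaches leading levels of both reasoning capability and efficiency and supports dynamic switching between reasoning modes.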
-
SEM: Reinforcement Learning for Search-Efficient Large Language Models
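This paper proposes the *SEM* framework, which uses reinforcement learning to optimize the search behavior of large language models, reducing redundant searches while improving answer accuracy and significantly improving reasoning efficiency.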
-
Large Language Models Think Too Fast To Explore Effectively
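This paper uses the game Little Alchemy 2 to evaluate the exploration abilities of large language models (LLMs). It finds that most LLMs underperform humans because of premature decisions and over-reliance on uncertainty-driven strategies, whereas o1 and DeepSeek-R1 significantly outperform humans by balancing empowerment with deeper reasoning, revealing the importance of reasoning depth and architecture design for open-ended exploration.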
-
Better Estimation of the KL Divergence Between Language Models
This paper introduces a Rao-Blackwellized Monte Carlo estimator for KL divergence between language models, achieving unbiased estimates with provably lower variance than standard Monte Carlo methods, and demonstrates improved stability and performance in RLHF fine-tuning for sentiment-controlled generation.
-
Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards
Reward-SQL introduces a Text-to-SQL framework that decomposes queries into Chain-of-CTEs and applies Process Reward Models (PRMs) with GRPO and Best-of-N sampling, achieving a state-of-the-art 68.9% execution accuracy on the BIRD dataset with a 7B model.