Tag: Reinforcement Learning

All the articles with the tag "Reinforcement Learning".

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Published: 8 May, 2025 at 10:22 AM

89.20 🤔

The Video Prediction Policy (VPP) introduces a novel generalist robot policy that leverages predictive visual representations from fine-tuned video diffusion models to learn implicit inverse dynamics, achieving significant improvements of 41.5% on the Calvin ABC→D benchmark and 31.6% in real-world dexterous manipulation tasks over state-of-the-art baselines.

RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs

Published: 22 May, 2025 at 11:16 AM

89.06 🤔

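Through theoretical and experimental analysis, this paper shows that the MDP structural assumptions behind current RL methods (such as GRPO) in LLM post-training cause them to degenerate into filtered iterative supervised fine-tuning, and argues that the increase in response length stems from a bias in reward assignment rather than from improved reasoning ability.

REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback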

Published: 20 May, 2025 at 11:10 AM

89.02 🤔

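This paper proposes the REFINE-AF framework, which uses small open-source language models and reinforcement learning from automated feedback to generate task-agnostic instruction datasets. Compared with baselines, it significantly improves performance on 63-66% of tasks in the SUPER-NI dataset while reducing cost and human intervention.

Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models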

Published: 1 Jun, 2025 at 11:43 AM

88.88 🤔

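This paper proposes the Dynamic Thinking pattern Optimization framework (DTO), which segments and optimizes the reasoning paths of large reasoning models, substantially reducing computational overhead while improving accuracy: up to a 12% accuracy gain and a 47% reduction in FLOPs on mathematical reasoning benchmarks.

R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning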

Published: 30 May, 2025 at 11:15 AM

88.65 🤔

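R1-Searcher++ uses a two-stage training strategy (SFT and RL), combined with a reward mechanism and a memory module, to let large language models adaptively balance internal knowledge against external retrieval, significantly improving accuracy and retrieval efficiency on multi-hop question answering tasks.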
