Tag: Reinforcement Learning
All the articles with the tag "Reinforcement Learning".
-
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
This paper introduces VLM Q-Learning, an offline-to-online reinforcement learning method that fine-tunes Vision-Language Models for interactive decision-making by filtering suboptimal actions with a critic head, achieving significant performance improvements over supervised fine-tuning across multiple multimodal agent tasks.
-
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
This paper introduces SIMPLEMIX, a simple method for mixing on- and off-policy data in language model preference optimization. It shows that the two data sources have complementary strengths (on-policy data helps on reasoning tasks, off-policy data on open-ended tasks) and that combining them yields a 6.03% average improvement over single-source methods on AlpacaEval 2.0.
-
SmallPlan: Leverage Small Language Models for Sequential Path Planning with Simulation-Powered, LLM-Guided Distillation
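This paper proposes the SmallPlan framework, which combines LLM-guided distillation with supervised fine-tuning (SFT) and RL driven by feedback from a simulation environment to train lightweight small language models (SLMs) for efficient high-level robot path planning. The resulting SLMs approach the performance of large language models (LLMs) while running on resource-constrained edge devices.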
-
EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
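This paper proposes EPO, which uses reinforcement learning to optimize a dedicated strategic reasoning model. This model assists arbitrary LLM agents in staying aligned with long-term goals in dynamic environments, improving their strategic reasoning.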
-
Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning
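This paper proposes Reason2Attack, which strengthens an LLM's reasoning through Frame Semantics-based synthesis of chain-of-thought (CoT) examples and reinforcement learning with an attack-process reward. The enhanced LLM then efficiently generates adversarial prompts that jailbreak text-to-image (T2I) models.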