Tag: Reinforcement Learning

All the articles with the tag "Reinforcement Learning".

Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking

Published: 6 May, 2025 at 11:15 PM

83.31 🤔

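This paper proposes InteRank, which uses knowledge distillation and reinforcement learning to train a 3B-parameter small language model that generates explanations for reasoning-intensive document re-ranking and achieves performance comparable to 70B+ parameter models, ranking third on the BRIGHT benchmark.

R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning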

Published: 7 May, 2025 at 08:43 AM

82.56 🤔

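This paper proposes R1-Reward, which applies reinforcement learning to multimodal reward model training through the StableReinforce algorithm, significantly improving performance and surpassing existing state-of-the-art models on multiple benchmarks while demonstrating strong data efficiency and test-time scalability.

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation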

Published: 4 May, 2025 at 04:29 PM

79.93 🤔

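This paper proposes the StreamRL framework, which optimizes RL training through a disaggregated stream generation architecture, resolving pipeline bubbles and skewness bubbles and improving the throughput and cost efficiency of RL training for LLMs.

Toward Efficient Exploration by Large Language Model Agents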

Published: 4 May, 2025 at 04:31 PM

79.45 🤔

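By using LLMs to explicitly implement a posterior sampling RL algorithm, this paper significantly improves the exploration efficiency of LLM agents in natural language environments while retaining the statistical performance advantages of classical algorithms.

Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models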

Published: 7 May, 2025 at 08:42 AM

78.41 🤔

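This paper systematically surveys progress in reinforcement-learning-based reasoning methods for multimodal large language models (MLLMs), analyzing algorithm design, reward mechanisms, and applications, identifying challenges such as cross-modal reasoning and reward sparsity, and proposing future directions such as hierarchical rewards and interactive RL.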
