Tag: Reinforcement Learning

All the articles with the tag "Reinforcement Learning".

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

Published: 18 May, 2025 at 11:22 AM

86.89 🤔

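This paper applies the GRPO algorithm to the Qwen2-Audio-7B-Instruct model and reaches the best accuracy, 64.5%, on audio question answering. This shows that reinforcement learning outperforms supervised fine-tuning on small-scale datasets. However, an explicit reasoning process did not significantly improve performance, and a gap to human-level performance remains.
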
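As a rough illustration of the core idea behind GRPO, the sketch below computes group-relative advantages. Several answers are sampled for one question, and each answer's reward is normalized by the mean and standard deviation of the rewards in its group. This is a minimal sketch, not the paper's training code. It omits the clipped policy ratio and KL penalty, and the function name and reward values are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each sampled response's reward against its group.

    All rewards belong to G responses sampled for the same prompt.
    eps avoids division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical binary correctness rewards for G = 4 answers to one audio question.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When all answers in a group earn the same reward, every advantage is zero, so that group contributes no learning signal.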
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Published: 17 May, 2025 at 11:02 AM

86.87 🤔

This paper introduces a systematic approach to enhance large reasoning models by aligning them with deduction, induction, and abduction meta-abilities through a three-stage pipeline of individual training, parameter merging, and domain-specific RL, achieving up to 4% performance gains over instruction-tuned baselines across math, coding, and science benchmarks.

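The parameter-merging stage can be pictured with one simple form of merging: a weighted linear average of the specialist models' parameters. This is a minimal sketch under stated assumptions. The weights, the dict-of-lists parameter representation, and the function name are hypothetical, and the paper's exact merging recipe may differ.

```python
def merge_parameters(models, weights):
    """Weighted linear merge of models given as {param_name: list[float]}.

    Assumption: all models share the same parameter names and shapes.
    Weights are normalized so they sum to 1.
    """
    if len(models) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    merged = {}
    for name in models[0]:
        # Walk the same parameter element-wise across all models.
        columns = zip(*(m[name] for m in models))
        merged[name] = [
            sum(w * v for w, v in zip(weights, col)) / total for col in columns
        ]
    return merged


# Hypothetical toy "specialists" for deduction, induction, and abduction.
deduction = {"w": [1.0, 0.0]}
induction = {"w": [0.0, 1.0]}
abduction = {"w": [1.0, 1.0]}
merged = merge_parameters([deduction, induction, abduction], [2.0, 1.0, 1.0])
print(merged)  # {'w': [0.75, 0.5]}
```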
Can Large Reasoning Models Self-Train?

Published: 1 Jun, 2025 at 11:43 AM

86.73 🤔

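This paper proposes Self-Rewarded Training (SRT), which uses the model's own self-consistency to drive reinforcement learning and improve mathematical reasoning without supervision. Early in training it performs on par with supervised methods. Under prolonged training, however, reward hacking causes performance to collapse, and the authors explore mitigation strategies such as early stopping and curriculum learning.
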
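A common way to turn self-consistency into a reward is majority voting. The most frequent final answer among the model's samples serves as a pseudo-label, and each sample is rewarded for agreeing with it. The sketch below assumes this majority-vote scheme as an illustration; the function name and answers are hypothetical, and it is not the authors' implementation.

```python
from collections import Counter


def self_consistency_rewards(answers):
    """Reward 1.0 for answers that match the majority answer, else 0.0.

    No ground-truth label is used: the majority answer is the pseudo-label.
    """
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == pseudo_label else 0.0 for a in answers]


# Hypothetical final answers extracted from five sampled solutions.
answers = ["42", "42", "41", "42", "7"]
print(self_consistency_rewards(answers))  # [1.0, 1.0, 0.0, 1.0, 0.0]
```

The sketch also suggests one way reward hacking can arise. If the policy learns to give the same answer in every sample, every sample earns the maximum reward whether or not that answer is correct.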
Hybrid Latent Reasoning via Reinforcement Learning

Published: 3 Jun, 2025 at 11:43 AM

86.71 🤔

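This paper proposes HRPO, a reinforcement-learning-based hybrid latent reasoning framework that uses a gating mechanism to combine discrete tokens with continuous hidden states. It significantly improves large language models' performance on knowledge and reasoning tasks while reducing reliance on chain-of-thought data.
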
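The summary does not specify HRPO's exact gate. The sketch below assumes a simple element-wise sigmoid gate that interpolates between a sampled token's embedding and a continuous hidden state. The function names and the interpolation form are illustrative assumptions, not the paper's definition.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_mix(token_embedding, hidden_state, gate_logits):
    """Blend a discrete token embedding with a continuous hidden state.

    Assumed form, per element: g = sigmoid(z); out = g * e + (1 - g) * h.
    A large gate logit favors the token embedding; a small one favors the hidden state.
    """
    out = []
    for e, h, z in zip(token_embedding, hidden_state, gate_logits):
        g = sigmoid(z)
        out.append(g * e + (1.0 - g) * h)
    return out


# Hypothetical 2-dimensional vectors; a zero logit gives an even 50/50 blend.
print(gated_mix([1.0, 0.0], [0.0, 1.0], [0.0, 0.0]))  # [0.5, 0.5]
```

Making the gate learnable would let training decide, per dimension, how much continuous latent information passes through alongside the discrete token.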
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning

Published: 30 May, 2025 at 11:13 AM

86.69 🤔

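This paper proposes ConciseR, a two-stage reinforcement learning framework that first strengthens reasoning with GRPO++ and then optimizes response length with L-GRPO. It substantially shortens CoT responses while preserving accuracy and outperforms existing methods on multiple benchmark datasets.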