Tag: Unlabeled Data

All the articles with the tag "Unlabeled Data".

TTRL: Test-Time Reinforcement Learning

Published: 4 May, 2025 at 04:30 PM

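This paper proposes Test-Time Reinforcement Learning (TTRL), which estimates rewards through majority voting so that a large language model can be trained on unlabeled test data. This lets the model self-evolve and significantly improves its performance on reasoning tasks.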
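The core idea of majority-vote reward estimation can be illustrated with a minimal sketch. It assumes several answers have already been sampled from the model for one unlabeled question and reduced to comparable final answers. It also assumes a simple binary reward: the most frequent answer serves as a pseudo-label, and each sample earns 1.0 if it matches and 0.0 otherwise. The function name `majority_vote_rewards` is hypothetical, and this is not the paper's actual implementation.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def majority_vote_rewards(answers: List[Hashable]) -> Tuple[Hashable, List[float]]:
    """Hypothetical helper: estimate per-sample rewards by majority voting.

    Assumes `answers` are final answers already extracted from the model's
    sampled outputs for a single unlabeled question.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # The most frequent answer becomes the pseudo-label; on ties, Counter
    # keeps the answer that was encountered first.
    pseudo_label, _ = counts.most_common(1)[0]
    # Assumed binary reward: 1.0 for agreeing with the majority, else 0.0.
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


if __name__ == "__main__":
    sampled = ["42", "42", "41", "42", "7"]
    label, rewards = majority_vote_rewards(sampled)
    print(label, rewards)  # 42 [1.0, 1.0, 0.0, 1.0, 0.0]
```

In an RL loop, these per-sample rewards would then feed a standard policy-optimization update. No ground-truth labels are needed because the pseudo-label comes entirely from the model's own samples.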