Tag: Token-Level Credit Assignment
All the articles with the tag "Token-Level Credit Assignment".
- Learning Explainable Dense Reward Shapes via Bayesian Optimization
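This paper proposes a method that uses Bayesian Optimization to learn explainable dense reward shapes, addressing the sparse-reward problem in RLHF. It optimizes token-level credit assignment, improving training efficiency and performance while leaving the optimal policy unchanged.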