Tag: Reward Alignment

All the articles with the tag "Reward Alignment".

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization

Published: 21 May, 2025 at 11:24 AM

88.16 🤔

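SoLoPO decomposes long-context preference optimization into short-context preference optimization and short-to-long reward alignment, significantly improving the performance and training efficiency of large language models on long-context tasks while preserving their short-context capabilities.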