Tag: Alignment

All the articles with the tag "Alignment".

Beyond Output Matching: Bidirectional Alignment for Enhanced In-Context Learning

Published: 5 Jun, 2025 at 11:24 AM

91.44 🤔

本文提出双向对齐（BiAlign）方法，通过对齐学生模型与教师模型的令牌级输出分布和输入偏好，显著提升了学生模型的上下文学习能力，并在多种任务上取得了优于基线的结果。
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?

Published: 2 Jun, 2025 at 11:32 AM

86.23 🤔

本文提出并验证了'浅层偏好信号'现象，通过截断偏好数据集（保留前40%-50% token）训练奖励模型和DPO模型，性能与完整数据集相当甚至更优，并揭示了当前对齐方法过于关注早期token的局限性。
Reverse Preference Optimization for Complex Instruction Following

Published: 1 Jun, 2025 at 11:44 AM

85.20 🤔

本文提出逆向偏好优化（RPO）方法，通过动态反转指令中未满足的约束消除偏好对噪声，在多轮复杂指令跟随任务上显著优于DPO基线，并在70B模型上超越GPT-4o。
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

Published: 1 Jun, 2025 at 11:45 AM

85.12 🤔

本文提出残差对齐模型（RAM），通过重要性采样分离对齐模块，实现高效的序列级训练和令牌级解码，在多个对齐任务中显著提升性能并降低资源成本。
A Statistical Case Against Empirical Human-AI Alignment

Published: 15 May, 2025 at 11:06 AM

98.37 🤔

This position paper argues against forward empirical human-AI alignment due to statistical biases and anthropocentric limitations, advocating for prescriptive and backward alignment approaches to ensure transparency and minimize bias, supported by a case study on language model decoding strategies.

Tag: Alignment

Beyond Output Matching: Bidirectional Alignment for Enhanced In-Context Learning

Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?

Reverse Preference Optimization for Complex Instruction Following

Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models

A Statistical Case Against Empirical Human-AI Alignment