Posts
All the articles I've posted.
-
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
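This paper proposes calibration-aware fine-tuning (CFT and RCFT), grounded in a theoretical framework of calibratable and non-calibratable regimes, to significantly improve the calibration of large language models after preference alignment while maintaining or improving their language capabilities.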
-
Efficient Single-Pass Training for Multi-Turn Reasoning
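This paper proposes a method that uses response token duplication and a custom attention mask to train on multi-turn reasoning conversations in a single forward pass, significantly improving training efficiency while preserving reasoning visibility and positional consistency.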
-
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
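The R&B framework regroups data by semantic similarity and dynamically adjusts domain weights using gradients. With very low computational overhead (0.01%), it matches or exceeds existing data mixing strategies on natural language and multimodal tasks, improving the efficiency of foundation model training.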
-
ComPO: Preference Alignment via Comparison Oracles
This paper introduces ComPO, a novel preference alignment method for LLMs using comparison oracles to effectively utilize noisy preference pairs, demonstrating reduced verbosity and likelihood displacement across multiple models and benchmarks.
-
MMRL++: Parameter-Efficient and Interaction-Aware Representation Learning for Vision-Language Models
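This paper proposes the MMRL and MMRL++ frameworks, which strengthen the few-shot adaptation of vision-language models through a shared representation space and a decoupling strategy, and use the parameter-efficient SRRA and PRC mechanisms to improve generalization and training stability, achieving the best performance on multiple datasets.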