Posts
All the articles I've posted.
-
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks
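This paper proposes the PLAN-AND-ACT framework, which improves the performance of LLM agents on complex long-horizon tasks by separating the planning and execution modules, training on synthetic data, and replanning dynamically, and achieves state-of-the-art results on web navigation benchmarks.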
-
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
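This paper proposes a reward-augmented dataset method that relabels preference pairs so that the large language model is conditioned on reward values and learns the full spectrum of response quality. The method significantly improves the performance of Direct Preference Optimization (DPO) and mitigates two of its limitations: unlearning high-quality rejected responses and indiscriminately learning low-quality chosen responses.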
-
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
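This paper proposes the Low-rank Knowledge Unlearning (LoKU) framework, which comprises an inverted hinge loss (IHL) and Fisher-weighted low-rank adapter initialization (FILA), to achieve robust and parameter-efficient knowledge unlearning for large language models, effectively removing sensitive information while preserving the model's original capabilities.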
-
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
This paper introduces VLM Q-Learning, an offline-to-online reinforcement learning method that fine-tunes Vision-Language Models for interactive decision-making by filtering suboptimal actions with a critic head, achieving significant performance improvements over supervised fine-tuning across multiple multimodal agent tasks.
-
The dynamic interplay between in-context and in-weight learning in humans and neural networks
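This paper uses the dynamic interplay between in-context learning (ICL) and in-weight learning (IWL) in neural networks to give a unified account of compositional generalization, curriculum effects, and the flexibility-retention trade-off in human learning, offering a new perspective on dual-process theories in cognitive science.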