Posts
All the articles I've posted.
-
The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason
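This paper studies how robust large language models are to reward noise during reinforcement learning post-training. It proposes a Reasoning Pattern Reward (RPR) strategy that rewards key reasoning phrases rather than answer correctness and significantly improves performance, and it uses RPR to calibrate noisy reward models, improving results on open-ended tasks.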
-
Chain-of-Model Learning for Language Model
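This paper proposes the Chain-of-Model (CoM) learning paradigm, which introduces causally dependent multi-scale representations (Chain-of-Representation) into the Transformer architecture to enable efficient model scaling and elastic inference. Experiments show that the CoLM family matches the standard Transformer in performance while offering advantages in prefilling speed and flexibility.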
-
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
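This paper proposes an 'Ensemble' prompt framework that improves the in-context learning performance of large language models by describing the criteria used to select in-context examples. Experiments show that models are far more sensitive to the prompt format than to the content of the descriptions, with the effect especially pronounced on smaller models.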
-
Core Context Aware Transformers for Long Context Language Modeling
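This paper proposes a core context aware attention mechanism (CCA-Attention) that reduces redundant information in long-context modeling through globality-aware pooling and locality-preserving modules, significantly improving computational efficiency while maintaining performance. Experiments show a 7.9x speedup and a memory reduction of about 45% at a 128K context length.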
-
Task-Core Memory Management and Consolidation for Long-term Continual Learning
This paper introduces Long-CL, a human memory-inspired framework for long-term continual learning, leveraging task-core memory management and selective sample consolidation to significantly outperform baselines by 7.4% and 6.5% AP on two novel benchmarks, MMLongCL-Bench and TextLongCL-Bench, while mitigating catastrophic forgetting.