Tag: Efficiency
All the articles with the tag "Efficiency".
-
Lost in Transmission: When and Why LLMs Fail to Reason Globally
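This paper proposes the BAPO model to quantify the internal communication bandwidth limits of large language models (LLMs). It proves theoretically and verifies experimentally that LLMs fail on tasks with high bandwidth requirements, and shows that chain-of-thought (CoT) can lower those requirements and thereby mitigate some of these failures.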
-
Chain-of-Model Learning for Language Model
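This paper proposes the Chain-of-Model (CoM) learning paradigm, which introduces causally dependent multi-scale representations (Chain-of-Representation) into the Transformer architecture to enable efficient model scaling and elastic inference. Experiments show that the CoLM model family performs on par with standard Transformers while offering advantages in prefilling speed and flexibility.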
-
Core Context Aware Transformers for Long Context Language Modeling
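This paper proposes Core Context Aware Attention (CCA-Attention), which uses a globality-aware pooling module and a locality-preserving module to reduce redundant information in long-context modeling, substantially improving computational efficiency while maintaining performance. Experiments show a 7.9× speedup and roughly 45% memory reduction at a 128K context length.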
-
Task-Core Memory Management and Consolidation for Long-term Continual Learning
This paper introduces Long-CL, a framework for long-term continual learning inspired by human memory. It combines task-core memory management with selective sample consolidation to mitigate catastrophic forgetting, outperforming baselines by 7.4% and 6.5% AP on two new benchmarks, MMLongCL-Bench and TextLongCL-Bench.
-
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
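This paper proposes the Compressed Latent Reasoning (CoLaR) framework, which optimizes the reasoning process of large language models through dynamic compression in latent space combined with reinforcement learning, substantially improving efficiency on mathematical reasoning tasks while maintaining relatively high accuracy.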