Tag: Efficiency

All the articles with the tag "Efficiency".

Lost in Transmission: When and Why LLMs Fail to Reason Globally

Published: 17 May, 2025 at 11:19 PM

85.58 🤔

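This paper proposes the BAPO model to quantify the internal communication bandwidth limits of large language models (LLMs). It proves theoretically and verifies experimentally that LLMs fail on tasks with high bandwidth requirements, and shows that chain-of-thought (CoT) can lower the bandwidth requirement and so mitigate some of these failures.
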
Chain-of-Model Learning for Language Model

Published: 25 May, 2025 at 11:24 AM

85.56 🤔

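This paper proposes the Chain-of-Model (CoM) learning paradigm, which introduces causally dependent multi-scale representations (Chain-of-Representation) into the Transformer architecture to enable efficient model scaling and elastic inference. Experiments show that the CoLM family performs comparably to standard Transformers while offering advantages in prefilling speed and flexibility.
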
Core Context Aware Transformers for Long Context Language Modeling

Published: 30 May, 2025 at 11:22 AM

85.55 🤔

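This paper proposes a core context aware attention mechanism (CCA-Attention) that reduces redundant information in long-context modeling through global-aware pooling and locality-preserving modules, significantly improving computational efficiency while maintaining performance. Experiments show a 7.9x speedup and roughly 45% memory reduction at a 128K context length.
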
Task-Core Memory Management and Consolidation for Long-term Continual Learning

Published: 17 May, 2025 at 11:01 AM

85.53 🤔

This paper introduces Long-CL, a human memory-inspired framework for long-term continual learning, leveraging task-core memory management and selective sample consolidation to significantly outperform baselines by 7.4% and 6.5% AP on two novel benchmarks, MMLongCL-Bench and TextLongCL-Bench, while mitigating catastrophic forgetting.

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

Published: 28 May, 2025 at 11:22 AM

85.52 🤔

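This paper proposes the Compressed Latent Reasoning (CoLaR) framework, which optimizes the reasoning process of large language models through dynamic compression in latent space and reinforcement learning, significantly improving efficiency on mathematical reasoning tasks while maintaining relatively high accuracy.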
