Tag: Long Context

All the articles with the tag "Long Context".

ATLAS: Learning to Optimally Memorize the Context at Test Time

Published: 31 May, 2025 at 11:22 AM

86.98 🤔

本文提出Atlas，一种高容量长期内存模块，通过滑动窗口Omega规则和Muon优化器优化上下文记忆，在语言建模和长上下文理解任务中显著优于Transformer和现代RNN。
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities

Published: 25 May, 2025 at 11:24 AM

86.84 🤔

本文通过对比实验揭示，尽管长序列模型（如Mamba2）理论上支持无限长上下文，但在实际长上下文任务中与Transformer模型一样面临显著局限，尤其在信息位置和数据格式变化时表现不佳，亟需进一步研究其原因。
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation

Published: 8 May, 2025 at 06:13 PM

86.84 🤔

This paper proposes Recall with Reasoning (RwR), a method that enhances Mamba's long-context memory and extrapolation by distilling chain-of-thought summarization from a teacher model, achieving significant performance improvements on LONGMEMEVAL and HELMET benchmarks while preserving short-context capabilities.
Putting It All into Context: Simplifying Agents with LCLMs

Published: 19 May, 2025 at 11:19 AM

86.55 🤔

本文提出基于长上下文语言模型（LCLM）的‘state-in-context’代理设计，通过将整个环境状态纳入上下文简化软件工程任务的代理架构，在SWE-bench Verified上实现与复杂脚手架方法相当的性能（Gemini-2.5-Pro达到50.8% pass@1）。
Can LLMs Maintain Fundamental Abilities under KV Cache Compression?

Published: 26 May, 2025 at 11:22 AM

86.44 🤔

本文通过KVFundaBench基准系统评估KV缓存压缩对大型语言模型基本能力的影响，揭示任务依赖性性能降解，并提出ShotKV方法，通过区分预填充和解码阶段压缩策略，在长上下文生成任务上显著提升性能。

Tag: Long Context

ATLAS: Learning to Optimally Memorize the Context at Test Time

How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation

Putting It All into Context: Simplifying Agents with LCLMs

Can LLMs Maintain Fundamental Abilities under KV Cache Compression?