Tag: Memory Capacity
All the articles with the tag "Memory Capacity".
-
ATLAS: Learning to Optimally Memorize the Context at Test Time
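This paper proposes Atlas, a high-capacity long-term memory module that optimizes how the context is memorized using a sliding-window Omega rule and the Muon optimizer. It significantly outperforms Transformers and modern RNNs on language modeling and long-context understanding tasks.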
-
MoM: Linear Sequence Modeling with Mixture-of-Memories
The Mixture-of-Memories (MoM) architecture introduces multiple independent memory states, together with a routing mechanism, to increase memory capacity and reduce interference in linear sequence modeling. It achieves significant gains over other linear models on recall-intensive tasks and approaches Transformer performance at larger scales while remaining efficient.