Tag: Linear Attention

All the articles with the tag "Linear Attention".

LoLA: Low-Rank Linear Attention With Sparse Caching

Published: 1 Jun, 2025 at 11:40 AM

88.31 🤔

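LoLA combines three forms of memory (linear attention, a sliding window, and a sparse cache) to mitigate memory collisions at inference time, significantly improving linear attention models on long-context associative recall and language modeling tasks while keeping memory usage efficient.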
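
To make the three memory forms concrete, here is a minimal NumPy sketch of how they might fit together. It is a toy illustration, not the paper's implementation. The class name `ThreeTierMemory`, the ELU+1 feature map, the recall-error eviction rule, and the shared normalizer in `read` are all assumptions introduced here; the summary above only says that the three memories are combined.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map (ELU + 1); the summary does not specify one.
    return np.where(x > 0, x + 1.0, np.exp(x))


class ThreeTierMemory:
    """Toy memory with a sliding window, a sparse cache, and a linear state."""

    def __init__(self, dim, window_size, cache_size):
        self.window_size = window_size
        self.cache_size = cache_size
        self.window = []                # most recent (key, value) pairs, stored exactly
        self.cache = []                 # older pairs kept exactly (sparse cache)
        self.S = np.zeros((dim, dim))   # linear-attention state: sum of phi(k) v^T
        self.z = np.zeros(dim)          # normalizer: sum of phi(k)

    def _recall_error(self, k, v):
        # How badly the linear state alone would reconstruct v from k.
        phi = feature_map(k)
        denom = phi @ self.z
        pred = (phi @ self.S) / denom if denom > 1e-9 else np.zeros_like(v)
        return float(np.linalg.norm(pred - v))

    def _absorb(self, k, v):
        # Fold one pair into the fixed-size linear-attention state.
        phi = feature_map(k)
        self.S += np.outer(phi, v)
        self.z += phi

    def write(self, k, v):
        self.window.append((k, v))
        if len(self.window) <= self.window_size:
            return
        # The oldest pair leaves the window and competes for a cache slot.
        candidates = self.cache + [self.window.pop(0)]
        if len(candidates) > self.cache_size:
            # Assumed rule: absorb the pair the linear state already recalls
            # best, and keep the harder-to-recall pairs in the cache.
            errors = [self._recall_error(ck, cv) for ck, cv in candidates]
            self._absorb(*candidates.pop(int(np.argmin(errors))))
        self.cache = candidates

    def read(self, q):
        # Linear-attention contribution from the compressed state.
        phi_q = feature_map(q)
        num = phi_q @ self.S
        den = phi_q @ self.z
        # Softmax-style contribution from the exactly stored pairs.
        exact = self.window + self.cache
        if exact:
            K = np.stack([k for k, _ in exact])
            V = np.stack([v for _, v in exact])
            w = np.exp(K @ q / np.sqrt(len(q)))
            num = num + w @ V
            den = den + w.sum()
        return num / den if den > 0 else np.zeros_like(q)
```

In this sketch, memory stays bounded because at most `window_size + cache_size` pairs are stored exactly, and everything else is compressed into the fixed-size state `S`. Keeping hard-to-recall pairs out of `S` is one plausible way to reduce collisions inside the compressed state.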