Tag: Sparse Attention

All the articles with the tag "Sparse Attention".

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing

Published: 4 May, 2025 at 04:33 PM

69.21 🤔

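This paper proposes Mixture of Sparse Attention (MoSA), which uses expert-choice routing to implement content-based sparse attention. It significantly improves the language modeling performance of Transformer models under the same compute budget and optimizes resource usage.

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference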

Published: 4 May, 2025 at 04:28 PM

59.39 🤔

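This work proposes SpargeAttn, a general-purpose sparse attention mechanism that accelerates inference across a variety of models using a two-stage online filter and quantization, with no loss in end-to-end performance.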