Tag: Long Context

All the articles with the tag "Long Context".

RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference

Published: 9 May, 2025 at 11:06 AM

73.12 🤔

RetroInfer reimagines the KV cache as a vector storage system, using an attention-aware wave index and wave buffer to achieve up to 4.5x speedup over full attention and 10.5x over sparse baselines for long-context LLM inference, while preserving near-full-attention accuracy.
An Empirical Study of Evaluating Long-form Question Answering

Published: 4 May, 2025 at 04:31 PM

55.78 🤔

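This paper empirically studies automatic evaluation metrics for long-form question answering, demonstrating the advantages of LLM-based metrics in accuracy and stability while analyzing their biases and strategies for improvement.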
State Space Models are Strong Text Rerankers

Published: 4 May, 2025 at 04:26 PM

50.53 🤔

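This paper comprehensively benchmarks state space models such as Mamba against Transformers on text reranking, comparing both performance and efficiency. It finds that Mamba models achieve comparable performance but are less efficient, and it highlights directions for future optimization.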
RWKV-X: A Linear Complexity Hybrid Language Model

Published: 4 May, 2025 at 04:32 PM

87.80 👍

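This paper proposes RWKV-X, a linear-complexity hybrid language model that combines RWKV with a sparse attention mechanism to improve long-context modeling while retaining efficiency and short-context performance.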
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Published: 4 May, 2025 at 04:29 PM

85.10 👍

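Through large-scale experiments, the paper analyzes the efficiency-accuracy trade-offs of sparse attention in Transformer LLMs, reveals the advantage of larger sparse models on long sequences, and establishes generalizable scaling laws.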
