Tag: Long Context
All the articles with the tag "Long Context".
-
LIFEBench: Evaluating Length Instruction Following in Large Language Models
Introducing the LIFEBench benchmark, this paper systematically evaluates 26 large language models on length-instruction following and finds that they generally perform poorly under long length constraints and fall far short of their vendors' claimed maximum output lengths, revealing fundamental limitations in length perception and long-text generation.
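As a rough illustration of what such an evaluation measures, a minimal length-compliance score might look like the sketch below. The linear-decay formula is an assumption for illustration only, not LIFEBench's official metric.

```python
def length_score(output: str, target_words: int) -> float:
    """Illustrative length-compliance score (assumed form, not LIFEBench's
    official metric): 1.0 when the output hits the requested word count
    exactly, decaying linearly to 0.0 at 100% relative deviation."""
    n = len(output.split())
    deviation = abs(n - target_words) / target_words
    return max(0.0, 1.0 - deviation)
```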
-
SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
SoLoPO decomposes long-context preference optimization into short-context preference optimization plus short-to-long reward alignment, substantially improving large language models' performance and training efficiency on long-context tasks while preserving their short-context capabilities.
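A minimal sketch of how such a decomposed objective could be written, assuming a DPO-style implicit reward and a squared-error alignment term pulling the long-context reward of the chosen response toward its short-context reward. The exact loss form, the `alpha` weight, and the stop-gradient on the short-context reward are assumptions, not SoLoPO's published formulation.

```python
import torch
import torch.nn.functional as F

def implicit_reward(logp_policy, logp_ref, beta=0.1):
    # DPO-style implicit reward: beta * log(pi / pi_ref), per sequence
    return beta * (logp_policy - logp_ref)

def short_to_long_po_loss(
    lp_short_chosen, lp_short_rejected,    # policy log-probs, short context
    ref_short_chosen, ref_short_rejected,  # reference log-probs, short context
    lp_long_chosen, ref_long_chosen,       # chosen-response log-probs, long context
    beta=0.1, alpha=1.0,
):
    """Assumed decomposition: preference loss on the short context plus
    an alignment term matching long- and short-context implicit rewards."""
    r_sc = implicit_reward(lp_short_chosen, ref_short_chosen, beta)
    r_sr = implicit_reward(lp_short_rejected, ref_short_rejected, beta)
    dpo = -F.logsigmoid(r_sc - r_sr).mean()  # short-context preference term
    r_lc = implicit_reward(lp_long_chosen, ref_long_chosen, beta)
    align = (r_lc - r_sc.detach()).pow(2).mean()  # short-to-long alignment
    return dpo + alpha * align
```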
-
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
By introducing a head-specific sigmoid gating mechanism after the SDPA output of softmax attention, this paper substantially improves the performance, training stability, and long-context generalization of a 15B MoE model and a 1.7B dense model, while eliminating the attention-sink phenomenon.
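The mechanism itself is simple to sketch. Below is a minimal PyTorch module, assuming one sigmoid gate per head computed from the layer's hidden states; the gate's granularity and input are assumed details and may differ from the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Softmax attention with a head-specific sigmoid gate on the SDPA
    output. A sketch of the mechanism described above, not the paper's
    exact implementation."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.gate = nn.Linear(d_model, n_heads)  # one gate scalar per head (assumed granularity)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq, head_dim)
        q, k, v = (y.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for y in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2)  # (batch, seq, heads, head_dim)
        # Sigmoid gate after SDPA: adds non-linearity and sparsity, and lets
        # a head output ~0 rather than dumping weight on a sink token.
        g = torch.sigmoid(self.gate(x))          # (batch, seq, heads)
        attn = attn * g.unsqueeze(-1)            # scale each head's output
        return self.out(attn.reshape(b, t, d))
```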
-
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architectural Inductive Biases on Long-Context Abilities
Through comparative experiments, this paper shows that although long-sequence models such as Mamba2 in principle support unbounded context, in practice they face significant limitations on long-context tasks just as Transformers do, degrading especially when the position of key information or the data format changes; the underlying causes call for further study.
-
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
This paper proposes Recall with Reasoning (RwR), a method that enhances Mamba's long-context memory and extrapolation by distilling chain-of-thought summarization from a teacher model, achieving significant performance improvements on LONGMEMEVAL and HELMET benchmarks while preserving short-context capabilities.
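A rough sketch of what such distillation data could look like, assuming the teacher's summary is inserted as an explicit recall step that the Mamba student learns to produce before answering. The prompt templates and the `teacher_generate` callable are hypothetical, not RwR's published pipeline.

```python
from typing import Callable

def build_rwr_example(
    teacher_generate: Callable[[str], str],  # hypothetical: prompt -> text
    context: str, question: str, answer: str,
) -> tuple[str, str]:
    """Assumed shape of the distillation data: the teacher summarizes the
    long context, and the student is trained to emit that summary as a
    recall step before the final answer."""
    summary = teacher_generate(
        f"Summarize the facts needed to answer the question.\n\n{context}"
    )
    prompt = f"{context}\n\nQuestion: {question}"
    target = f"Recalled context: {summary}\nAnswer: {answer}"
    return prompt, target
```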