Tag: Long Context
All the articles with the tag "Long Context".
-
ICLR: In-Context Learning of Representations
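Using an in-context graph tracing task, this paper shows that large language models can emergently reorganize their concept representations to fit new semantics as the amount of context grows, and proposes an energy-minimization hypothesis to explain this process.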
-
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
This paper introduces a recursive summarization method to enhance long-term dialogue memory in LLMs, achieving marginal quantitative improvements and notable qualitative gains in consistency and coherence across multiple models and datasets.
-
Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation
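This paper proposes DPE, a training-free length extrapolation method that detects the effective relative distance for different RoPE dimension groups, identifies the key dimensions, and selectively adjusts the position indices of those key dimensions, significantly extending the context window of LLMs and improving performance on long-context tasks.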
-
Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing
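This paper presents MegaBeam-Mistral-7B, which uses progressive training and system-level optimization to let a 7B-parameter model process long contexts of 512K tokens, achieving performance comparable to larger models on several benchmarks, although its multi-fact reasoning still needs improvement.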
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
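This paper proposes MOOSComp, which mitigates over-smoothing by adding an inter-class cosine similarity loss during training and retains key tokens by incorporating outlier scores during compression, significantly improving the performance and generalization of task-agnostic long-context compression.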