Tag: Long Context

All the articles with the tag "Long Context".

ICLR: In-Context Learning of Representations

Published: 7 May, 2025 at 08:41 AM

84.18 🤔

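Using an in-context graph tracing task, this paper shows that large language models can, as context size grows, emergently reorganize their concept representations to fit new semantics, and it proposes an energy-minimization hypothesis to explain the process.
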
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models

Published: 8 May, 2025 at 06:19 PM

83.98 🤔

This paper introduces a recursive summarization method to enhance long-term dialogue memory in LLMs, achieving marginal quantitative improvements and notable qualitative gains in consistency and coherence across multiple models and datasets.

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation

Published: 6 May, 2025 at 01:18 AM

81.62 🤔

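This paper proposes DPE, a training-free length extrapolation method that detects the effective relative distance for different RoPE dimension groups, identifies the key dimensions, and selectively adjusts the position indices of those key dimensions, significantly extending the LLM context window and improving performance on long-context tasks.
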
Scaling Context, Not Parameters: Training a Compact 7B Language Model for Efficient Long-Context Processing

Published: 19 May, 2025 at 11:17 AM

79.35 🤔

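This paper presents MegaBeam-Mistral-7B, a 7B-parameter model that handles 512K-token contexts through progressive training and system-level optimization; it performs comparably to larger models on several benchmarks, though its multi-fact reasoning still needs improvement.
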
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores

Published: 4 May, 2025 at 04:29 PM

77.68 🤔

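This paper proposes MOOSComp, which adds an inter-class cosine similarity loss during training to mitigate over-smoothing and integrates outlier scores during compression to retain key tokens, significantly improving the performance and generalization of task-agnostic long-context compression.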
