Tag: Pre-training

All the articles with the tag "Pre-training".

Emergence and Effectiveness of Task Vectors in In-Context Learning: An Encoder Decoder Perspective

Published: 4 Jun, 2025 at 11:25 AM

85.01 🤔

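This paper studies the emergence and effectiveness of task vectors in in-context learning through an encoder-decoder framework, proposes a task decodability (TD) metric that predicts ICL performance, and finds that fine-tuning earlier layers improves task encoding and performance more than fine-tuning later layers.

What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction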

Published: 7 May, 2025 at 08:42 AM

83.92 🤔

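Through theoretical analysis, this paper distinguishes three interpretations of language model output probabilities (completion distribution, response distribution, and event distribution), shows how existing work conflates and misreads these distributions, and calls for careful interpretation of model probabilities to guide LLM development and application.

Toward Understanding In-context vs. In-weight Learning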

Published: 7 May, 2025 at 12:16 AM

82.20 🤔

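Using a simplified theoretical model and experiments across multiple settings, this paper shows how properties of the data distribution drive the emergence of, and competition between, in-context learning (ICL) and in-weight learning (IWL), and explains why ICL can be transient during training.

Effective Length Extrapolation via Dimension-Wise Positional Embeddings Manipulation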

Published: 6 May, 2025 at 01:18 AM

81.62 🤔

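This paper proposes DPE, a training-free length extrapolation method that detects the effective relative distance for different RoPE dimension groups, identifies the key dimensions, and selectively adjusts the position indices of those key dimensions, significantly extending the LLM context window and improving performance on long-context tasks.

Don't be lazy: CompleteP enables compute-efficient deep transformers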

Published: 11 May, 2025 at 11:16 AM

81.10 🤔

This paper introduces CompleteP, a parameterization for transformers with α = 1 that ensures depth-wise hyperparameter transfer and complete feature learning, achieving 12-34% compute efficiency improvements and enabling a wider range of compute-optimal width-to-depth ratios.
