Tag: Scaling Laws
All the articles with the tag "Scaling Laws".
-
Explaining Context Length Scaling and Bounds for Language Models
This paper proposes a theoretical framework from an intrinsic-space perspective to explain how context length affects language model loss, derives an optimal context length that depends on dataset size, and validates the hypotheses with experiments on natural language and synthetic data.
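A hedged sketch of the setup (the notation below is illustrative, not necessarily the paper's): writing $\mathcal{L}(L, D)$ for the language-modeling loss at context length $L$ on a dataset of size $D$, the result amounts to a dataset-dependent optimum

\[ L^{*}(D) = \arg\min_{L} \mathcal{L}(L, D), \]

where how $L^{*}$ shifts with $D$ is the paper's contribution; the expression above only fixes the notation.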
-
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
LENSLLM introduces a Hessian-based PAC-Bayes framework and NTK-based scaling model for LLM selection, achieving up to 91.1% accuracy and 88.5% computational cost reduction by modeling fine-tuning dynamics across diverse tasks.
-
Superposition Yields Robust Neural Scaling
By analyzing toy models and real LLMs, this paper identifies superposition as a key mechanism behind neural scaling laws: under strong superposition, loss is inversely proportional to model dimension and independent of the feature frequency distribution, which explains the power-law decline of loss with model size.
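A minimal sketch of the claimed relationship, with $m$ denoting model dimension and $C$ an unspecified constant (notation assumed here, not taken from the paper):

\[ \mathcal{L}(m) \approx \frac{C}{m} \quad \text{under strong superposition,} \]

i.e., a power law $\mathcal{L} \propto m^{-\alpha}$ with exponent $\alpha \approx 1$ in the model dimension, independent of how frequently individual features occur.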
-
Vectors from Larger Language Models Predict Human Reading Time and fMRI Data More Poorly when Dimensionality Expansion is Controlled
By controlling for dimensionality expansion, this paper finds that as large language models (LLMs) grow, the contribution of training to predicting human reading times and fMRI data actually shrinks, suggesting a potential misalignment between model and human sentence processing.
-
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
This paper introduces a taxonomy of language model memorization comprising recitation, reconstruction, and recollection, and shows through experiments with Pythia models that different factors influence each category; a taxonomy-based predictive model outperforms baselines in predicting memorization likelihood.