Tag: Scaling Laws
All the articles with the tag "Scaling Laws".
-
Don't be lazy: CompleteP enables compute-efficient deep transformers
This paper introduces CompleteP, a parameterization for transformers with α = 1, which ensures depth-wise hyperparameter transfer and complete feature learning, achieving 12-34% compute efficiency improvements and enabling a wider range of compute-optimal width-to-depth ratios.
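A minimal sketch of the depth-scaling idea this summary refers to, assuming α is the exponent of a 1/L^α multiplier on each residual branch (so α = 1 shrinks each block's update as depth L grows); the block internals, names, and the omitted learning-rate/initialization rules are illustrative assumptions, not the paper's full parameterization.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Toy residual block whose branch output is scaled by L**(-alpha).

    alpha = 1 corresponds to the depth scaling mentioned in the summary;
    alpha = 0 recovers an unscaled residual stream. The branch itself is a
    placeholder MLP, not the paper's transformer block.
    """
    def __init__(self, d_model: int, n_layers: int, alpha: float = 1.0):
        super().__init__()
        self.branch = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.branch_scale = n_layers ** (-alpha)  # depth-dependent multiplier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.branch_scale * self.branch(x)

# With alpha = 1 the total residual update stays O(1) as depth grows,
# which is the intuition behind depth-wise hyperparameter transfer.
depth, d_model = 48, 256
blocks = nn.Sequential(*[ScaledResidualBlock(d_model, depth) for _ in range(depth)])
out = blocks(torch.randn(2, 16, d_model))
```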
-
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress?
This paper introduces a framework to classify algorithmic innovations in LLMs as compute-dependent or compute-independent, demonstrating through small-scale GPT-2 experiments that compute-independent advancements like FlashAttention can yield up to 3.5× compute-equivalent gains even under hardware constraints, challenging the efficacy of hardware-focused AI regulation.
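A hedged sketch of how a compute-equivalent gain can be read off a fitted loss-versus-compute curve: the power-law form, coefficients, and the assumed loss improvement below are made up for illustration and are not the paper's fits or results.

```python
# Hypothetical Chinchilla-style fit for the baseline recipe:
#   L(C) = a * C**(-b) + c   (coefficients are invented for illustration)
a, b, c = 1070.0, 0.15, 1.69

def loss_at(compute: float) -> float:
    return a * compute ** (-b) + c

def compute_to_reach(target_loss: float) -> float:
    # Invert the fitted law: C = (a / (L - c)) ** (1 / b)
    return (a / (target_loss - c)) ** (1.0 / b)

# Suppose an algorithmic change lets a run with 1e20 FLOPs reach a loss the
# baseline recipe would need more compute to match.
C_used = 1e20
improved_loss = loss_at(C_used) - 0.15          # assumed improvement, not measured
C_equivalent = compute_to_reach(improved_loss)  # baseline compute for same loss

ceg = C_equivalent / C_used
print(f"compute-equivalent gain ~ {ceg:.2f}x")
```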
-
Contextures: Representations from Contexts
This paper introduces the contexture theory, unifying representation learning across paradigms by targeting top singular functions of a context-induced expectation operator, demonstrating high alignment in neural representations and proposing a task-agnostic metric for context evaluation with strong empirical correlation to performance on various datasets.
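A toy, discrete illustration of "top singular functions of a context-induced operator": for a finite input variable X and context variable A, the operator reduces to a matrix built from their joint distribution, and its leading singular vectors yield a representation of X. The setup and names are assumptions for illustration, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete setting: an input X with n_x states and a context A with n_a
# states, linked by a made-up joint distribution P(X, A).
n_x, n_a, d = 50, 30, 4
joint = rng.random((n_x, n_a))
joint /= joint.sum()

p_x = joint.sum(axis=1)  # marginal P(X)
p_a = joint.sum(axis=0)  # marginal P(A)

# Discrete analogue of the context-induced operator: the matrix
# Q[x, a] = P(x, a) / sqrt(P(x) P(a)), whose singular vectors play the role
# of singular functions in this finite toy.
Q = joint / np.sqrt(np.outer(p_x, p_a))

U, S, Vt = np.linalg.svd(Q, full_matrices=False)

# Top-d singular functions of X (skipping the trivial constant direction)
# serve as a d-dimensional representation of each state of X.
repr_x = U[:, 1:d + 1] / np.sqrt(p_x)[:, None]
print(repr_x.shape, S[:5])
```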
-
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?
This paper systematically surveys the state of research on test-time scaling (TTS) in large language models through a four-dimensional taxonomy (what to scale, how to scale, where to scale, and how well it scales), providing a structured perspective and practical guidance for understanding and applying inference-time compute scaling.
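As one concrete "how to scale" instance from such a taxonomy, here is a minimal best-of-N sampling sketch; generate_fn and score_fn are hypothetical stand-ins for a generator and a verifier/reward model, and this is not code from the survey.

```python
from typing import Callable, List, Tuple

def best_of_n(
    prompt: str,
    generate_fn: Callable[[str], str],      # hypothetical: samples one candidate answer
    score_fn: Callable[[str, str], float],  # hypothetical: verifier/reward score
    n: int = 8,
) -> Tuple[str, List[float]]:
    """Best-of-N sampling: spend more inference compute (larger n) to give
    the verifier more candidates to choose from."""
    candidates = [generate_fn(prompt) for _ in range(n)]
    scores = [score_fn(prompt, c) for c in candidates]
    best = candidates[scores.index(max(scores))]
    return best, scores

# Trivial stubs so the sketch runs standalone (replace with real model calls):
demo_best, demo_scores = best_of_n(
    "2 + 2 = ?",
    generate_fn=lambda p: "4",
    score_fn=lambda p, c: float(c == "4"),
    n=4,
)
```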
-
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
This paper proposes EAGLE-3, which removes the feature prediction constraint and fuses features from multiple layers, substantially improving the inference speedup ratio of large language models and achieving lossless acceleration of up to 6.5× in experiments.
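EAGLE-3's drafting head and training-time changes are not reproduced here; below is only a hedged sketch of the generic greedy draft-and-verify loop that such speculative-decoding methods accelerate, with draft_next and target_next as hypothetical single-token greedy decoders.

```python
from typing import Callable, List

def speculative_decode_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # hypothetical: draft model's greedy next token
    target_next: Callable[[List[int]], int],  # hypothetical: target model's greedy next token
    k: int = 4,
) -> List[int]:
    """One draft-and-verify step of greedy speculative decoding."""
    # 1) Draft: cheaply propose k tokens autoregressively.
    draft = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) Verify: the target model checks each drafted position; in a real
    #    system all k positions are scored in a single forward pass.
    accepted = []
    ctx = list(prefix)
    for t in draft:
        target_t = target_next(ctx)
        if target_t == t:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target_t)  # replace first mismatch with the target's token
            break
    else:
        accepted.append(target_next(ctx))  # bonus token when all drafts are accepted

    return prefix + accepted
```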