Tag: Efficiency
All the articles with the tag "Efficiency".
-
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
RADLADS introduces a cost-effective three-step distillation protocol to convert softmax attention transformers into linear attention models using only 350-700M tokens, achieving near-teacher performance on benchmarks and setting a new state-of-the-art for pure RNNs with models up to 72B parameters.
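The exact three-stage protocol is in the paper; below is a minimal sketch of two ingredients such softmax-to-linear conversions typically rely on, matching per-layer attention outputs and distilling logits. `teacher_layer`, `student_layer`, and the loss choices are illustrative assumptions, not the released RADLADS code.

```python
# Sketch only: distill a softmax-attention teacher layer into a
# linear-attention student layer by matching per-layer outputs, then
# align token distributions with a KL loss on the logits.
import torch
import torch.nn.functional as F

def layer_distill_loss(teacher_layer, student_layer, hidden_states):
    # The teacher runs frozen; the student replaces only the attention
    # mixer, so both see the same incoming hidden states.
    with torch.no_grad():
        target = teacher_layer(hidden_states)   # softmax-attention output
    pred = student_layer(hidden_states)         # linear-attention output
    return F.mse_loss(pred, target)

def logit_distill_loss(teacher_logits, student_logits, temperature=1.0):
    # KL between teacher and student token distributions over the vocabulary.
    t = F.log_softmax(teacher_logits / temperature, dim=-1)
    s = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean") * temperature ** 2
```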
-
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
This paper proposes the Rodimus and Rodimus+ models, which use data-dependent tempered selection (DDTS) and Sliding Window Shared-Key Attention (SW-SKA) to substantially reduce the compute and memory complexity of large language models while preserving performance, challenging the accuracy-efficiency trade-off.
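As a rough illustration of the sliding-window half of SW-SKA, here is a sketch of causal sliding-window attention with a single key/value projection shared across all query heads. This simplification is an assumption for illustration, not the paper's SW-SKA implementation; `wq`, `wk`, `wv` are ordinary `nn.Linear` projections.

```python
import torch
import torch.nn.functional as F

def sliding_window_shared_key_attn(x, wq, wk, wv, n_heads, window):
    # x: (batch, seq, dim). wq projects to n_heads * head_dim;
    # wk / wv project to a single shared head of size head_dim.
    b, t, _ = x.shape
    head_dim = wk.out_features
    q = wq(x).view(b, t, n_heads, head_dim).transpose(1, 2)  # (b, h, t, hd)
    k = wk(x).unsqueeze(1)                                   # (b, 1, t, hd), shared
    v = wv(x).unsqueeze(1)                                   # (b, 1, t, hd), shared
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5       # (b, h, t, t)
    # Causal sliding-window mask: token i attends to [i - window + 1, i].
    idx = torch.arange(t, device=x.device)
    mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= window)
    scores = scores.masked_fill(mask, float("-inf"))
    out = F.softmax(scores, dim=-1) @ v                      # broadcast over heads
    return out.transpose(1, 2).reshape(b, t, n_heads * head_dim)
```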
-
Communicating Activations Between Language Model Agents
This paper introduces Activation Communication (AC), a novel method for inter-LLM communication that exchanges intermediate activations instead of natural language, achieving up to a 27% performance improvement over traditional methods with significantly reduced compute across coordination games and reasoning benchmarks.
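A minimal sketch of the general idea: one agent's intermediate hidden states are grafted into another agent's forward pass through a learned projection. The mean-pooling, the addition into the receiver's embeddings, and the HuggingFace-style calls are assumptions for illustration, not the paper's exact protocol.

```python
import torch

@torch.no_grad()
def communicate_activations(model_a, model_b, proj, input_ids_a, input_ids_b,
                            layer_a=-1):
    # 1. Run agent A and grab hidden states from one of its layers.
    out_a = model_a(input_ids_a, output_hidden_states=True)
    h_a = out_a.hidden_states[layer_a]             # (batch, seq_a, dim_a)
    message = proj(h_a.mean(dim=1, keepdim=True))  # pool, map into B's space

    # 2. Run agent B with the "message" added to its input embeddings.
    emb_b = model_b.get_input_embeddings()(input_ids_b)
    emb_b = emb_b + message                        # broadcast over B's sequence
    return model_b(inputs_embeds=emb_b)
```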
-
Merge to Mix: Mixing Datasets via Model Merging
This paper proposes *Merge to Mix*, a method that uses model merging as a surrogate for efficiently selecting dataset mixtures for large-model fine-tuning; it significantly outperforms conventional selection methods on image classification and language tasks, approaching and in some cases exceeding oracle performance.
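A sketch of the proxy idea, under the assumption that merging means uniform weight averaging of models fine-tuned on each candidate dataset; `evaluate` is a hypothetical scoring function and not part of the paper's code.

```python
import copy
import torch

def merge_models(models):
    # Uniform weight average of same-architecture models (floating-point
    # tensors only; integer buffers are kept from the first model).
    merged = copy.deepcopy(models[0])
    state = merged.state_dict()
    for key in state:
        tensors = [m.state_dict()[key] for m in models]
        if tensors[0].is_floating_point():
            state[key] = torch.stack(tensors).mean(dim=0)
    merged.load_state_dict(state)
    return merged

def score_mixture(per_dataset_models, subset, evaluate):
    # Proxy score for fine-tuning on the union of `subset`: merge the
    # corresponding single-dataset models and evaluate the merge once,
    # instead of running a full fine-tuning job per candidate mixture.
    return evaluate(merge_models([per_dataset_models[i] for i in subset]))
```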
-
Scalable Model Merging with Progressive Layer-wise Distillation
This paper proposes ProDistill, an algorithm that efficiently merges large pre-trained models via progressive layer-wise teacher-student distillation, proves theoretically that domain-specific data is necessary, and achieves substantial gains on vision and language tasks (6.14%-6.61%) with superior memory and compute efficiency.
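A sketch of what layer-wise teacher-student distillation for merging could look like: each merged layer is trained so its output matches every fine-tuned teacher's output on that teacher's domain data, one layer at a time. The MSE objective, optimizer settings, and initialization from the first teacher are illustrative assumptions, not the ProDistill algorithm itself.

```python
import copy
import torch
import torch.nn.functional as F

def distill_layer(teacher_layers, domain_inputs, steps=100, lr=1e-4):
    # teacher_layers[i] is layer L of fine-tuned model i; domain_inputs[i]
    # are that layer's recorded inputs on model i's domain data.
    student = copy.deepcopy(teacher_layers[0])
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        loss = 0.0
        for teacher, x in zip(teacher_layers, domain_inputs):
            with torch.no_grad():
                target = teacher(x)
            loss = loss + F.mse_loss(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student  # merged layer; repeat layer by layer ("progressively")
```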