Tag: Efficiency
All the articles with the tag "Efficiency".
-
Investigating Task Arithmetic for Zero-Shot Information Retrieval
This paper investigates task arithmetic, which adapts models to new domains and languages for zero-shot information retrieval by adding and subtracting model parameters, achieving up to 18% NDCG@10 gains on scientific, biomedical, and multilingual datasets and demonstrating the potential of lightweight model adaptation.
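As a rough illustration of the general task-arithmetic idea (not the paper's own code), the sketch below builds task vectors as fine-tuned-minus-pre-trained parameter deltas and adds scaled combinations of them back onto a base model; the function names and toy tensors are assumptions for illustration.

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    # Task vector = fine-tuned weights minus pre-trained weights.
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, vectors: list, coeffs: list) -> dict:
    # Add (or subtract, via negative coefficients) scaled task vectors to the base weights.
    merged = {k: v.clone() for k, v in pretrained.items()}
    for vec, lam in zip(vectors, coeffs):
        for k in merged:
            merged[k] += lam * vec[k]
    return merged

# Toy state dicts standing in for real checkpoints (illustrative only).
base = {"w": torch.zeros(2, 2)}
domain_ft = {"w": torch.ones(2, 2)}      # e.g. adapted to a scientific corpus
lang_ft = {"w": 2 * torch.ones(2, 2)}    # e.g. adapted to another language

tau_domain = task_vector(base, domain_ft)
tau_lang = task_vector(base, lang_ft)
adapted = apply_task_vectors(base, [tau_domain, tau_lang], [0.5, 0.5])
print(adapted["w"])
```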
-
Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant
This paper comprehensively evaluates four quantization methods (GPTQ, AWQ, SmoothQuant, FP8) on instruction-tuned LLMs and SLMs ranging from 1B to 405B parameters across 13 datasets. It finds that quantized models often outperform smaller baselines but struggle with instruction following and hallucination detection, that FP8 is the most robust method, and that task difficulty does not always predict accuracy loss.
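For context on what weight quantization does (a generic round-to-nearest sketch, not GPTQ, AWQ, SmoothQuant, or FP8 specifically), the snippet below quantizes a weight matrix to int8 with one scale per output channel; all names are illustrative assumptions.

```python
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    # Symmetric round-to-nearest int8 quantization with one scale per output row.
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate float matrix from the int8 values and scales.
    return q.float() * scale

w = torch.randn(4, 8)
q, scale = quantize_int8_per_channel(w)
print((w - dequantize(q, scale)).abs().max())  # worst-case quantization error
```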
-
MateICL: Mitigating Attention Dispersion in Large-Scale In-Context Learning
This paper proposes the MateICL framework, which mitigates attention dispersion in large-scale in-context learning by splitting the context window and introducing an attention-calibration layer; experiments show it improves performance and remains stable across a variety of NLP tasks.
-
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
This paper presents R1-Reward, which applies reinforcement learning to multimodal reward model training via the StableReinforce algorithm, substantially improving performance, surpassing state-of-the-art models on multiple benchmarks, and demonstrating strong data efficiency and test-time scalability.
-
Don't be lazy: CompleteP enables compute-efficient deep transformers
This paper introduces CompleteP, a transformer parameterization with α = 1 that ensures depth-wise hyperparameter transfer and complete feature learning, delivering 12-34% compute-efficiency improvements and enabling a wider range of compute-optimal width-to-depth ratios.
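As a loose sketch of the depth-scaling idea only (assuming the residual-branch contribution is scaled by L**(-alpha), with α = 1 corresponding to CompleteP; the class and parameter names are made up for illustration, and CompleteP's full prescription for learning rates and initialization is omitted), a toy residual block might look like:

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    # Toy residual block whose contribution is scaled by n_layers ** (-alpha).
    def __init__(self, d_model: int, n_layers: int, alpha: float = 1.0):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )
        self.scale = n_layers ** (-alpha)  # alpha = 1 is the CompleteP setting

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.mlp(x)

n_layers, d_model = 24, 16
x = torch.randn(2, d_model)
for block in [ScaledResidualBlock(d_model, n_layers) for _ in range(n_layers)]:
    x = block(x)
print(x.shape)
```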