Tag: Efficiency
All articles tagged "Efficiency".
-
Enhancing Efficiency and Exploration in Reinforcement Learning for LLMs
This paper proposes dynamic sampling-budget allocation and a temperature-scheduling mechanism. By reallocating rollout resources according to problem difficulty and maintaining policy entropy to preserve exploration, it markedly improves the efficiency and performance of reinforcement learning for large language models on mathematical tasks, raising pass@1 and pass@16 on the AIME 2024 benchmark by 5.31% and 3.33%, respectively.
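The two mechanisms in the entry above can be sketched minimally. The specific functional forms below (budget shares linear in estimated difficulty; a proportional controller nudging temperature toward a target entropy) are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def allocate_budget(difficulties, total_rollouts):
    """Split a fixed rollout budget across problems in proportion to
    estimated difficulty (assumed rule: harder problems get more samples)."""
    d = np.asarray(difficulties, dtype=float)
    budget = np.floor(d / d.sum() * total_rollouts).astype(int)
    # Hand any leftover rollouts to the hardest problems first.
    for i in np.argsort(-d)[: total_rollouts - budget.sum()]:
        budget[i] += 1
    return budget

def schedule_temperature(temp, entropy, target_entropy, gain=0.1):
    """Nudge the sampling temperature toward a target policy entropy
    (assumed proportional controller; the paper's schedule may differ)."""
    return max(0.1, temp + gain * (target_entropy - entropy))
```

Raising temperature when entropy falls below target keeps exploration alive; the floor of 0.1 is an arbitrary safeguard against degenerate sampling.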
-
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
By inserting a head-specific sigmoid gate after the SDPA output of softmax attention, this paper significantly improves the performance, training stability, and long-context generalization of a 15B MoE model and a 1.7B dense model, while eliminating the attention-sink phenomenon.
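The gating mechanism summarized above can be sketched per head as follows. Computing the gate from the layer's input hidden states, and the elementwise gate placement, are assumptions about one variant; the paper studies several gate positions and granularities.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_sdpa(x, wq, wk, wv, wg):
    """Per-head scaled dot-product attention whose output is multiplied
    elementwise by a head-specific sigmoid gate computed from the input
    hidden states (assumed gate input; the paper compares variants)."""
    # x: (seq, d_model); each weight: (heads, d_model, d_head)
    q = np.einsum('sd,hde->hse', x, wq)
    k = np.einsum('sd,hde->hse', x, wk)
    v = np.einsum('sd,hde->hse', x, wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    out = softmax(scores) @ v                           # plain SDPA output
    gate = 1.0 / (1.0 + np.exp(-np.einsum('sd,hde->hse', x, wg)))
    return gate * out                                   # (heads, seq, d_head)
```

Because the gate is input-dependent and can drive any head's output toward zero for a given token, no token is forced to absorb excess attention mass, which is the intuition behind removing attention sinks.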
-
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
This paper identifies the problem of low-probability tokens over-dominating model updates in reinforcement learning, and proposes two methods, Advantage Reweighting and Lopti, which balance per-token update weights to significantly improve the performance of GRPO-trained large language models, with gains of up to 46.2% on the K&K Logic Puzzle task.
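Of the two remedies named above, Advantage Reweighting admits a compact sketch: scale each token's advantage by a weight that grows with the token's current policy probability, so rare tokens cannot dominate the gradient. The linear weight form and the default α below are illustrative assumptions, not the paper's exact hyperparameters.

```python
import numpy as np

def reweight_advantages(advantages, token_probs, alpha=0.3):
    """Scale per-token advantages by alpha * p + (1 - alpha), so that
    low-probability tokens (small p) receive proportionally smaller
    updates (assumed linear weighting; see the paper for the exact form)."""
    p = np.asarray(token_probs, dtype=float)
    return (alpha * p + (1.0 - alpha)) * np.asarray(advantages, dtype=float)
```

With alpha = 0, the weighting reduces to standard GRPO updates; larger alpha suppresses low-probability tokens more aggressively.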
-
SEAL: Steerable Reasoning Calibration of Large Language Models for Free
SEAL, a training-free method, calibrates the reasoning process of Large Language Models by steering latent representations to reduce redundant thoughts, achieving up to 14.1% accuracy improvement and 50.4% token reduction across diverse benchmarks.
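A representation-steering step of the kind SEAL describes can be sketched as adding a fixed direction to a layer's hidden states at decode time. The extraction rule below (difference of mean activations between concise and redundant reasoning traces) is an assumption about the general family of steering methods, not SEAL's exact recipe.

```python
import numpy as np

def extract_steering_vector(concise_acts, redundant_acts):
    """Steering direction = mean activation over concise reasoning traces
    minus mean activation over redundant ones (assumed extraction rule)."""
    return concise_acts.mean(axis=0) - redundant_acts.mean(axis=0)

def steer_hidden_states(hidden, direction, strength=1.0):
    """Training-free calibration: shift hidden states along the steering
    direction during decoding, nudging generation toward concise reasoning."""
    return hidden + strength * direction
```

No weights are updated anywhere in this sketch, which is what makes the approach "free": the only cost is one vector addition per steered layer.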
-
Zero-Shot Vision Encoder Grafting via LLM Surrogates
This paper proposes training a vision encoder against a small surrogate model and then zero-shot grafting it onto a large LLM (e.g., Llama-70B), reducing VLM training cost by roughly 45% while preserving visual understanding.