Tag: Efficiency
All the articles with the tag "Efficiency".
-
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
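This paper proposes StreamRL, a framework that optimizes RL training through a disaggregated stream generation architecture, addressing pipeline bubbles and skewness bubbles and improving the throughput and cost efficiency of RL training for LLMs.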
-
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
RetroInfer reimagines the KV cache as a vector storage system, using an attention-aware wave index and wave buffer to achieve up to 4.5x speedup over full attention and 10.5x over sparse baselines for long-context LLM inference, while preserving near-full-attention accuracy.
-
Toward Efficient Exploration by Large Language Model Agents
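This paper uses LLMs to explicitly implement a posterior sampling RL algorithm, significantly improving the exploration efficiency of LLM agents in natural-language environments while retaining the statistical performance advantages of the classical algorithm.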
-
Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition
This paper explores effective distillation of HuBERT for ASR by comparing student model structures, introducing a discriminative loss for improved low-resource performance, and proposing front-end distillation from waveform to Fbank features, achieving 17% parameter reduction and doubled inference speed with minor performance degradation.
-
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
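The R&B framework regroups data by semantic similarity and dynamically adjusts domain weights using gradients. With very low computational overhead (0.01%), it matches or exceeds existing data mixing strategies on natural language and multimodal tasks, improving the efficiency of foundation model training.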