Tag: Efficiency
All the articles with the tag "Efficiency".
-
From System 1 to System 2: A Survey of Reasoning Large Language Models
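This survey traces the evolution from foundational LLMs to reasoning LLMs, which integrate System 2 techniques to improve step-by-step reasoning, and reports significant performance improvements on benchmarks.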
-
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
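This paper proposes MAET, a method that extracts language-agnostic, ability-related weights and transfers them across languages to build multilingual ability-enhanced large language models. On mathematical and scientific tasks it achieves a performance gain of about 10% using 60% of the computational resources, outperforming a range of baseline methods.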
-
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
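This paper proposes the Low-rank Knowledge Unlearning (LoKU) framework, which combines an Inverted Hinge Loss (IHL) with Fisher-weighted low-rank adapter initialization (FILA) to achieve robust, parameter-efficient knowledge unlearning for large language models, effectively removing sensitive information while preserving the model's original capabilities.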
-
HINT: Hypernetwork Approach to Training Weight Interval Regions in Continual Learning
HINT proposes a continual learning framework using interval arithmetic in embedding space with a hypernetwork to generate target network weights, achieving improved scalability and non-forgetting guarantees over InterContiNet while outperforming several benchmarks, though struggling with complex datasets.
-
Efficient Single-Pass Training for Multi-Turn Reasoning
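This paper proposes a method for training on multi-turn reasoning conversations in a single forward pass by duplicating response tokens and applying a custom attention mask, significantly improving training efficiency while preserving reasoning visibility and positional consistency.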