Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Efficient Single-Pass Training for Multi-Turn Reasoning
This paper proposes a method that enables single-forward-pass training on multi-turn reasoning conversations through response token duplication and a custom attention mask, significantly improving training efficiency while preserving reasoning visibility and positional consistency.
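As a rough illustration of the general idea (not the paper's exact construction), the sketch below builds a causal attention mask in which each turn's reasoning tokens are visible only within their own turn, while response tokens stay visible to all later turns; the token-role labels and masking rule are assumptions made for this example.

```python
import torch

def multi_turn_mask(roles, turns):
    """Illustrative attention mask for single-pass multi-turn training.

    roles: per-token labels, "reason" or "response" (hypothetical labeling).
    turns: per-token turn index.
    Rule (assumed): token i may attend to token j iff j <= i (causal) and
    j is either a response token or belongs to the same turn as i.
    """
    n = len(roles)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        for j in range(i + 1):
            mask[i, j] = roles[j] == "response" or turns[j] == turns[i]
    return mask  # True = may attend, False = masked out

# Tiny example: two turns, each with reasoning tokens followed by a response token.
roles = ["reason", "reason", "response", "reason", "response"]
turns = [0, 0, 0, 1, 1]
print(multi_turn_mask(roles, turns).int())
```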
-
Hierarchical Attention Generates Better Proofs
This paper proposes a hierarchical attention regularization method that guides the attention mechanism of large language models to align with a five-level hierarchical structure of mathematical reasoning, improving proof success rates by 2.05% on miniF2F and 1.69% on ProofNet while significantly reducing proof complexity.
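For intuition only, the sketch below shows what an attention-alignment regularizer of this general kind can look like: it penalizes attention probability mass that falls outside a target structure mask. The five-level hierarchy and the paper's actual loss are not reproduced here; `hierarchy_mask` and the stand-in inputs are assumptions.

```python
import torch

def attention_alignment_loss(attn, hierarchy_mask):
    """Illustrative regularizer: penalize attention mass outside the positions
    allowed by a (hypothetical) hierarchy mask.

    attn:           (heads, seq, seq) attention probabilities, rows sum to 1.
    hierarchy_mask: (seq, seq) boolean, True where attention is consistent
                    with the target hierarchical structure.
    """
    allowed = hierarchy_mask.to(attn.dtype)                   # 1.0 where aligned
    misaligned_mass = (attn * (1.0 - allowed)).sum(dim=-1)    # per head and query
    return misaligned_mass.mean()

# Usage sketch: add this term to the usual LM loss with some weight lambda.
heads, seq = 2, 4
attn = torch.softmax(torch.randn(heads, seq, seq), dim=-1)
hierarchy_mask = torch.tril(torch.ones(seq, seq, dtype=torch.bool))  # stand-in structure
print(attention_alignment_loss(attn, hierarchy_mask))
```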
-
Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models
This paper proposes a wireless federated LoRA fine-tuning framework that uses Sparsified Orthogonal Fine-Tuning (SOFT) and a Two-Stage Federated Algorithm (TSFA) to optimize parameter sparsification and dynamic resource allocation, improving communication efficiency and learning performance.
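To make the communication-saving step concrete, here is a generic sketch of magnitude-based (top-k) sparsification of an adapter update before a client uploads it in a federated round. It does not reproduce SOFT's orthogonal fine-tuning or TSFA's resource allocation; the function names, payload format, and keep ratio are assumptions for illustration.

```python
import math
import torch

def sparsify_topk(delta: torch.Tensor, keep_ratio: float = 0.1):
    """Keep only the largest-magnitude entries of an adapter update.

    Returns (indices, values, shape) -- the only payload a client would
    upload, which is where the wireless bandwidth saving comes from.
    """
    flat = delta.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    _, indices = torch.topk(flat.abs(), k)
    return indices, flat[indices], delta.shape

def densify(indices, values, shape):
    """Server-side reconstruction of the sparse update before aggregation."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[indices] = values
    return flat.reshape(shape)

# Usage sketch: client sparsifies its LoRA delta, server reconstructs and averages.
client_delta = torch.randn(16, 8)                 # stand-in for a LoRA update
payload = sparsify_topk(client_delta, keep_ratio=0.05)
recovered = densify(*payload)
print(recovered.count_nonzero().item(), "of", recovered.numel(), "entries transmitted")
```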
-
Toward Reasonable Parrots: Why Large Language Models Should Argue with Us by Design
This position paper advocates for redesigning Large Language Models as 'reasonable parrots' that integrate argumentation theory principles to foster critical thinking through multi-persona dialogues, challenging users with diverse perspectives rather than providing one-sided answers.
-
Does Knowledge Distillation Matter for Large Language Model based Bundle Generation?
This paper presents the first systematic exploration of knowledge distillation for LLM-based bundle generation, proposing a comprehensive KD framework and validating it experimentally, showing that computational requirements can be reduced while performance is maintained or even improved.