Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Efficient Single-Pass Training for Multi-Turn Reasoning
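This paper proposes a method for training on multi-turn reasoning dialogues in a single forward pass by duplicating response tokens and applying a custom attention mask. The method significantly improves training efficiency while preserving reasoning visibility and positional consistency.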
-
LZ Penalty: An information-theoretic repetition penalty for autoregressive language models
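This paper proposes the LZ penalty, which dynamically adjusts the sampling distribution of autoregressive language models based on changes in code length under the LZ77 compression algorithm. Under greedy decoding, it effectively eliminates degenerate repetition while maintaining performance on reasoning benchmarks.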
-
Communication-Efficient Wireless Federated Fine-Tuning for Large-Scale AI Models
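This paper proposes a wireless federated LoRA fine-tuning framework that uses Sparsified Orthogonal Fine-Tuning (SOFT) and a Two Stage Federated Algorithm (TSFA) to optimize parameter sparsification and dynamic resource allocation. The framework improves both communication efficiency and learning performance.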
-
Does Knowledge Distillation Matter for Large Language Model based Bundle Generation?
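This paper presents the first systematic study of knowledge distillation (KD) for bundle generation based on large language models. It proposes a comprehensive KD framework and validates it experimentally, showing that KD reduces computational requirements while maintaining or even improving performance.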
-
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
This paper uses sparse autoencoders to identify and manipulate language-specific features in large language models. It introduces a monolinguality metric and uses code-switching to show that these features depend on context. It also uses the features to improve steering vectors, giving better control over multilingual generation. Ablation studies further reveal significant language-specific effects.