Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
This paper proposes the MoL framework, a dual-loss optimization strategy that applies cross-entropy (CE) loss to domain corpora and KL-divergence loss to general corpora, significantly enhancing the domain expertise of large language models while effectively preserving their general capabilities, and achieving strong results on medical-domain tasks.
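A minimal sketch of the dual-loss idea described above, assuming an HF-style model that returns `.logits`; the names (`model`, `ref_model`, `alpha`) and the exact weighting and batching scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dual_loss_step(model, ref_model, domain_batch, general_batch, alpha=1.0):
    # Domain corpus: standard next-token cross-entropy to learn domain expertise.
    domain_out = model(**domain_batch)
    ce_loss = F.cross_entropy(
        domain_out.logits[:, :-1].reshape(-1, domain_out.logits.size(-1)),
        domain_batch["input_ids"][:, 1:].reshape(-1),
    )

    # General corpus: KL divergence against a frozen reference model,
    # anchoring the fine-tuned model to its original general behaviour.
    general_out = model(**general_batch)
    with torch.no_grad():
        ref_out = ref_model(**general_batch)
    kl_loss = F.kl_div(
        F.log_softmax(general_out.logits, dim=-1),
        F.softmax(ref_out.logits, dim=-1),
        reduction="batchmean",
    )

    return ce_loss + alpha * kl_loss
```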
-
Not All Correct Answers Are Equal: Why Your Distillation Source Matters
By distilling 1.89 million reasoning examples from three state-of-the-art large language models, this paper systematically studies how the distillation source affects student-model performance, finding that data distilled from AM-Thinking-v1 significantly improves student models across multiple reasoning benchmarks and exhibits adaptive generation lengths.
-
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization
This paper introduces a fine-tuning strategy for LLMs that exploits the unequal importance of attention matrices and assigns customized learning rates to improve efficiency. Theoretical analysis and experiments on the GLUE benchmarks show that fine-tuning only Wq and Wv, with a higher learning rate for Wv, can match or exceed full fine-tuning performance with far fewer trainable parameters.
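A minimal sketch of the parameter-group setup this implies, assuming LLaMA-style module names (`q_proj`, `v_proj`); the learning-rate values are illustrative, not the paper's.

```python
import torch

def build_optimizer(model, lr_q=1e-5, lr_v=5e-5):
    q_params, v_params = [], []
    for name, param in model.named_parameters():
        if "q_proj" in name:             # W_q: fine-tuned at the base learning rate
            q_params.append(param)
        elif "v_proj" in name:           # W_v: fine-tuned at a higher learning rate
            v_params.append(param)
        else:
            param.requires_grad_(False)  # freeze everything else, incl. W_k and W_o

    return torch.optim.AdamW([
        {"params": q_params, "lr": lr_q},
        {"params": v_params, "lr": lr_v},
    ])
```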
-
R-LoRA: Randomized Multi-Head LoRA for Efficient Multi-Task Learning
R-LoRA enhances LoRA's performance in multi-task learning through multi-head randomization (including multi-head dropout and random initialization), improving the capture of task-specific knowledge while reducing GPU memory usage and training time.
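A minimal sketch of a multi-head LoRA layer combining the two randomization ideas mentioned above: per-head dropout and non-identical (random) head initialization. The structural details (shared down-projection, head count, scaling) are assumptions for illustration, not R-LoRA's exact design.

```python
import torch
import torch.nn as nn

class MultiHeadLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, num_heads=4, dropout=0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)      # frozen pretrained weight
        # Shared low-rank down-projection A.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        # Separate up-projection heads, each randomly initialized so heads start diverse.
        self.B = nn.ParameterList([
            nn.Parameter(torch.randn(base.out_features, rank) * 0.01)
            for _ in range(num_heads)
        ])
        self.dropouts = nn.ModuleList([nn.Dropout(dropout) for _ in range(num_heads)])

    def forward(self, x):
        out = self.base(x)
        z = x @ self.A.T                              # shared low-rank projection
        for B, drop in zip(self.B, self.dropouts):
            out = out + drop(z) @ B.T / len(self.B)   # per-head dropout, averaged heads
        return out
```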
-
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models
This paper proposes the ALPS algorithm, which uses a weight-distribution-based parameter alignment distribution score (sPAD) to locate task-sensitive attention heads and prune the rest; updating only 10% of the attention parameters yields performance gains on general, math, and code tasks, while also demonstrating head transferability and mitigated knowledge forgetting.
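A minimal sketch of the head-selection step described above. The `spad_scores` tensor stands in for ALPS's sPAD computation (whose exact definition is in the paper), and the per-layer head layout and 10% keep ratio are illustrative assumptions.

```python
import torch

def select_trainable_heads(spad_scores: torch.Tensor, keep_ratio: float = 0.10):
    """spad_scores: [num_layers, num_heads] sensitivity score per attention head."""
    flat = spad_scores.flatten()
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = flat.topk(k).values.min()
    keep_mask = spad_scores >= threshold   # True for task-sensitive heads
    return keep_mask

# Usage idea: freeze every attention head outside the mask before fine-tuning,
# e.g. by zeroing gradients of the corresponding head slices of W_q/W_k/W_v/W_o.
```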