Tag: Large Language Model

All the articles with the tag "Large Language Model".

M+: Extending MemoryLLM with Scalable Long-Term Memory

Published: 3 Jun, 2025 at 11:27 AM

90.20 🤔

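M+ introduces a long-term memory mechanism and a co-trained retriever, extending MemoryLLM's knowledge retention to more than 160k tokens. It outperforms baselines on long-context tasks while keeping GPU memory consumption low.
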
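To make the retrieval step concrete, here is a minimal sketch, assuming the long-term memory is a pool of key vectors and that the retriever scores each slot by a dot product with a query vector and keeps the top k. The names `retrieve_top_k` and `dot`, the vector shapes, and dot-product scoring are hypothetical illustrations, not M+'s actual retriever or API.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(
    query: Sequence[float],
    memory_keys: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (slot_index, score) for the k long-term memory slots that best match the query.

    Hypothetical stand-in for a learned retriever: the score is a raw dot product,
    and only the selected slots would be handed back to the model to attend over.
    """
    scored = ((i, dot(query, key)) for i, key in enumerate(memory_keys))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    hits = retrieve_top_k([1.0, 0.2], keys, k=2)
    print([slot for slot, _ in hits])  # prints [0, 2]
```

Keeping the memory pool outside the model and pulling back only k slots per query is one way a design like this can let memory grow without a matching growth in GPU memory. The sketch does not attempt the joint training of retriever and model that the summary describes.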
Scalable Fine-tuning from Multiple Data Sources: A First-Order Approximation Approach

Published: 4 Jun, 2025 at 11:26 AM

87.86 🤔

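This paper proposes GRADEX, an algorithm that uses a first-order approximation to quickly estimate a language model's fine-tuning loss. This speeds up data-source subset selection by more than 30× and improves performance by up to 3.8% over baseline methods on instruction fine-tuning and chain-of-thought fine-tuning tasks.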
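The core idea of a first-order estimate can be illustrated with a Taylor expansion around the base parameters: if fine-tuning on a set of sources moves the parameters by δ, then L(θ₀ + δ) ≈ L(θ₀) + ∇L(θ₀)·δ. The sketch below assumes fine-tuning is approximated by a single gradient step on the mean source gradient and scores every candidate subset without retraining. This is a generic gradient-alignment estimate swapped in for illustration, not GRADEX itself; `estimate_loss`, `best_subset`, and the toy gradients are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def estimate_loss(base_loss: float, target_grad: Vector,
                  source_grads: List[Vector], lr: float) -> float:
    """First-order estimate of the target loss after fine-tuning on some sources.

    Assumes fine-tuning is approximated by one step delta = -lr * mean(source_grads),
    so L(theta0 + delta) ~= L(theta0) + grad_L(theta0) . delta.
    """
    delta = [-lr * g for g in mean_vector(source_grads)]
    return base_loss + dot(target_grad, delta)


def best_subset(base_loss: float, target_grad: Vector,
                grads_by_source: Dict[str, Vector], size: int,
                lr: float) -> Tuple[Tuple[str, ...], float]:
    """Score every subset of `size` sources with the estimate; no model is retrained."""
    scored = [
        (subset, estimate_loss(base_loss, target_grad,
                               [grads_by_source[s] for s in subset], lr))
        for subset in combinations(sorted(grads_by_source), size)
    ]
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    grads = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
    print(best_subset(2.0, [1.0, 0.0], grads, size=2, lr=0.1))
```

Each candidate costs a dot product instead of a fine-tuning run, which is the kind of saving a first-order estimate buys; the paper's actual estimator and its reported 30× figure are not reproduced by this toy.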
Scalable Model Merging with Progressive Layer-wise Distillation

Published: 4 Jun, 2025 at 11:26 AM

87.67 🤔

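This paper proposes ProDistill, an algorithm that efficiently merges large pre-trained models through progressive layer-wise teacher-student distillation. It proves theoretically that domain-specific data is necessary for merging, achieves substantial performance gains (6.14% to 6.61%) on vision and language tasks, and shows superior memory and compute efficiency.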
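A toy version of layer-wise distillation for merging might look like the sketch below. It assumes each "layer" is a scalar map h ↦ w·h, that the merged weight is task-arithmetic style (base plus coefficient-weighted task deltas), and that the per-layer coefficients are fit by plain gradient descent on the squared error between the merged student's output and each fine-tuned teacher's output on a few in-domain inputs. "Progressive" here means each layer is fit on inputs produced by the already-merged earlier layers. All names are hypothetical, and this is not ProDistill's actual algorithm or code.

```python
from typing import List, Sequence


def merged_weight(base: float, task_weights: Sequence[float],
                  coeffs: Sequence[float]) -> float:
    """Task-arithmetic style merge of one scalar layer: base + sum_k c_k * (w_k - base)."""
    return base + sum(c * (w - base) for c, w in zip(coeffs, task_weights))


def progressive_merge(base_layers: List[float], task_layers: List[List[float]],
                      task_inputs: List[List[float]], steps: int = 200,
                      lr: float = 0.05) -> List[List[float]]:
    """Fit merge coefficients layer by layer so the merged model mimics each teacher.

    base_layers[l]:    scalar weight of layer l in the shared pre-trained model.
    task_layers[k][l]: scalar weight of layer l in fine-tuned model k (teacher k).
    task_inputs[k]:    a few inputs drawn from task k's domain.
    """
    num_tasks = len(task_layers)
    n = sum(len(xs) for xs in task_inputs)
    # Hidden values entering the current layer, tracked per task for student and teacher.
    student_h = [list(xs) for xs in task_inputs]
    teacher_h = [list(xs) for xs in task_inputs]
    all_coeffs = []
    for l, base in enumerate(base_layers):
        task_ws = [task_layers[k][l] for k in range(num_tasks)]
        coeffs = [1.0 / num_tasks] * num_tasks  # start from plain weight averaging
        for _ in range(steps):
            w = merged_weight(base, task_ws, coeffs)
            grads = [0.0] * num_tasks
            for k in range(num_tasks):
                for hs, ht in zip(student_h[k], teacher_h[k]):
                    err = w * hs - task_ws[k] * ht  # student output minus teacher output
                    for j in range(num_tasks):
                        # d(err^2)/d(c_j) = 2 * err * hs * (w_j - base), averaged over inputs
                        grads[j] += 2.0 * err * hs * (task_ws[j] - base) / n
            coeffs = [c - lr * g for c, g in zip(coeffs, grads)]
        all_coeffs.append(coeffs)
        # Progressive step: the student feeds the next layer through its merged weight,
        # while each teacher keeps using its own fine-tuned weight.
        w = merged_weight(base, task_ws, coeffs)
        student_h = [[w * h for h in hs] for hs in student_h]
        teacher_h = [[task_ws[k] * h for h in teacher_h[k]] for k in range(num_tasks)]
    return all_coeffs


if __name__ == "__main__":
    coeffs = progressive_merge([1.0], [[2.0], [3.0]], [[1.0], [2.0]])
    print(merged_weight(1.0, [2.0, 3.0], coeffs[0]))  # converges to about 2.8
```

The fitted weight lands between the two teachers, pulled toward whichever task's inputs contribute more error, which is why this kind of merging needs some data from each domain rather than weights alone.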
Large Vocabulary Size Improves Large Language Models

Published: 5 Jun, 2025 at 11:24 AM

85.40 🤔

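Through experiments, this paper shows that a larger vocabulary significantly improves the performance of monolingual large language models on English and Japanese tasks. It also proposes a simple method for replacing the vocabulary during continual training to adapt a model to a target language, which further improves performance.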
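The summary does not say how embeddings are initialized when the vocabulary is swapped, so the sketch below uses one common heuristic as an assumption: tokens present in both vocabularies keep their trained vectors, and tokens that exist only in the new vocabulary start from the mean of the old embeddings. The function name `remap_embeddings` and the toy vocabularies are hypothetical.

```python
from typing import Dict, List


def remap_embeddings(old_vocab: Dict[str, int], old_emb: List[List[float]],
                     new_vocab: Dict[str, int]) -> List[List[float]]:
    """Build an input-embedding matrix for new_vocab from a model trained with old_vocab.

    Assumes new_vocab maps each token to a unique id in 0..len(new_vocab)-1.
    Shared tokens keep their trained vector; tokens that exist only in new_vocab
    start from the mean of the old embeddings (an assumed heuristic).
    """
    dim = len(old_emb[0])
    mean = [sum(row[d] for row in old_emb) / len(old_emb) for d in range(dim)]
    new_emb = [list(mean) for _ in range(len(new_vocab))]
    for token, new_id in new_vocab.items():
        old_id = old_vocab.get(token)
        if old_id is not None:
            new_emb[new_id] = list(old_emb[old_id])  # copy so the old matrix is untouched
    return new_emb


if __name__ == "__main__":
    old_vocab = {"the": 0, "cat": 1}
    old_emb = [[1.0, 0.0], [3.0, 2.0]]
    new_vocab = {"cat": 0, "猫": 1, "the": 2}
    print(remap_embeddings(old_vocab, old_emb, new_vocab))
    # [[3.0, 2.0], [2.0, 1.0], [1.0, 0.0]]
```

If the model's output (LM-head) matrix is not tied to the input embeddings, it would need the same remapping before continual training resumes.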
Middle-Layer Representation Alignment for Cross-Lingual Transfer in Fine-Tuned LLMs

Published: 3 Jun, 2025 at 11:27 AM

89.76 🤔

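This paper proposes a method that strengthens cross-lingual transfer in large language models by aligning middle-layer representations, alternating between the task objective and the alignment objective during fine-tuning. It yields improvements on tasks such as slot filling and machine translation, with particular benefits for low-resource languages.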
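The alignment objective and the alternating schedule can be sketched as follows. This assumes the alignment loss is the cosine distance between mean-pooled middle-layer hidden states of parallel sentences and that the objective alternates every training step; the paper may use a different loss (for example a contrastive one), a different choice of layer, or a different alternation granularity. Gradient updates are omitted, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_pool(token_states: List[Vector]) -> List[float]:
    """Average the per-token hidden states of one sentence taken from a middle layer."""
    return [sum(col) / len(token_states) for col in zip(*token_states)]


def cosine(a: Vector, b: Vector) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unaligned rather than dividing by zero
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def alignment_loss(pairs: List[Tuple[List[Vector], List[Vector]]]) -> float:
    """Mean cosine distance between pooled middle-layer states of parallel sentences."""
    return sum(1.0 - cosine(mean_pool(src), mean_pool(tgt))
               for src, tgt in pairs) / len(pairs)


def objective_schedule(num_steps: int) -> List[str]:
    """Alternate the training objective step by step: task loss, then alignment loss."""
    return ["task" if step % 2 == 0 else "align" for step in range(num_steps)]


if __name__ == "__main__":
    aligned = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])  # pooled vectors point the same way
    orthogonal = ([[1.0, 0.0]], [[0.0, 1.0]])
    print(alignment_loss([aligned, orthogonal]))  # about 0.5
    print(objective_schedule(4))  # ['task', 'align', 'task', 'align']
```

Alternating rather than summing the two losses keeps each update focused on one objective; whether that matches the paper's exact schedule is an assumption.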
