Tag: Pre-training
All the articles with the tag "Pre-training".
-
Why do LLMs attend to the first token?
This paper argues that attention sinks in LLMs, particularly at the first token, are a useful mechanism to prevent over-mixing of information in deep Transformers, supported by theoretical insights and empirical evidence from Gemma 7B, LLaMa 3.1 models, and pre-training experiments showing stronger sinks with larger models and longer contexts.
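Not from the paper, but as a quick illustration of what an attention sink looks like in practice, the sketch below computes the average attention mass that queries place on the first key position, given one layer's attention weights (the shapes and toy input are made up):

```python
import torch

def first_token_attention_mass(attn: torch.Tensor) -> float:
    """attn: attention weights of shape (heads, query_len, key_len),
    e.g. one layer's softmax output. Returns the average probability
    mass that queries assign to the first key position."""
    # Column 0 holds each query's attention to the first token.
    return attn[:, :, 0].mean().item()

# Toy example: random scores, softmax over keys (causal mask omitted).
scores = torch.randn(8, 16, 16)          # (heads, queries, keys)
attn = scores.softmax(dim=-1)
print(f"mass on first token: {first_token_attention_mass(attn):.3f}")
```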
-
Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking
This paper explores parameter- and memory-efficient LLM pre-training through a survey, a benchmark, and two proposed techniques, weight re-decomposition and momentum reset, which substantially improve the performance of low-rank methods and reduce memory consumption, though they still do not fully match full-rank training.
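The paper's algorithms are not reproduced here; as a rough sketch of the momentum-reset idea alone, the snippet below clears Adam's optimizer state on a fixed schedule, standing in for the moment when a low-rank subspace is refreshed (the model, objective, and refresh interval are all placeholders):

```python
import torch

model = torch.nn.Linear(512, 512)                # stand-in for an LLM block
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
REFRESH_EVERY = 200                              # hypothetical subspace-refresh interval

for step in range(1000):
    loss = model(torch.randn(4, 512)).pow(2).mean()   # dummy objective
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    if (step + 1) % REFRESH_EVERY == 0:
        # Momentum reset: the stored first/second moments refer to the old
        # low-rank subspace, so drop them when the subspace changes.
        optimizer.state.clear()
```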
-
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades
This paper proposes LoRASuite, a modular method for adapting LoRA weights across large language model upgrades using transformation matrices, layer mapping, and attention-head mapping; it significantly outperforms small-scale LoRA fine-tuning on math and commonsense tasks, even surpasses full-scale retraining in some scenarios, and greatly reduces memory and time consumption.
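A rough sketch of the transformation-matrix step only, not LoRASuite's actual algorithm: fit a linear map between an old and an upgraded base weight by least squares, then use it to transplant the old LoRA factor onto the new model (all shapes are hypothetical):

```python
import torch

def fit_linear_map(W_old: torch.Tensor, W_new: torch.Tensor) -> torch.Tensor:
    """Least-squares solve for T such that W_old @ T ≈ W_new."""
    return torch.linalg.lstsq(W_old, W_new).solution

# Hypothetical shapes: the upgraded model changed its hidden size.
W_old = torch.randn(1024, 768)     # old base weight
W_new = torch.randn(1024, 896)     # corresponding weight after the upgrade
A_old = torch.randn(16, 768)       # old LoRA "A" factor (rank 16)

T = fit_linear_map(W_old, W_new)   # (768, 896)
A_new = A_old @ T                  # transplanted LoRA factor for the new model
print(A_new.shape)                 # torch.Size([16, 896])
```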
-
One-shot Entropy Minimization
This paper proposes one-shot entropy minimization (EM), which significantly improves the performance of large language models on mathematical reasoning tasks using only a single unlabeled example and 10 optimization steps, matching or surpassing conventional reinforcement learning methods.
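A minimal sketch of the entropy-minimization objective, assuming a Hugging Face causal LM; the model name, prompt, and learning rate are illustrative, and unlike the paper the entropy here is taken over the prompt's next-token distributions rather than over sampled responses:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"   # illustrative small model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "Solve step by step: what is 17 * 24?"   # the single unlabeled example
inputs = tok(prompt, return_tensors="pt")

for step in range(10):                      # ~10 optimization steps
    logits = model(**inputs).logits         # (1, seq_len, vocab)
    probs = logits.softmax(dim=-1)
    log_probs = logits.log_softmax(dim=-1)
    # Token-level entropy of the model's own predictive distribution.
    entropy = -(probs * log_probs).sum(dim=-1).mean()
    entropy.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: entropy = {entropy.item():.4f}")
```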
-
Parallel Scaling Law for Language Models
This paper proposes parallel scaling (PARSCALE), which improves language model capability by increasing the number of parallel computation streams (P) at training and inference time; theory and experiments show that P streams are equivalent to scaling parameters by O(log P), and the method delivers higher inference efficiency in low-resource settings.
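A simplified sketch of the parallel-stream idea, not the paper's exact prefix transforms or dynamic aggregation: each of P streams perturbs the input with its own learned offset, the base network runs once per stream, and the outputs are mixed with learned weights:

```python
import torch
import torch.nn as nn

class ParallelStreams(nn.Module):
    """Wrap a base network with P parallel streams: each stream adds its own
    learned input offset, and the outputs are combined with learned weights.
    (A simplification of PARSCALE's prefix transforms and dynamic aggregation.)"""
    def __init__(self, base: nn.Module, d_model: int, p_streams: int):
        super().__init__()
        self.base = base
        self.offsets = nn.Parameter(torch.zeros(p_streams, d_model))
        self.mix = nn.Parameter(torch.zeros(p_streams))   # aggregation logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); run the base model once per stream.
        outs = torch.stack([self.base(x + off) for off in self.offsets])
        weights = self.mix.softmax(dim=0).view(-1, 1, 1, 1)
        return (weights * outs).sum(dim=0)

base = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
model = ParallelStreams(base, d_model=64, p_streams=4)
y = model(torch.randn(2, 8, 64))
print(y.shape)   # torch.Size([2, 8, 64])
```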