Tag: Pre-training
All the articles with the tag "Pre-training".
-
Unveiling Language-Specific Features in Large Language Models via Sparse Autoencoders
This paper uses sparse autoencoders (SAEs) to identify and manipulate language-specific features in large language models: it introduces a monolinguality metric, demonstrates the context dependency of these features via code-switching, shows through ablation studies that they have a significant language-specific impact, and uses them to enhance steering vectors for better control over multilingual generation.
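A minimal sketch of the steering idea described above: add the decoder direction of one language-specific SAE feature to a residual-stream activation. The tensor shapes, `sae_decoder`, `feature_idx`, and `alpha` are illustrative assumptions, not the paper's released artifacts.

```python
import torch

def steer_activation(resid: torch.Tensor,
                     sae_decoder: torch.Tensor,
                     feature_idx: int,
                     alpha: float = 4.0) -> torch.Tensor:
    """Add alpha * (unit decoder direction of one SAE feature) to the residual stream."""
    direction = sae_decoder[feature_idx]           # (d_model,)
    direction = direction / direction.norm()       # unit-normalize the feature direction
    return resid + alpha * direction               # broadcasts over batch/sequence dims

# Toy usage with random tensors standing in for real activations and SAE weights.
d_model, d_features = 768, 16384
resid = torch.randn(1, 10, d_model)               # (batch, seq, d_model)
sae_decoder = torch.randn(d_features, d_model)
steered = steer_activation(resid, sae_decoder, feature_idx=123)
print(steered.shape)                               # torch.Size([1, 10, 768])
```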
-
Splitwiser: Efficient LM inference with constrained resources
Splitwiser introduces a method to split LLM inference phases on a single GPU using multiprocessing and NVIDIA MPS, achieving modest latency reductions (up to 18.2%) and throughput improvements (up to 1.42x) on Huggingface and vLLM pipelines, though constrained by overheads and scalability issues.
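The two phases Splitwiser schedules separately can be seen by running them by hand with Hugging Face transformers: a compute-bound prefill over the whole prompt, then a memory-bound decode loop that reuses the KV cache. This sketch only illustrates the phase split on one small model; it does not reproduce the paper's multiprocessing or NVIDIA MPS setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

inputs = tok("Pre-training large language models", return_tensors="pt")

with torch.no_grad():
    # Prefill: process the entire prompt once and cache attention keys/values.
    out = model(**inputs, use_cache=True)
    past = out.past_key_values
    next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)

    # Decode: generate one token at a time, reusing the cached KV state.
    generated = [next_token]
    for _ in range(16):
        out = model(input_ids=next_token, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_token = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_token)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```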
-
Kimi-Audio Technical Report
This paper presents Kimi-Audio, an open-source audio foundation model that combines a unified architecture of audio tokenization, LLM processing, and detokenization with large-scale multimodal training, achieving state-of-the-art multi-task performance in audio understanding, generation, and conversation.
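A schematic of the tokenize, process, detokenize pipeline named in the summary. Every class below is a hypothetical placeholder standing in for a learned component; none of this is Kimi-Audio's actual code or API.

```python
import numpy as np

class AudioTokenizer:
    """Stub: map a waveform to discrete audio tokens (real systems use a learned codec)."""
    def encode(self, waveform: np.ndarray) -> list[int]:
        frames = waveform[: len(waveform) // 320 * 320].reshape(-1, 320)
        return [int(abs(f).mean() * 100) % 1024 for f in frames]

class AudioLM:
    """Stub: an LLM operating on interleaved text/audio tokens."""
    def generate(self, tokens: list[int], n_new: int = 8) -> list[int]:
        return tokens + [(t + 1) % 1024 for t in tokens[-n_new:]]

class AudioDetokenizer:
    """Stub: map audio tokens back to a waveform."""
    def decode(self, tokens: list[int]) -> np.ndarray:
        return np.repeat(np.array(tokens, dtype=np.float32) / 1024.0, 320)

waveform = np.random.randn(16000)            # one second of fake 16 kHz audio
tokens = AudioTokenizer().encode(waveform)   # audio -> tokens
response = AudioLM().generate(tokens)        # tokens -> tokens (understanding/generation)
audio_out = AudioDetokenizer().decode(response)  # tokens -> audio
print(len(tokens), len(response), audio_out.shape)
```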
-
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
This paper proposes HYPEROFA, a hypernetwork-based method for initializing the token embeddings of new languages, improving the adaptability of PLMs to low-resource languages; it outperforms random initialization and matches or exceeds the OFA method.
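A minimal sketch of hypernetwork-style embedding initialization in the spirit of the summary: a small network is fit to map external multilingual word vectors into the PLM embedding space on tokens present in both, then predicts initial embeddings for new-language tokens. Dimensions, data, and the training objective here are illustrative assumptions.

```python
import torch
import torch.nn as nn

d_external, d_model = 300, 768   # e.g. external word-vector size -> PLM hidden size

hypernet = nn.Sequential(
    nn.Linear(d_external, 512),
    nn.GELU(),
    nn.Linear(512, d_model),
)

# Fit the hypernetwork on tokens that exist in BOTH the external vector space
# and the PLM vocabulary (random tensors stand in for the real pairs).
external_vecs = torch.randn(1000, d_external)
plm_embeddings = torch.randn(1000, d_model)
optimizer = torch.optim.Adam(hypernet.parameters(), lr=1e-3)
for _ in range(100):
    loss = nn.functional.mse_loss(hypernet(external_vecs), plm_embeddings)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Predict embeddings for new-language tokens unseen by the PLM and use them
# to initialize the corresponding new rows of the embedding matrix.
new_token_vecs = torch.randn(50, d_external)
with torch.no_grad():
    new_embeddings = hypernet(new_token_vecs)   # (50, d_model)
print(new_embeddings.shape)
```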
-
Adaptive Layer-skipping in Pre-trained LLMs
This paper proposes FlexiDepth, which adds plug-in routers and adapters to enable adaptive layer-skipping in pre-trained LLMs, improving computational efficiency while preserving generation quality; its experiments also reveal how token type affects computational demand.
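A toy sketch of per-token layer skipping in the spirit of the summary: a plug-in router scores each token, and tokens below the threshold bypass the heavy transformer block through a lightweight adapter. The module shapes and gating rule are illustrative assumptions, not FlexiDepth's exact design, and this toy still runs the heavy path for all tokens instead of gathering only the routed ones.

```python
import torch
import torch.nn as nn

class SkippableLayer(nn.Module):
    def __init__(self, layer: nn.Module, d_model: int, threshold: float = 0.5):
        super().__init__()
        self.layer = layer                                # frozen pre-trained block
        self.router = nn.Linear(d_model, 1)               # plug-in router
        self.adapter = nn.Sequential(                      # cheap skip-path adapter
            nn.Linear(d_model, d_model // 8),
            nn.GELU(),
            nn.Linear(d_model // 8, d_model),
        )
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.router(x))               # (batch, seq, 1) per-token score
        keep = gate > self.threshold                        # which tokens get full compute
        full = self.layer(x)                                # heavy path
        light = x + self.adapter(x)                         # light path for skipped tokens
        return torch.where(keep, full, light)

# Toy usage: an MLP stands in for one pre-trained transformer layer.
d_model = 64
layer = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))
block = SkippableLayer(layer, d_model)
x = torch.randn(2, 10, d_model)
print(block(x).shape)   # torch.Size([2, 10, 64])
```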