Tag: Pre-training
All the articles with the tag "Pre-training".
-
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging
This paper integrates fast-thinking and slow-reasoning capabilities via model merging to achieve long-to-short reasoning, compressing response length by up to 55% on a 7B model while preserving performance, offering an efficient remedy for the overthinking problem in large language models.
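One common way to realize this kind of merging is simple weight-space interpolation between a fast-thinking (short-output) checkpoint and a slow-reasoning (long chain-of-thought) checkpoint. The sketch below is a minimal task-arithmetic-style linear merge, not the paper's exact recipe; the checkpoint names and the weight `alpha` are placeholders.

```python
# Minimal weight-space merging sketch: linear interpolation between two
# fine-tuned checkpoints that share the same architecture. Checkpoint names
# and the alpha value are illustrative, not from the paper.
from transformers import AutoModelForCausalLM

fast = AutoModelForCausalLM.from_pretrained("fast-thinking-7b")   # short-output model (placeholder)
slow = AutoModelForCausalLM.from_pretrained("slow-reasoning-7b")  # long-CoT model (placeholder)
alpha = 0.5  # fraction of the slow-reasoning weights to keep

fast_state = fast.state_dict()
merged_state = {
    name: alpha * w_slow + (1.0 - alpha) * fast_state[name]
    for name, w_slow in slow.state_dict().items()
}

fast.load_state_dict(merged_state)
fast.save_pretrained("merged-long-to-short-7b")
```

In practice `alpha` trades off response length against reasoning quality, so it is usually swept on a small validation set.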
-
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
This paper proposes a three-dimensional taxonomy of harmful content and develops the TTP and HarmFormer tools to filter web-scale LLM pretraining datasets, revealing widespread toxicity and persistent safety gaps through benchmarks such as HAVOC.
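In practice, a pipeline like this boils down to scoring every pretraining document along the taxonomy's harm dimensions and dropping or flagging documents above a threshold. The sketch below is a generic illustration of that filtering step; `harm_scores` and the 0.5 threshold are hypothetical stand-ins, not the actual TTP or HarmFormer interfaces.

```python
# Generic web-corpus filtering loop. harm_scores() stands in for a harm
# classifier such as HarmFormer; its interface and the threshold are assumptions.
from typing import Iterable


def harm_scores(text: str) -> dict[str, float]:
    """Hypothetical classifier returning a harm probability per taxonomy dimension."""
    raise NotImplementedError


def filter_corpus(docs: Iterable[str], threshold: float = 0.5) -> list[str]:
    kept = []
    for doc in docs:
        scores = harm_scores(doc)
        if max(scores.values()) < threshold:  # keep only documents below the harm threshold
            kept.append(doc)
    return kept
```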
-
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
This paper proposes Low-Rank Clone (LRC), which performs efficient knowledge distillation from large to small language models through low-rank projection matrices and activation cloning; trained on only 10-20B tokens, the resulting students match or surpass models trained on trillions of tokens, greatly improving training efficiency.
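The two ingredients named in the summary can be illustrated on a single linear layer: learnable low-rank matrices project the teacher's weights into the student's smaller dimensions, and an activation-alignment ("clone") loss keeps the student's hidden states close to the teacher's. The toy sketch below follows that idea under simplified assumptions; the dimensions, projection setup, and loss are illustrative, not the paper's exact formulation.

```python
# Toy single-layer sketch: low-rank weight projection plus activation cloning.
# Hidden sizes and the comparison loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_teacher, d_student = 4096, 1024

# Frozen teacher projection (e.g., one linear layer inside a transformer block).
teacher_linear = nn.Linear(d_teacher, d_teacher, bias=False).requires_grad_(False)

# Learnable low-rank maps that compress teacher weights into student dimensions.
P_in = nn.Parameter(torch.randn(d_student, d_teacher) * 0.02)
P_out = nn.Parameter(torch.randn(d_student, d_teacher) * 0.02)


def student_forward(x_student: torch.Tensor) -> torch.Tensor:
    # Build the student weight from the frozen teacher weight via the low-rank maps.
    W_student = P_out @ teacher_linear.weight @ P_in.T  # (d_student, d_student)
    return x_student @ W_student.T


# Activation cloning: run teacher and student on the same (compressed) input and
# align the student's activations with the teacher's.
x_teacher = torch.randn(8, d_teacher)
x_student = x_teacher @ P_in.T                      # shared input, compressed
h_teacher = teacher_linear(x_teacher)
h_student = student_forward(x_student)
clone_loss = F.mse_loss(h_student @ P_out, h_teacher)  # project back up to compare
```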
-
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
This paper proposes ProRL, which combines prolonged reinforcement learning with a KL-divergence penalty and reference-policy resets to train Nemotron-Research-Reasoning-Qwen-1.5B on diverse tasks, substantially expanding the reasoning boundaries of large language models, with the largest gains in domains where the base model is weak and on out-of-distribution tasks.
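The KL-penalty-plus-reset recipe can be written compactly: reward task success while pulling the policy toward a reference, and periodically reset the reference to the current policy so the penalty does not stall learning. The display below is a minimal, generic form of such an objective (standard RLHF-style notation, not necessarily the paper's exact formulation):

$$
\max_{\theta}\;\mathbb{E}_{x\sim\mathcal{D},\;y\sim\pi_\theta(\cdot\mid x)}\!\left[r(x,y)\right]\;-\;\beta\,D_{\mathrm{KL}}\!\left(\pi_\theta\,\|\,\pi_{\mathrm{ref}}\right),
\qquad
\pi_{\mathrm{ref}}\leftarrow\pi_\theta\ \text{at each reset.}
$$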
-
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
This paper proposes AttentionInfluence, which uses a pretrained model's attention-head mechanism, without any supervision, to select reasoning-intensive pretraining data, significantly improving a 7B-parameter model on knowledge and reasoning tasks and demonstrating weak-to-strong scaling potential.
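At a high level, the idea is to use a small pretrained model as an annotator: disable a chosen set of attention heads and score each candidate document by how much its language-modeling loss degrades, on the assumption that reasoning-intensive text relies more on those heads. The sketch below illustrates that scoring loop; the model name, `mask_heads` helper, and head selection are simplified placeholders, not the paper's implementation.

```python
# Influence-style data scoring sketch: compare a document's LM loss with a set
# of attention heads disabled vs. the intact model. mask_heads() is a hypothetical
# helper; choosing which heads to mask is assumed to happen upstream.
from contextlib import contextmanager

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("small-pretrained-lm")  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained("small-pretrained-lm")


@contextmanager
def mask_heads(model, heads):
    """Hypothetical helper: install forward hooks that zero the outputs of the
    given (layer, head) pairs, yield, then remove the hooks. Body omitted."""
    # ... install masking hooks here (assumption) ...
    yield
    # ... remove hooks here ...


def lm_loss(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()


def influence_score(text: str, important_heads) -> float:
    base = lm_loss(text)
    with mask_heads(model, important_heads):
        masked = lm_loss(text)
    return masked - base  # larger gap => document leans harder on the masked heads

# Documents with the highest influence scores are kept for pretraining, e.g.:
# selected = sorted(corpus, key=lambda d: influence_score(d, heads), reverse=True)[:k]
```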