Tag: Transformer
All the articles with the tag "Transformer".
-
Purity Law for Generalizable Neural TSP Solvers
This paper introduces the Purity Law (PuLa), a structural principle revealing a sparsity bias in optimal TSP solutions, and proposes Purity Policy Optimization (PUPO), a training framework that significantly improves the generalization of neural TSP solvers across diverse scales and distributions without adding inference overhead.
-
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
This paper proposes a three-dimensional taxonomy and develops the TTP and HarmFormer tools to filter harmful content from web-scale LLM pretraining datasets, revealing widespread toxicity and persistent safety gaps through benchmarks such as HAVOC.
-
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
This paper proposes Low-Rank Clone (LRC), a method that uses low-rank projection matrices and activation cloning to distill knowledge efficiently from large language models into small ones; trained on only 10-20B tokens, it matches or surpasses models trained on trillions of tokens, substantially improving training efficiency.
-
Attention Retrieves, MLP Memorizes: Disentangling Trainable Components in the Transformer
By freezing Transformer components and introducing the MixiT model, this paper shows that input-dependent self-attention is essential for retrieval and language modeling, while MLP layers dominate memorization, underscoring the importance of architectural heterogeneity for solving different tasks.
-
Small Models, Smarter Learning: The Power of Joint Task Training
Through experiments with small Transformer models on the ListOps dataset, this paper shows that joint task training (e.g., MAX+MED+SUM) markedly reduces learning difficulty and parameter requirements, and steers models toward discovering efficient algorithms based on numeric properties rather than merely memorizing symbol tables.