Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Small Models, Smarter Learning: The Power of Joint Task Training
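Through experiments with small Transformer models on the ListOps dataset, this paper shows that joint task training (e.g., MAX+MED+SUM) significantly reduces learning difficulty and parameter requirements. It also steers the model toward an efficient algorithm based on properties of the numbers rather than simple memorization of symbol tables.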
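To make the task mixture concrete, here is a minimal Python sketch of a ListOps evaluator restricted to the three operators in the summary (MAX, MED, SUM). The bracket token syntax and the conventions for SUM (taken modulo 10, as in the original ListOps benchmark) and MED (a fractional median truncated to an integer) are assumptions for illustration, not the paper's exact data format.

```python
import statistics

# Each operator maps a list of digits 0-9 to a single digit 0-9.
# Assumptions (not taken from the paper): SUM is reduced modulo 10 and
# MED truncates a fractional median to an int.
OPS = {
    "MAX": max,
    "MED": lambda xs: int(statistics.median(xs)),
    "SUM": lambda xs: sum(xs) % 10,
}


def evaluate(expr: str) -> int:
    """Evaluate a space-separated expression such as '[MAX 2 [SUM 3 9 ] 1 ]'."""
    stack: list[list] = []
    for tok in expr.split():
        if tok.startswith("["):
            # Open a sub-expression: store its operator, then collect arguments.
            stack.append([tok[1:]])
        elif tok == "]":
            # Close the innermost sub-expression and apply its operator.
            op, *args = stack.pop()
            value = OPS[op](args)
            if not stack:
                return value
            stack[-1].append(value)
        else:
            digit = int(tok)
            if not stack:
                return digit  # a bare digit is its own value
            stack[-1].append(digit)
    raise ValueError("unbalanced expression")


print(evaluate("[SUM 5 [MAX 6 8 ] [MED 3 1 2 ] ]"))
```

Expressions can nest different operators, so a model trained on the mixture has to handle all three within a single example.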
-
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
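This paper proposes the SELF-TUNING framework, which uses a self-teaching strategy (SELF-TEACHING) to significantly improve how well large language models acquire knowledge from new documents. It performs strongly on memorization, extraction, and reasoning tasks while retaining previously learned knowledge well.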
-
Why do LLMs attend to the first token?
This paper argues that attention sinks in LLMs, particularly at the first token, are a useful mechanism that prevents over-mixing of information in deep Transformers. The argument is supported by theoretical analysis, empirical evidence from Gemma 7B and Llama 3.1 models, and pre-training experiments showing that sinks grow stronger with larger models and longer contexts.
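A toy NumPy illustration of the intuition (not the paper's analysis or models): when one position receives a much larger attention logit and carries a near-zero value vector, the head's output barely changes the query token, so the head can effectively skip mixing information. The sink value of zero, the logit of 8.0, and the random value vectors are arbitrary illustrative choices.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()


def attend(scores, values: np.ndarray):
    """Single-query attention: softmax-weighted average of value vectors."""
    weights = softmax(np.asarray(scores, dtype=float))
    return weights, weights @ values


rng = np.random.default_rng(0)
values = rng.normal(size=(5, 4))  # 5 tokens, 4-dim values (random, illustrative)
values[0] = 0.0                   # hypothetical sink: first token has a zero value vector

# Without a sink: equal scores mix all tokens uniformly.
w_mix, out_mix = attend([0.0] * 5, values)
# With a sink: a large logit on token 0 absorbs almost all attention mass.
w_sink, out_sink = attend([8.0, 0.0, 0.0, 0.0, 0.0], values)

print("sink weights:", w_sink.round(4))
print("update norm without sink:", np.linalg.norm(out_mix))
print("update norm with sink:   ", np.linalg.norm(out_sink))
```

Because the sink's value is zero, the output with a sink is just a heavily scaled-down version of the uniform mixture, which is the "near no-op" behavior the summary connects to avoiding over-mixing.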
-
M+: Extending MemoryLLM with Scalable Long-Term Memory
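By introducing a long-term memory mechanism and a co-trained retriever, M+ significantly extends MemoryLLM's knowledge retention to over 160k tokens and outperforms baselines on long-context tasks while keeping GPU memory consumption low.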
-
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
This paper investigates inter-layer communication in Transformer LMs by identifying low-rank communication channels via SVD. Interventions on these channels demonstrate their causal role in prompt sensitivity and significantly improve performance on context-retrieval tasks such as the Laundry List task.
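For readers unfamiliar with the linear-algebra step, here is a generic NumPy sketch of extracting a low-rank subspace with a truncated SVD. The random 16x16 matrix and k=3 are placeholders; which matrices the paper actually decomposes, and how it intervenes on them, are not reproduced here.

```python
import numpy as np


def top_k_subspace(W: np.ndarray, k: int):
    """Return the rank-k truncated SVD factors (U_k, S_k, Vt_k) of W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k]


rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))  # stand-in for a real weight matrix (hypothetical)
k = 3

U_k, S_k, Vt_k = top_k_subspace(W, k)
W_k = U_k @ np.diag(S_k) @ Vt_k  # best rank-k approximation in Frobenius norm

# Project an input vector onto the top-k right singular directions:
# this is the only part of x that the rank-k map W_k responds to.
x = rng.normal(size=16)
x_channel = Vt_k.T @ (Vt_k @ x)

S_all = np.linalg.svd(W, compute_uv=False)
print("rank of W_k:", np.linalg.matrix_rank(W_k))
print("approximation error:", np.linalg.norm(W - W_k))
```

By the Eckart-Young theorem the approximation error equals the root sum of squares of the discarded singular values, and `W_k @ x` equals `W_k @ x_channel`, which is what makes the top-k directions a self-contained "channel".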