Tag: Representation Learning

All the articles with the tag "Representation Learning".

Small Models, Smarter Learning: The Power of Joint Task Training

Published: 28 May, 2025 at 11:21 AM

90.76 🤔

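Using small Transformer models on the ListOps dataset, this paper shows that joint task training (e.g., MAX+MED+SUM) significantly lowers learning difficulty, reduces the number of parameters required, and steers the model toward efficient algorithms based on numerical properties rather than rote memorization of symbol tables.

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching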

Published: 20 May, 2025 at 11:12 AM

90.50 🤔

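This paper proposes the SELF-TUNING framework, whose self-teaching strategy (SELF-TEACHING) significantly improves the ability of large language models to acquire knowledge from new documents, achieving strong results on memorization, extraction, and reasoning tasks while preserving previously learned knowledge well.

Why do LLMs attend to the first token?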

Published: 17 May, 2025 at 11:04 AM

90.22 🤔

This paper argues that attention sinks in LLMs, particularly at the first token, are a useful mechanism to prevent over-mixing of information in deep Transformers, supported by theoretical insights and empirical evidence from Gemma 7B, LLaMa 3.1 models, and pre-training experiments showing stronger sinks with larger models and longer contexts.

M+: Extending MemoryLLM with Scalable Long-Term Memory

Published: 3 Jun, 2025 at 11:27 AM

90.20 🤔

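By introducing a long-term memory mechanism and a co-trained retriever, M+ significantly extends MemoryLLM's knowledge retention to more than 160k tokens and outperforms baselines on long-context tasks while keeping GPU memory consumption low.

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models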

Published: 13 May, 2025 at 11:21 AM

90.20 🤔

This paper investigates inter-layer communication in Transformer LMs by identifying low-rank communication channels via SVD, demonstrating their causal role in prompt sensitivity through interventions that significantly improve performance on context retrieval tasks like the Laundry List task.
