Tag: Representation Learning

All articles tagged "Representation Learning".

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Published: 22 May, 2025 at 11:16 AM

92.95 🤔

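This paper proposes the LATENTSEEK framework, which significantly improves the reasoning ability of large language models through policy-gradient-based test-time instance-level adaptation (TTIA) in latent space, while exploring a new direction for test-time scaling.

Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging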

Published: 23 May, 2025 at 11:15 AM

92.79 🤔

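This paper proposes Test-Time Model Merging (TTMM), which pretrains a large number of expert models at training time and dynamically merges their parameters at test time, approaching the language modeling performance of test-time training (TTT) with almost no test-time overhead.

Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective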

Published: 2 Jun, 2025 at 11:24 AM

91.96 🤔

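Sentinel proposes a lightweight sentence-level context compression framework that probes the attention signals of a 0.5B proxy model to achieve up to 5x compression, matching the QA performance of 7B-scale systems on the LongBench benchmark.

From Words to Worlds: Compositionality for Cognitive Architectures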

Published: 25 May, 2025 at 11:24 AM

91.89 🤔

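This paper designs three tasks to evaluate the compositionality of large language models (LLMs). It finds that scaling up model size generally improves compositional performance, whereas instruction tuning has inconsistent effects, suggesting that compositionality has limited power to explain performance gains.

One Task Vector is not Enough: A Large-Scale Study for In-Context Learning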

Published: 4 Jun, 2025 at 11:28 AM

91.21 🤔

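Using the large-scale dataset QUITEAFEW, this paper studies the role of task vectors in in-context learning. It finds that they perform best at intermediate layers but provide insufficient support for complex tasks, and hypothesizes that complex tasks rely on a distributed representation made up of multiple subtask vectors.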