Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
This article proposes the Gather-and-Aggregate (G&A) mechanism, showing that the gap between Transformer and SSM models in in-context retrieval stems mainly from how a small number of critical heads are implemented, and uses hybrid-model experiments to demonstrate the potential of attention for improving SSM retrieval ability.
-
Born a Transformer -- Always a Transformer?
This article studies Transformers' length-generalization limits through retrieval and copying tasks, finding that pretraining selectively strengthens inductive (rightward/forward) tasks but cannot overcome inherent architectural limitations; fine-tuning can rebalance the asymmetry yet remains bound by theoretical constraints.
-
No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces
This article proposes an isotropic model-merging framework that flattens the singular-value spectrum of task matrices and combines common and task-specific subspaces, substantially improving multi-task model performance and achieving state-of-the-art merging results on vision and language tasks.
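The core operation in this summary, flattening the singular-value spectrum of a task matrix, can be illustrated with a minimal sketch. The function names, the simple delta-averaging merge, and the toy matrix shapes below are assumptions made for illustration only; the paper's actual framework additionally separates common and task-specific subspaces.

```python
import numpy as np

def flatten_spectrum(task_matrix: np.ndarray) -> np.ndarray:
    """Replace the singular values of a task matrix (fine-tuned minus
    pretrained weights) with their mean, making the spectrum isotropic
    while keeping the left/right singular subspaces unchanged."""
    U, s, Vt = np.linalg.svd(task_matrix, full_matrices=False)
    s_iso = np.full_like(s, s.mean())  # uniform (flat) spectrum
    return (U * s_iso) @ Vt

def merge_isotropic(pretrained: np.ndarray, finetuned: list) -> np.ndarray:
    """Illustrative merge: average the per-task weight deltas, flatten
    the averaged delta's spectrum, and add it back to the pretrained
    weights (a simplification of the paper's method)."""
    avg_delta = np.mean([w - pretrained for w in finetuned], axis=0)
    return pretrained + flatten_spectrum(avg_delta)

# Toy usage: random matrices standing in for one layer's weights.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(64, 64))
tasks = [w0 + 0.01 * rng.normal(size=(64, 64)) for _ in range(3)]
merged = merge_isotropic(w0, tasks)
print(merged.shape)
```

The flat spectrum keeps every direction in the merged update equally weighted, which is the intuition behind "isotropic" merging; how the paper balances this against task-specific subspaces is not reproduced here.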
-
Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning
This article proposes the Prune-on-Logic framework, which converts long chain-of-thought (Long-CoT) traces into logic graphs and selectively prunes low-utility verification steps, improving the reasoning accuracy of small language models (SLMs) while lowering inference cost, and revealing pruning's potential as a capability-alignment strategy.
-
Deformable Beta Splatting
Deformable Beta Splatting (DBS) enhances real-time radiance field rendering by introducing deformable Beta Kernels for superior geometric fidelity, Spherical Beta for efficient color encoding, and kernel-agnostic MCMC optimization, achieving state-of-the-art visual quality with 45% fewer parameters and 1.5x faster rendering than 3DGS-MCMC.