Tag: Transformer

All the articles with the tag "Transformer".

PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning

Published: 4 May, 2025 at 04:28 PM

53.32 🤔

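This paper proposes PointLoRA, which combines low-rank adaptation with multi-scale token selection to enable parameter-efficient fine-tuning of point cloud models. It substantially reduces the number of trainable parameters while achieving competitive performance on multiple datasets.
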
State Space Models are Strong Text Rerankers

Published: 4 May, 2025 at 04:26 PM

50.53 🤔

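This paper presents a comprehensive benchmark comparing state space models such as Mamba with Transformers on text reranking, in terms of both performance and efficiency. It finds that Mamba models can achieve comparable performance but are less efficient, and it highlights directions for future optimization.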
Which Attention Heads Matter for In-Context Learning?

Published: 5 May, 2025 at 11:15 PM

90.67 👍

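Through ablation studies and training-dynamics analysis on 12 large language models, this paper finds that function vector heads are the main mechanism driving few-shot in-context learning, especially in larger models, and that many function vector heads evolve from induction heads during training. This corrects the earlier view that induction heads are the primary driver.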
The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Published: 4 May, 2025 at 04:29 PM

85.10 👍

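Through large-scale experiments, the paper analyzes the efficiency-accuracy trade-offs of sparse attention in Transformer LLMs, shows that larger sparse models have an advantage on long sequences, and establishes scaling laws that generalize across settings.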
Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding

Published: 4 May, 2025 at 04:27 PM

83.39 👍

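This paper systematically shows that massive values in self-attention modules play a key role in how LLMs understand contextual knowledge, demonstrates experimentally that these values originate from rotary positional encoding (RoPE), and offers new insights for model optimization and quantization strategies.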