Tag: Attention Distribution
All the articles with the tag "Attention Distribution".
-
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training
ZeroTuning proposes a training-free method that adjusts the attention distribution over the initial token in large language models, yielding significant gains on text classification, question answering, and multi-turn dialogue tasks, while remaining robust under resource constraints and with long contexts.
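To illustrate the general idea behind steering attention toward or away from the initial token, the sketch below rescales the first key column of a post-softmax attention map and renormalizes it. This is a minimal illustrative example, not the paper's exact procedure; the function name, the scaling factor `gamma`, and the renormalization step are assumptions for demonstration only.

```python
import torch

def rescale_initial_token_attention(attn_weights: torch.Tensor, gamma: float) -> torch.Tensor:
    """Rescale the attention mass assigned to the initial (first) key token.

    attn_weights: post-softmax attention of shape (batch, heads, q_len, k_len).
    gamma: values > 1 boost attention to the initial token, values < 1 suppress it.
    The result is renormalized so each query's weights still sum to 1.
    """
    scaled = attn_weights.clone()
    scaled[..., 0] = scaled[..., 0] * gamma              # scale the first key column
    scaled = scaled / scaled.sum(dim=-1, keepdim=True)   # renormalize per query
    return scaled

# Usage: a dummy attention map for 1 sequence, 2 heads, 4 query/key positions.
attn = torch.softmax(torch.randn(1, 2, 4, 4), dim=-1)
adjusted = rescale_initial_token_attention(attn, gamma=0.5)
print(adjusted.sum(dim=-1))  # all ones: each row is still a valid distribution
```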