Tag: Representation Learning
All the articles with the tag "Representation Learning".
-
Contrastive Learning for Task-Independent SpeechLLM-Pretraining
This paper proposes a contrastive-learning-based, task-independent pretraining method for SpeechLLMs that aligns speech and text representations, significantly improving performance on ASR, speech translation, and spoken question answering in low-resource settings and outperforming several specialized models.
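A minimal sketch of the kind of symmetric InfoNCE objective typically used for this sort of speech-text alignment is shown below; the function name, pooling assumption, and temperature are illustrative rather than taken from the paper.

```python
# Minimal sketch of a symmetric InfoNCE loss for aligning pooled speech and
# text embeddings (illustrative only; not the paper's exact objective).
import torch
import torch.nn.functional as F

def speech_text_contrastive_loss(speech_emb, text_emb, temperature=0.07):
    """speech_emb, text_emb: (batch, dim) pooled representations of paired utterances."""
    speech_emb = F.normalize(speech_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = speech_emb @ text_emb.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)  # matched pairs on the diagonal
    # Symmetric cross-entropy: speech-to-text and text-to-speech retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage with random stand-in embeddings:
loss = speech_text_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```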
-
Hide & Seek: Transformer Symmetries Obscure Sharpness & Riemannian Geometry Finds It
This paper introduces geodesic sharpness, a measure that uses Riemannian geometry on a quotient manifold to account for transformer parameter symmetries, and shows that it correlates more strongly with generalization than traditional adaptive sharpness across diagonal networks, vision transformers, and language models.
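For context, a common formulation of the adaptive (worst-case) sharpness baseline referenced here is sketched below, with $c$ typically the elementwise magnitude of the weights; this is the standard definition from the sharpness literature, not the paper's geodesic variant, which instead measures the loss increase along geodesics of the symmetry quotient manifold.

```latex
S^{\max}_{\rho}(w, c) \;=\; \max_{\|\delta \,\odot\, c^{-1}\|_{p} \,\le\, \rho} \; L(w + \delta) - L(w)
```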
-
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging
This paper introduces a model-merging benchmark for multimodal large language models (MLLMs) and an improved task-vector optimization method (WUDI v2) that removes noise via low-rank approximation and optimizes the merged vectors, achieving an average gain of 2.48% in multi-task and cross-modal merging experiments and demonstrating the potential to build high-performing MLLMs without data or training.
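The general recipe behind this style of merging (task vectors denoised by truncated SVD, then added back onto the shared base weights) can be sketched as follows; this is an illustrative approximation, not the WUDI v2 procedure itself, and all names are hypothetical.

```python
# Illustrative sketch of task-vector merging with low-rank denoising via
# truncated SVD. Shows the general recipe, not the paper's WUDI v2 method.
import torch

def low_rank_task_vector(finetuned, base, rank=16):
    """Task vector = finetuned - base, denoised by keeping its top singular directions."""
    delta = finetuned - base                      # (out, in) weight difference for one layer
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]

def merge_weights(base, finetuned_models, rank=16, alpha=1.0):
    """Add the averaged low-rank task vectors back onto the shared base weight."""
    deltas = [low_rank_task_vector(w, base, rank) for w in finetuned_models]
    return base + alpha * torch.stack(deltas).mean(dim=0)

# Toy usage on a single 256x256 weight matrix from three hypothetical fine-tuned models:
base = torch.randn(256, 256)
merged = merge_weights(base, [base + 0.01 * torch.randn(256, 256) for _ in range(3)])
```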
-
You Do Not Fully Utilize Transformer's Representation Capacity
This paper proposes Layer-Integrated Memory (LIMe), which learns a cross-layer routing mechanism to integrate key-value representations from all preceding layers, substantially mitigating representation collapse in Transformers and achieving faster convergence and higher accuracy in language modeling, reasoning tasks, and deep networks.
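A minimal sketch of the cross-layer routing idea follows: each layer reads a learned mixture over representations cached from all earlier layers. The module below is illustrative; LIMe itself routes key-value representations, which this sketch simplifies to whole hidden states.

```python
# Minimal sketch of cross-layer routing: a layer consumes a learned convex
# mixture of hidden states cached from all earlier layers (illustrative only).
import torch
import torch.nn as nn

class CrossLayerRouter(nn.Module):
    def __init__(self, layer_idx: int):
        super().__init__()
        # One learnable routing logit per earlier layer (including the embedding output).
        self.route_logits = nn.Parameter(torch.zeros(layer_idx + 1))

    def forward(self, cached_states):
        # cached_states: list of (batch, seq, dim) tensors from layers 0..layer_idx.
        weights = torch.softmax(self.route_logits, dim=0)          # (layer_idx + 1,)
        stacked = torch.stack(cached_states, dim=0)                # (L, batch, seq, dim)
        return (weights.view(-1, 1, 1, 1) * stacked).sum(dim=0)    # routed mixture

# Toy usage: mix the outputs of four earlier layers for layer index 3.
router = CrossLayerRouter(layer_idx=3)
mixed = router([torch.randn(2, 10, 64) for _ in range(4)])
```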
-
When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners
This paper proposes a training-free intervention that disentangles language from reasoning by removing language-specific representations from large language models at inference time, significantly improving multilingual reasoning performance, especially for mid- and low-resource languages, while revealing a negative correlation between language signals and reasoning accuracy.
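A rough sketch of this kind of inference-time intervention is shown below, assuming a single difference-of-means language direction that is projected out of the hidden states; the helper names and the single-direction simplification are illustrative, not the paper's exact procedure.

```python
# Rough sketch of a training-free intervention: estimate a language-specific
# direction from hidden states and project it out at inference time
# (illustrative simplification, not the paper's method).
import torch

def language_direction(hidden_lang_a, hidden_lang_b):
    """Difference-of-means direction separating two languages in hidden-state space."""
    d = hidden_lang_a.mean(dim=0) - hidden_lang_b.mean(dim=0)
    return d / d.norm()

def remove_direction(hidden, direction):
    """Project hidden states onto the subspace orthogonal to the language direction."""
    return hidden - (hidden @ direction).unsqueeze(-1) * direction

# Toy usage on random stand-ins for per-token hidden states (dim 64):
d = language_direction(torch.randn(100, 64), torch.randn(100, 64))
cleaned = remove_direction(torch.randn(5, 64), d)
```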