Tag: Multimodality
All the articles with the tag "Multimodality".
-
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
This paper introduces VLM Q-Learning, an offline-to-online reinforcement learning method that fine-tunes Vision-Language Models for interactive decision-making by filtering suboptimal actions with a critic head. It achieves significant performance gains over supervised fine-tuning across multiple multimodal agent tasks; a sketch of the filtering idea follows.
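
A minimal sketch of the critic-head filtering idea, assuming a critic scores candidate actions from the VLM's hidden states; the names `CriticHead`, `filter_actions`, and `q_threshold` are illustrative, not the paper's actual API:

```python
# Hypothetical sketch: a critic head scores candidate actions produced by a VLM,
# and actions whose estimated Q-value falls below a threshold are filtered out.
import torch
import torch.nn as nn

class CriticHead(nn.Module):
    """Maps a VLM hidden state (d_model dims) to a scalar Q-value per candidate."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (num_candidates, d_model) -> (num_candidates,) Q-values
        return self.q(hidden).squeeze(-1)

def filter_actions(hidden_states: torch.Tensor, critic: CriticHead, q_threshold: float = 0.0):
    """Keep only candidate actions whose estimated Q-value clears the threshold."""
    q_values = critic(hidden_states)
    keep_mask = q_values >= q_threshold
    return keep_mask, q_values
```
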
-
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
This paper proposes the UniME framework, which leverages multimodal large language models to learn universal multimodal embeddings via textual discriminative knowledge distillation and hard-negative enhanced instruction tuning, improving discriminative power and compositionality on downstream tasks.
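
An illustrative InfoNCE-style loss with explicitly mined hard negatives, loosely in the spirit of hard-negative enhanced tuning; the tensor shapes, mining setup, and temperature are assumptions, not UniME's exact formulation:

```python
# Contrastive loss where each query is paired with one positive and K mined hard negatives.
import torch
import torch.nn.functional as F

def hard_negative_infonce(query, positive, hard_negatives, temperature=0.05):
    """query: (B, D); positive: (B, D); hard_negatives: (B, K, D) mined per query."""
    query = F.normalize(query, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negatives = F.normalize(hard_negatives, dim=-1)
    pos_sim = (query * positive).sum(-1, keepdim=True)           # (B, 1)
    neg_sim = torch.einsum('bd,bkd->bk', query, hard_negatives)  # (B, K)
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    # the positive sits at index 0 of each row of logits
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)
```
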
-
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
RWKVQuant introduces a Post-Training Quantization framework tailored to RWKV models. It uses a coarse-to-fine proxy to hybridize scalar and vector quantization and optimizes codebooks for element-wise operations, achieving ~3-bit quantization with minimal accuracy loss and significant memory and speed improvements.
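
A toy sketch of proxy-guided hybrid quantization, assuming a coarse per-tensor statistic routes each weight tensor to either scalar or codebook (vector) quantization; the kurtosis proxy, threshold, and trivially initialized codebook are illustrative assumptions, not RWKVQuant's exact criterion:

```python
# Toy hybrid quantizer: a coarse statistic decides per tensor which scheme to apply.
import numpy as np

def scalar_quantize(w, n_bits=3):
    """Uniform scalar quantization to 2**n_bits levels."""
    levels = 2 ** n_bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((w - lo) / scale) * scale + lo

def vector_quantize(w, codebook_size=8, dim=4):
    """Codebook quantization over groups of `dim` weights (w.size must divide by dim).
    A real implementation would fit the codebook with k-means; this one just samples it."""
    flat = w.reshape(-1, dim)
    idx = np.random.choice(len(flat), codebook_size, replace=False)
    codebook = flat[idx]
    assign = np.argmin(((flat[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
    return codebook[assign].reshape(w.shape)

def hybrid_quantize(w, kurtosis_threshold=3.0):
    """Coarse proxy: heavy-tailed (outlier-rich) tensors go to vector quantization."""
    z = (w - w.mean()) / (w.std() + 1e-8)
    kurtosis = (z ** 4).mean()
    return vector_quantize(w) if kurtosis > kurtosis_threshold else scalar_quantize(w)
```
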
-
Compact Recurrent Transformer with Persistent Memory
This paper introduces the Compact Recurrent Transformer (CRT), which combines shallow Transformers with RNNs to process long sequences efficiently using a single persistent memory vector. It matches or outperforms full-length Transformers and Transformer-XL on language and video tasks at a significantly reduced computational cost; a sketch of the memory mechanism follows.
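
A minimal sketch of carrying a single persistent memory vector across segments, in the spirit of the CRT idea; the module names, the GRU recurrence, and the mean-pooled segment summary are assumptions, not the paper's exact design:

```python
# A shallow Transformer processes each segment with a prepended memory token;
# an RNN cell then folds the segment summary into the persistent memory vector.
import torch
import torch.nn as nn

class PersistentMemoryBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # shallow Transformer
        self.rnn = nn.GRUCell(d_model, d_model)                # updates the memory

    def forward(self, segment, memory):
        # segment: (B, T, D); memory: (B, D) summarizing all previous segments
        x = torch.cat([memory.unsqueeze(1), segment], dim=1)   # prepend memory token
        h = self.encoder(x)
        # fold a mean-pooled summary of the segment outputs into the memory
        memory = self.rnn(h[:, 1:].mean(dim=1), memory)
        return h[:, 1:], memory

# Usage: iterate over segments of a long sequence, threading the memory through.
block = PersistentMemoryBlock()
memory = torch.zeros(2, 256)
for segment in torch.randn(4, 2, 32, 256):   # 4 segments, batch 2, length 32
    out, memory = block(segment, memory)
```
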
-
Waking Up an AI: A Quantitative Framework for Prompt-Induced Phase Transition in Large Language Models
This paper proposes a dual-prompt framework (TIP and TQP) for quantifying cognitive phase transitions in large language models (LLMs). It finds that LLMs' affective responses to concept-blending prompts differ markedly from human intuition, revealing a potential gap between AI and human cognition in conceptual integration.