Tag: Multimodal Systems
All the articles with the tag "Multimodal Systems".
-
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
This survey provides a comprehensive overview of adversarial attacks on multimodal AI systems across text, image, video, and audio modalities, categorizing threats by attacker knowledge, intent, and execution method to equip practitioners with an understanding of vulnerabilities and cross-modal risks.
-
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
This paper proposes a definition-guided prompting technique built on vision-language models, together with the UnHateMeme framework, for detecting and mitigating hateful content in multimodal memes. It achieves effective detection through zero-shot and few-shot prompting and generates non-hateful replacement content that preserves image-text coherence, demonstrating strong results in experiments.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
The Video Prediction Policy (VPP) introduces a novel generalist robot policy that leverages predictive visual representations from fine-tuned video diffusion models to learn implicit inverse dynamics, achieving significant improvements of 41.5% on the Calvin ABC→D benchmark and 31.6% in real-world dexterous manipulation tasks over state-of-the-art baselines.
-
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
ASTRA introduces an efficient defense for Vision Language Models by adaptively steering activations away from adversarial directions using image attribution, achieving state-of-the-art performance in mitigating jailbreak attacks with minimal impact on benign utility and high inference efficiency.
-
Activated LoRA: Fine-tuned LLMs for Intrinsics
This paper proposes Activated LoRA (aLoRA), a modified LoRA framework that adapts weights only for tokens generated after the adapter is activated, allowing the base model's KV cache to be reused for efficient dynamic adaptation. It matches the performance of standard LoRA across multiple tasks while significantly reducing inference cost.