Tag: Multimodal Systems
All the articles with the tag "Multimodal Systems".
-
LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
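This paper proposes the LoRE-Merging framework, which uses low-rank estimation to construct an approximate base model and task vectors, enabling model merging without access to the original base model and outperforming traditional methods on multiple benchmark datasets.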
-
GCN-Based Throughput-Oriented Handover Management in Dense 5G Vehicular Networks
This paper introduces TH-GCN, a Graph Convolutional Network-based approach for handover management in dense 5G vehicular networks, which models dynamic network conditions to reduce handovers by up to 78% and improve signal quality and throughput through real-time, topology-aware decisions.
-
LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
LiteWebAgent is an open-source suite for VLM-based web agents that addresses the lack of production-ready solutions by offering an extensible framework with decoupled action generation and grounding, advanced planning, memory, and tree search, along with practical deployments via Vercel and a Chrome extension.
-
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
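This paper proposes Layer-wise Optimal Task Vector Merging (LOT Merging), which optimizes model merging by minimizing feature drift and significantly outperforms training-free baselines on vision and vision-language tasks, with average accuracy gains of up to 4.4%.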
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
The Video Prediction Policy (VPP) is a generalist robot policy that leverages predictive visual representations from fine-tuned video diffusion models to learn implicit inverse dynamics, achieving improvements of 41.5% on the Calvin ABC→D benchmark and 31.6% in real-world dexterous manipulation tasks over state-of-the-art baselines.