Tag: Multimodal Systems

All the articles with the tag "Multimodal Systems".

LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging

Published: 28 May, 2025 at 11:22 AM

91.54 🤔

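This paper proposes the LoRE-Merging framework, which uses low-rank estimation to construct an approximate base model and task vectors, enabling model merging without access to the original base model and outperforming traditional methods on multiple benchmark datasets.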
GCN-Based Throughput-Oriented Handover Management in Dense 5G Vehicular Networks

Published: 14 May, 2025 at 11:06 AM

91.51 🤔

This paper introduces TH-GCN, a Graph Convolutional Network-based approach for handover management in dense 5G vehicular networks, which models dynamic network conditions to reduce handovers by up to 78% and improve signal quality and throughput through real-time, topology-aware decisions.
LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications

Published: 14 May, 2025 at 11:12 AM

90.54 🤔

LiteWebAgent is an open-source suite for VLM-based web agents that bridges the gap in production-ready solutions by offering an extensible framework with decoupled action generation and grounding, advanced planning, memory, tree search, and practical deployments via Vercel and Chrome extension.
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration

Published: 4 Jun, 2025 at 11:28 AM

89.30 🤔

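This paper proposes Layer-wise Optimal Task Vector Merging (LOT Merging), which optimizes model merging by minimizing feature drift and significantly outperforms training-free baselines on vision and vision-language tasks, improving average accuracy by up to 4.4%.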
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Published: 8 May, 2025 at 10:22 AM

89.20 🤔

The Video Prediction Policy (VPP) introduces a novel generalist robot policy that leverages predictive visual representations from fine-tuned video diffusion models to learn implicit inverse dynamics, achieving significant improvements of 41.5% on the Calvin ABC→D benchmark and 31.6% in real-world dexterous manipulation tasks over state-of-the-art baselines.
