Tag: Multimodal Data
All the articles with the tag "Multimodal Data".
-
Thermal Detection of People with Mobility Restrictions for Barrier Reduction at Traffic Lights Controlled Intersections
This paper introduces a thermal detector-based traffic light system using YOLO-Thermal, a modified YOLOv8 framework, to dynamically adjust signal timings for individuals with mobility restrictions, achieving superior detection accuracy (89.1% APval) and enhancing intersection accessibility while addressing privacy and adverse condition challenges.
-
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
PICD introduces a versatile perceptual image compression codec using diffusion rendering with three-tiered conditioning to achieve high text accuracy and visual quality for both screen and natural images, outperforming existing methods in key metrics like FID and text accuracy.
-
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
This paper introduces VideoUFO, a million-scale dataset of 1.09 million video clips across 1,291 user-focused topics for text-to-video generation, curated from YouTube with minimal overlap with existing datasets, demonstrating improved performance on worst-performing topics when training a simple model like MVDiT.
-
Gameplay Highlights Generation
This paper presents a method to generate gameplay highlight reels by finetuning the X-CLIP multimodal model on an in-house FPS game dataset, achieving over 90% event detection accuracy and demonstrating transfer learning, while optimizing deployment through quantization.
-
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
本文提出ProRL方法,通过长时间强化学习结合KL散度惩罚和参考策略重置,在多样化任务上训练Nemotron-Research-Reasoning-Qwen-1.5B模型,显著扩展了大型语言模型的推理边界,尤其在基础模型表现较差的领域和分布外任务上表现出色。