Tag: Image Generation
All the articles with the tag "Image Generation".
-
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
PICD is a versatile perceptual image compression codec that uses diffusion rendering with three-tiered conditioning to achieve high text accuracy and visual quality on both screen and natural images, outperforming existing methods on key metrics such as FID and text accuracy.
-
Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Selftok introduces a non-spatial autoregressive visual tokenizer using diffusion timesteps, unifying vision-language models and enabling effective reinforcement learning for superior text-to-image generation, as demonstrated on GenEval and DPG-Bench benchmarks.
-
X-Fusion: Introducing New Modality to Frozen Large Language Models
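This paper proposes the X-Fusion framework, which freezes the LLM's parameters and adds a dual-tower structure to efficiently enable multimodal understanding and generation while preserving the original language capabilities.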
-
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
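This paper proposes Token-Shuffle, a method that exploits dimensional redundancy in the visual vocabulary to dynamically merge and restore image tokens, enabling efficient high-resolution text-to-image generation while maintaining strong performance within a unified autoregressive framework.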