Posts
All the articles I've posted.
-
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Insight-V introduces a scalable data generation pipeline and a multi-agent system with iterative DPO training to significantly enhance long-chain visual reasoning in MLLMs, achieving up to 7.0% performance gains on challenging benchmarks while maintaining perception capabilities.
-
Detecting and Mitigating Hateful Content in Multimodal Memes with Vision-Language Models
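This paper proposes a definition-guided prompting technique built on vision-language models, together with the UnHateMeme framework, for detecting and mitigating hateful content in multimodal memes. It achieves efficient detection through zero-shot and few-shot prompting, and it generates non-hateful alternative content that preserves image-text coherence, showing strong results in experiments.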
-
Survey of Abstract Meaning Representation: Then, Now, Future
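This paper surveys Abstract Meaning Representation (AMR), a graph-structured semantic representation framework. It covers AMR's development, its parsing and generation methods, multilingual extensions, and downstream applications, and it highlights both the potential and the limitations of AMR for improving machine language understanding.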
-
Style Feature Extraction Using Contrastive Conditioned Variational Autoencoders with Mutual Information Constraints
This paper proposes a novel method that combines contrastive learning with conditional variational autoencoders and mutual information constraints to extract style features from unlabeled data. The method proves effective on simple datasets such as MNIST but struggles with natural image datasets, owing to limitations of the data augmentations and a reliance on qualitative evaluation.
-
Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition
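This paper proposes Trace-of-Thought Prompting, a prompt-based knowledge distillation framework that decomposes complex problems into manageable steps. This effectively transfers the reasoning ability of high-resource models to low-resource models, significantly improving the latter's performance on arithmetic reasoning tasks without extensive fine-tuning.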