Posts
All the articles I've posted.
-
Agentic AI: The Era of Semantic Decoding
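This post proposes a semantic decoding perspective that frames the collaboration among large language models, humans, and tools as an optimization process in semantic space, and explores a new computational paradigm for AI systems through the exchange of semantic tokens and the design of semantic decoding algorithms.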
-
MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison
MELON is a training-free defense against indirect prompt injection attacks on LLM agents. It uses masked re-execution to detect tool calls that are independent of the user's input, achieving stronger attack prevention (0.24% attack success rate, ASR, on GPT-4o) and utility preservation (58.78% utility under attack, UA, on GPT-4o) than existing methods.
-
MoM: Linear Sequence Modeling with Mixture-of-Memories
The Mixture-of-Memories (MoM) architecture uses multiple independent memory states with a routing mechanism to increase memory capacity and reduce interference in linear sequence modeling. It achieves significant gains over other linear models on recall-intensive tasks and approaches Transformer performance at larger scales while remaining efficient.
-
Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
The Video Prediction Policy (VPP) is a generalist robot policy that learns implicit inverse dynamics from the predictive visual representations of fine-tuned video diffusion models. It improves on state-of-the-art baselines by 41.5% on the Calvin ABC→D benchmark and by 31.6% on real-world dexterous manipulation tasks.
-
Always Skip Attention
This paper shows theoretically that Self-Attention Blocks in Vision Transformers are ill-conditioned without skip connections, highlights the role of skip connections as regularizers, and proposes Token Graying (SVD- and DCT-based) to improve the conditioning of input tokens, yielding modest performance gains in supervised and self-supervised tasks.