Tag: Agent
All the articles with the tag "Agent".
-
LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications
LiteWebAgent is an open-source suite for VLM-based web agents that addresses the lack of production-ready solutions. It offers an extensible framework with decoupled action generation and grounding, advanced planning, memory, and tree search, along with practical deployments via Vercel and a Chrome extension.
-
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
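This paper uses the ZeroTIR framework to train base large language models with reinforcement learning to spontaneously execute Python code when solving math problems. It finds that the number of training steps correlates positively with code usage frequency, response length, and task accuracy (the Agent RL Scaling Law), and the resulting models significantly outperform tool-free baselines on math benchmarks.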
-
MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison
MELON introduces a novel training-free defense against indirect prompt injection attacks on LLM agents by detecting independence of tool calls from user inputs through masked re-execution, achieving superior attack prevention (0.24% ASR on GPT-4o) and utility preservation (58.78% UA on GPT-4o) compared to existing methods.
-
Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents
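This paper proposes Pre-Act, a method that improves LLM agent performance through multi-step planning and detailed reasoning. By fine-tuning smaller models (such as Llama 3.1 70B), it achieves 69.5% higher action accuracy and a 28% higher goal completion rate than GPT-4 on the Almita dataset.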
-
Putting It All into Context: Simplifying Agents with LCLMs
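This paper proposes a "state-in-context" agent design based on long-context language models (LCLMs). Placing the entire environment state in the context simplifies the agent architecture for software engineering tasks, and on SWE-bench Verified the design performs comparably to complex scaffolding approaches (Gemini-2.5-Pro reaches 50.8% pass@1).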