Tag: Human-AI Interaction

All the articles with the tag "Human-AI Interaction".

LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications

Published: 14 May, 2025 at 11:12 AM

90.54 🤔

LiteWebAgent is an open-source suite for VLM-based web agents that addresses the lack of production-ready solutions. It offers an extensible framework with decoupled action generation and grounding, advanced planning, memory, and tree search, along with practical deployments via Vercel and a Chrome extension.
Thinking Out Loud: Do Reasoning Models Know When They're Right?

Published: 25 May, 2025 at 11:51 AM

90.51 🤔

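By comparing large reasoning models trained with instruction tuning, supervised fine-tuning, and reinforcement learning, this paper finds that reasoning-oriented training significantly improves accuracy and calibration on reasoning tasks, but may weaken smaller models' awareness of their knowledge boundaries on factual tasks.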
Agentic AI: The Era of Semantic Decoding

Published: 8 May, 2025 at 12:27 AM

89.68 🤔

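This paper proposes a semantic decoding perspective that frames the collaboration among large language models, humans, and tools as an optimization process in semantic space, and explores a new computational paradigm for AI systems through the exchange of semantic tokens and the design of semantic decoding algorithms.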
Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization

Published: 5 Jun, 2025 at 11:25 AM

89.45 🤔

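This paper proposes a method that internalizes hints into model weights through iterative training and human feedback, enabling a Llama-3.1-70B-based AI agent to reach success rates of 97.9% and 90.3% on the multi-task benchmarks ToolQA and OfficeBench, respectively, surpassing GPT-4o and DeepSeek-V3 while significantly improving inference efficiency.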
MELON: Provable Indirect Prompt Injection Defense via Masked Re-execution and Tool Comparison

Published: 8 May, 2025 at 10:22 AM

89.40 🤔

MELON introduces a novel training-free defense against indirect prompt injection attacks on LLM agents by detecting independence of tool calls from user inputs through masked re-execution, achieving superior attack prevention (0.24% ASR on GPT-4o) and utility preservation (58.78% UA on GPT-4o) compared to existing methods.
