Tag: Human-AI Interaction

All the articles with the tag "Human-AI Interaction".

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Published: 4 May, 2025 at 04:29 PM

50.69 🤔

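This paper proposes PaperCoder, a framework that uses a multi-stage pipeline of multi-agent LLMs to automatically generate high-quality code repositories from machine learning papers, improving research reproducibility and significantly outperforming existing methods on benchmarks.
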
Improving Reasoning Performance in Large Language Models via Representation Engineering

Published: 6 May, 2025 at 11:15 PM

88.60 👍

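This paper applies representation engineering, using control vectors to intervene in the residual stream of large language models, and improves the performance of Pythia and Mistral models on inductive, deductive, and mathematical reasoning tasks, showing that reasoning ability can be modulated by adjusting internal representations.
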
Codenames as a Benchmark for Large Language Models

Published: 4 May, 2025 at 04:27 PM

77.18 👍

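This paper proposes the game Codenames as a benchmark for the reasoning abilities of LLMs, experimentally evaluating different LLMs on language understanding, strategic reasoning, and cooperation, and showing their distinctive behaviors and potential for generalization.
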
Humanity's Last Exam

Published: 4 May, 2025 at 04:28 PM

58.39 👍

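This paper introduces the HUMANITY'S LAST EXAM benchmark, which uses challenging, expert-written multimodal questions to address the saturation of existing LLM benchmarks and to evaluate model capabilities on closed-ended academic tasks.
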
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Published: 8 May, 2025 at 10:22 AM

97.91 😐

Insight-V introduces a scalable data generation pipeline and a multi-agent system with iterative DPO training to significantly enhance long-chain visual reasoning in MLLMs, achieving up to 7.0% performance gains on challenging benchmarks while maintaining perception capabilities.
