Tag: Reasoning

All the articles with the tag "Reasoning".

Do Theory of Mind Benchmarks Need Explicit Human-like Reasoning in Language Models?

Published: 20 May, 2025 at 11:11 AM

87.75 🤔

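This paper trains LLMs of different scales with RL and SFT and finds that RL promotes explicit ToM reasoning in larger models but causes reasoning collapse in smaller ones, while SFT unexpectedly achieves high performance, suggesting that current ToM benchmarks may be solvable without explicit human-like reasoning.

R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning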

Published: 3 Jun, 2025 at 11:42 AM

87.73 🤔

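This paper proposes the R1-Code-Interpreter framework, which trains large language models through supervised fine-tuning and reinforcement learning to dynamically generate and execute code. It significantly improves accuracy on 144 reasoning and planning tasks; R1-CI-14B reaches 64.1%, approaching the performance of GPT-4o+Code Interpreter.

Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions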

Published: 21 May, 2025 at 11:29 AM

87.72 🤔

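This paper proposes the Rodimus and Rodimus+ models, which use the data-dependent tempered selection (DDTS) and sliding window shared-key attention (SW-SKA) mechanisms to significantly reduce the computational and memory complexity of large language models while maintaining performance, challenging the accuracy-efficiency trade-off.

MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability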

Published: 30 May, 2025 at 11:19 AM

87.72 🤔

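This paper proposes the MASKSEARCH framework, which combines the Retrieval-Augmented Mask Prediction (RAMP) pre-training task with supervised fine-tuning and reinforcement learning to significantly improve the agentic search capability of large language models on open-domain multi-hop question answering tasks.

Communicating Activations Between Language Model Agents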

Published: 10 May, 2025 at 10:59 AM

87.71 🤔

This paper introduces Activation Communication (AC), a novel method for inter-LLM communication using intermediate activations instead of natural language, achieving up to 27% performance improvement over traditional methods with significantly reduced compute across coordination games and reasoning benchmarks.
