Tag: Mathematical Reasoning

All the articles with the tag "Mathematical Reasoning".

The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs

Published: 28 May, 2025 at 11:24 AM

92.28 🤔

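This paper takes a modular approach that exploits the separability of the parameters in large language models responsible for mathematical reasoning and for multilingual capability. It proposes strategies such as Layer-Swapping that significantly outperform non-modular baselines on cross-lingual transfer to low-resource languages, with the best results in data-constrained settings.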
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Published: 22 May, 2025 at 11:12 AM

89.47 🤔

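This paper uses the ZeroTIR framework to train base large language models with reinforcement learning to spontaneously execute Python code when solving math problems. It reveals a positive correlation between the number of training steps and code usage frequency, response length, and task accuracy (the Agent RL Scaling Law), and it significantly outperforms tool-free baselines on math benchmarks.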
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Published: 12 May, 2025 at 11:18 AM

80.55 🤔

This paper demonstrates through meta-analysis and experiments that Chain-of-Thought (CoT) prompting significantly enhances large language model performance on math and symbolic reasoning tasks, but offers limited benefits for non-symbolic tasks and underperforms compared to tool-augmented approaches.
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math

Published: 4 May, 2025 at 04:33 PM

83.56 👍

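This paper proposes a multi-stage training recipe comprising large-scale distillation, rollout preference optimization, and reinforcement learning with verifiable rewards. The recipe significantly improves the performance of small language models on mathematical reasoning tasks, enabling the 3.8B-parameter Phi-4-Mini-Reasoning model to surpass open-source baselines with nearly twice as many parameters.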
