Tag: Reasoning

All the articles with the tag "Reasoning".

When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy

Published: 3 Jun, 2025 at 11:29 AM

86.23 🤔

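Using the XReasoning benchmark, this paper reveals a trade-off in the multilingual reasoning of large reasoning models between matching the language of the thinking trace and answer accuracy. Prompt hacking and few-shot post-training raise the language match rate, but at the cost of accuracy, which highlights the limitations of current models.
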
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping

Published: 28 May, 2025 at 11:22 AM

86.17 🤔

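This paper proposes the LASER family of reinforcement-learning methods (LASER, LASER-D, LASER-DE), which use dynamic, difficulty-aware length-based reward shaping to substantially improve token efficiency while preserving the performance of large reasoning models, achieving a Pareto-optimal trade-off between accuracy and efficiency on multiple mathematical reasoning benchmarks.
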
Context-Free Synthetic Data Mitigates Forgetting

Published: 23 May, 2025 at 11:15 AM

86.17 🤔

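This paper proposes a context-free synthetic data (CFS) method that generates unconditional samples and combines fine-tuning and pre-training losses to mitigate catastrophic forgetting in large language models in data-agnostic settings. Experiments on Olmo-1B and R1-Distill-Llama-8B validate its effectiveness.
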
Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards

Published: 28 May, 2025 at 11:25 AM

86.15 🤔

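This paper proposes Adaptive Direct Length Penalty (A-DLP), which dynamically adjusts the length-penalty coefficient during reinforcement learning. It cuts the reasoning length of large language models by more than 50% while maintaining accuracy, offering a new direction for building efficient reasoning models.
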
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?

Published: 2 Jun, 2025 at 11:31 AM

86.10 🤔

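This paper presents the first systematic comparison of reasoning and non-reasoning large language models on natural language generation evaluation. It finds that the benefit of reasoning capability depends heavily on model architecture: OpenAI o3-mini significantly outperforms non-reasoning models on machine translation evaluation, whereas DeepSeek-R1 stands out only on consistency evaluation for text summarization, and distilled models remain effective at the 32B parameter scale.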