Tag: Reasoning

All the articles with the tag "Reasoning".

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Published: 8 May, 2025 at 10:22 AM

97.91 😐

Insight-V introduces a scalable data generation pipeline and a multi-agent system with iterative DPO training to significantly enhance long-chain visual reasoning in MLLMs, achieving up to 7.0% performance gains on challenging benchmarks while maintaining perception capabilities.
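The summary credits part of the gain to iterative DPO training. For readers unfamiliar with DPO, here is a minimal sketch of the standard per-pair DPO loss. The function name `dpo_loss` and the default `beta = 0.1` are illustrative assumptions and are not taken from Insight-V. The "iterative" part, which repeats training over rounds of freshly collected preference pairs, is not shown, and the exact round structure Insight-V uses is not described here.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model. beta=0.1 is an assumed default.
    """
    # Implicit reward margin: how much more the policy, relative to the reference,
    # favours the chosen response over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # Loss is -log(sigmoid(x)) = log(1 + exp(-x)); branch to avoid overflow in exp.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


if __name__ == "__main__":
    # Policy prefers the chosen answer more than the reference does, so the loss
    # falls below log(2), the value at zero margin.
    print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

Minimizing this loss pushes the policy to widen its preference for chosen over rejected responses, while the reference terms keep it anchored to the starting model.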
Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition

Published: 6 May, 2025 at 01:18 AM

93.53 😐

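This paper proposes Trace-of-Thought Prompting, a prompt-based knowledge distillation framework that decomposes complex problems into manageable steps. It effectively transfers the reasoning capabilities of high-resource models to low-resource models and significantly improves the low-resource models' performance on arithmetic reasoning tasks without extensive fine-tuning.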
A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?

Published: 6 May, 2025 at 11:19 PM

90.65 😐

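This paper proposes a four-dimensional taxonomy (what to scale, how to scale, where to scale, and how well scaling works) and uses it to systematically survey research on test-time scaling (TTS) in large language models, offering a structured perspective and practical guidance for understanding and applying inference-time compute scaling.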
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and correctness in LLMs

Published: 6 May, 2025 at 01:18 AM

89.54 😐

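This empirical study finds that large language models "overthink" easy problems and "underthink" hard ones, that reasoning length and correctness have a non-monotonic relationship, and that simply preferring shorter responses can substantially reduce generation length while maintaining accuracy.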
Weight Ensembling Improves Reasoning in Language Models

Published: 6 May, 2025 at 01:27 AM

88.15 😐

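This paper finds that supervised fine-tuning (SFT) causes diversity collapse in reasoning models, which hurts Pass@K. It proposes interpolating the weights of early and late SFT checkpoints (WiSE-FT), which effectively restores diversity while improving both Pass@1 and Pass@K, and in turn improves test-time scaling and reinforcement learning results.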
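The core WiSE-FT operation described above is a linear interpolation of two checkpoints' weights. The sketch below shows that operation under simplifying assumptions: checkpoints are represented as plain dicts mapping parameter names to flat lists of floats, and the function name `interpolate_checkpoints` and the example `alpha = 0.5` are hypothetical. The mixing coefficient the paper actually uses is not stated here.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def interpolate_checkpoints(early: StateDict, late: StateDict, alpha: float) -> StateDict:
    """Return (1 - alpha) * early + alpha * late, parameter by parameter.

    alpha = 0 recovers the early checkpoint; alpha = 1 recovers the late one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if early.keys() != late.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged: StateDict = {}
    for name, w_early in early.items():
        w_late = late[name]
        if len(w_early) != len(w_late):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise linear interpolation of the two weight vectors.
        merged[name] = [(1.0 - alpha) * a + alpha * b for a, b in zip(w_early, w_late)]
    return merged


if __name__ == "__main__":
    early = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
    late = {"layer.weight": [4.0, 6.0], "layer.bias": [3.0]}
    print(interpolate_checkpoints(early, late, alpha=0.5))
    # {'layer.weight': [2.0, 4.0], 'layer.bias': [2.0]}
```

In a real model the same formula would be applied to every tensor in the two saved state dicts. Because the merged model needs no extra training, a range of `alpha` values can be evaluated cheaply for Pass@1 and Pass@K.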
