Tag: Reasoning

All the articles with the tag "Reasoning".

Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

Published: 8 May, 2025 at 06:12 PM

91.54 🤔

This paper introduces Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long and short CoT datasets to fine-tune non-reasoning LLMs, achieving a 2.3% average accuracy improvement and a 47.61% response length reduction on reasoning benchmarks.

Agentic AI: The Era of Semantic Decoding

Published: 8 May, 2025 at 12:27 AM

89.68 🤔

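This paper proposes a semantic decoding perspective that frames the collaboration among large language models, humans, and tools as an optimization process in semantic space, and explores a new computational paradigm for AI systems through the exchange of semantic tokens and the design of semantic decoding algorithms.

When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator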

Published: 8 May, 2025 at 06:13 PM

88.68 🤔

This paper demonstrates that a 1.5B parameter reasoning model (Distill-R1) outperforms larger non-reasoning LLMs as a discriminator in a text-to-SQL planning framework by leveraging a novel soft score extraction method from chain-of-thought outputs, though it struggles significantly as a generator.

RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

Published: 7 May, 2025 at 12:11 AM

88.02 🤔

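This paper proposes the StarPO framework and the RAGEN system, which train LLM agents with multi-turn, trajectory-level reinforcement learning. It reveals the challenges of training instability (such as the Echo Trap) and insufficient reasoning ability, and improves stability and generalization with StarPO-S, though reasoning ability still requires fine-grained reward design.

Communicating Activations Between Language Model Agents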

Published: 10 May, 2025 at 10:59 AM

87.71 🤔

This paper introduces Activation Communication (AC), a novel method for inter-LLM communication using intermediate activations instead of natural language, achieving up to 27% performance improvement over traditional methods with significantly reduced compute across coordination games and reasoning benchmarks.
