Tag: Reasoning

All the articles with the tag "Reasoning".

SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Published: 8 May, 2025 at 06:16 PM

87.52 🤔

SEAL, a training-free method, calibrates the reasoning process of Large Language Models by steering latent representations to reduce redundant thoughts, achieving up to 14.1% accuracy improvement and 50.4% token reduction across diverse benchmarks.

Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL

Published: 6 May, 2025 at 11:18 PM

87.33 🤔

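This paper combines supervised fine-tuning (SFT), reinforcement learning (RL), and fine-grained reward functions (such as QATCH) to significantly improve the reasoning ability and performance of small LLMs on the Text2SQL task; the Think2SQL-7B model outperforms 400B+ parameter models on the BIRD dataset.

Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism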

Published: 6 May, 2025 at 11:15 PM

87.26 🤔

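This paper proposes the Gather-and-Aggregate (G&A) mechanism and shows that the performance gap between Transformer and SSM models in in-context retrieval stems mainly from differences in how a small number of critical heads implement it; hybrid-model experiments confirm the potential of attention to improve SSM retrieval.

Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation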

Published: 8 May, 2025 at 06:13 PM

86.84 🤔

This paper proposes Recall with Reasoning (RwR), a method that enhances Mamba's long-context memory and extrapolation by distilling chain-of-thought summarization from a teacher model, achieving significant performance improvements on LONGMEMEVAL and HELMET benchmarks while preserving short-context capabilities.

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Published: 8 May, 2025 at 06:17 PM

86.55 🤔

This paper investigates zero RL training on diverse open base models, achieving significant accuracy and response length improvements while identifying key factors like reward design and data difficulty that influence the emergence of reasoning behaviors.
