Tag: Reasoning

All the articles with the tag "Reasoning".

Not All Correct Answers Are Equal: Why Your Distillation Source Matters

Published: 24 May, 2025 at 11:11 AM

86.97 🤔

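This paper distills 1.89 million reasoning samples from three state-of-the-art large language models and systematically studies how the distillation source affects student model performance, finding that data distilled from AM-Thinking-v1 significantly improves student performance on multiple reasoning benchmarks and exhibits adaptive generation length.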
ALPS: Attention Localization and Pruning Strategy for Efficient Alignment of Large Language Models

Published: 28 May, 2025 at 11:21 AM

86.90 🤔

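This paper proposes the ALPS algorithm, which uses a weight-distribution-based parameter alignment distribution score (sPAD) to locate task-sensitive attention heads and applies pruning, updating only 10% of the attention parameters while improving performance on general, math, and code tasks, and also showing head transferability and mitigation of knowledge forgetting.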
R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search

Published: 30 May, 2025 at 11:22 AM

86.90 🤔

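R1-Compress effectively compresses long chain-of-thought (Long-CoT) reasoning through chunk-level compression and an inter-chunk search mechanism, reducing token usage by about 20% while keeping reasoning accuracy close to the baseline (92.4% vs 93.0%).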
Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

Published: 18 May, 2025 at 11:22 AM

86.89 🤔

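This paper applies the GRPO algorithm to the Qwen2-Audio-7B-Instruct model and achieves a best accuracy of 64.5% on audio question answering, showing that reinforcement learning outperforms supervised fine-tuning on small-scale datasets; however, an explicit reasoning process did not significantly improve performance, and a gap to human-level performance remains.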
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Published: 17 May, 2025 at 11:02 AM

86.87 🤔

This paper introduces a systematic approach to enhance large reasoning models by aligning them with deduction, induction, and abduction meta-abilities through a three-stage pipeline of individual training, parameter merging, and domain-specific RL, achieving up to 4% performance gains over instruction-tuned baselines across math, coding, and science benchmarks.
