Tag: Large Language Model
All articles tagged "Large Language Model".
-
Hybrid Latent Reasoning via Reinforcement Learning
This paper proposes HRPO, a reinforcement-learning-based hybrid latent reasoning framework that combines discrete tokens with continuous hidden states through a gating mechanism, significantly improving LLM performance on knowledge and reasoning tasks while reducing reliance on chain-of-thought data.
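To make the gating idea concrete, here is a minimal PyTorch sketch of blending a sampled token embedding with the model's continuous hidden state; the class name `LatentGate` and the simple sigmoid gate are illustrative assumptions, not HRPO's actual architecture.

```python
import torch
import torch.nn as nn

class LatentGate(nn.Module):
    """Toy gate that mixes a discrete token embedding with a continuous
    hidden state to form the next reasoning input (illustrative only)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, token_emb: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per dimension, how much discrete token signal
        # vs. continuous latent signal to carry forward.
        g = torch.sigmoid(self.gate(torch.cat([token_emb, hidden], dim=-1)))
        return g * token_emb + (1 - g) * hidden

# Usage with dummy tensors (batch of 2, hidden size 16).
gate = LatentGate(16)
mixed = gate(torch.randn(2, 16), torch.randn(2, 16))
print(mixed.shape)  # torch.Size([2, 16])
```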
-
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning
This paper proposes ConciseR, a two-stage reinforcement learning framework that first strengthens reasoning ability with GRPO++ and then optimizes response length with L-GRPO, substantially shortening CoT responses while preserving accuracy and outperforming existing methods across multiple benchmark datasets.
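A rough sketch of the stage-2 intuition, accuracy first and brevity second, is below; the reward shape, `max_len`, and `alpha` are hypothetical and not the paper's exact L-GRPO formulation.

```python
def length_aware_reward(correct: bool, length: int, max_len: int = 2048,
                        alpha: float = 0.5) -> float:
    """Toy length-aware reward: correctness is the base signal, and shorter
    correct responses earn an extra brevity bonus (illustrative shaping)."""
    if not correct:
        return 0.0
    brevity_bonus = alpha * (1.0 - min(length, max_len) / max_len)
    return 1.0 + brevity_bonus

print(length_aware_reward(True, 256))   # short correct answer -> higher reward
print(length_aware_reward(True, 2048))  # long correct answer  -> base reward only
print(length_aware_reward(False, 128))  # wrong answer         -> no reward
```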
-
On the Generalization vs Fidelity Paradox in Knowledge Distillation
Through large-scale empirical analysis, this paper shows that knowledge distillation (KD) substantially improves the zero-shot reasoning performance of small language models (by up to 10%) but yields limited gains for larger models, and that these performance gains are decoupled from reasoning fidelity, underscoring the importance of task expertise and moderate parameter tuning.
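For context, a minimal sketch of the classic distillation objective (cross-entropy on labels plus temperature-scaled KL to the teacher) is shown below; this is the standard KD loss, not the paper's specific experimental setup.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    """Standard distillation loss: hard-label cross-entropy plus
    temperature-softened KL divergence to the teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kl

# Dummy batch: 4 examples over a 10-token vocabulary slice.
s, t = torch.randn(4, 10), torch.randn(4, 10)
y = torch.randint(0, 10, (4,))
print(kd_loss(s, t, y).item())
```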
-
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
This paper introduces MiMo-7B, a 7B-parameter LLM optimized for reasoning through innovative pre-training with reasoning-dense data and multi-token prediction, and post-training with RL using test-difficulty-driven rewards, achieving superior performance over larger models and OpenAI o1-mini on mathematics and coding benchmarks.
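The multi-token prediction component can be pictured as extra heads that predict several future tokens from the same hidden state; the sketch below is a deliberate simplification with made-up dimensions, not MiMo's actual head design.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Toy multi-token prediction: k separate linear heads predict the
    next k tokens from the same hidden state (illustrative only)."""
    def __init__(self, hidden_dim: int, vocab_size: int, k: int = 3):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, vocab_size) for _ in range(k))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_dim) -> logits: (k, batch, seq, vocab)
        return torch.stack([head(hidden) for head in self.heads])

heads = MultiTokenHead(hidden_dim=32, vocab_size=100, k=3)
logits = heads(torch.randn(2, 8, 32))
print(logits.shape)  # torch.Size([3, 2, 8, 100])
```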
-
Not All Adapters Matter: Selective Adapter Freezing for Memory-Efficient Fine-Tuning of Language Models
This paper proposes SAFE, which selectively freezes adapters that contribute little to the target task, enabling resource-efficient fine-tuning of language models; it substantially reduces memory usage and computational cost while maintaining or even improving model performance.
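A minimal sketch of the selective-freezing idea follows; the importance scores and `keep_ratio` heuristic here are hypothetical stand-ins, since the listing does not describe how SAFE actually measures adapter contribution.

```python
import torch
import torch.nn as nn

def freeze_low_impact_adapters(adapters: dict, importance: dict,
                               keep_ratio: float = 0.5) -> None:
    """Toy selective freezing: keep gradients only for the adapters with the
    highest (assumed) importance scores and freeze the rest."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep = set(ranked[: max(1, int(len(ranked) * keep_ratio))])
    for name, module in adapters.items():
        for p in module.parameters():
            p.requires_grad = name in keep

# Dummy adapters with made-up importance scores.
adapters = {f"layer{i}": nn.Linear(8, 8) for i in range(4)}
scores = {"layer0": 0.9, "layer1": 0.1, "layer2": 0.7, "layer3": 0.2}
freeze_low_impact_adapters(adapters, scores)
print({n: next(m.parameters()).requires_grad for n, m in adapters.items()})
```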