Posts

All the articles I've posted.

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Published: 20 May, 2025 at 11:10 AM

85.99 🤔

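This paper proposes S-GRPO, which uses serial group generation and a decaying reward strategy to regulate the intermediate reasoning process of large language models. Across multiple benchmark datasets it reduces reasoning length by 35.4% to 61.1% and improves accuracy by 0.72% to 6.08%, significantly improving reasoning efficiency.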
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models

Published: 31 May, 2025 at 11:34 AM

85.98 🤔

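This paper proposes SORSA, a parameter-efficient fine-tuning method based on singular value decomposition and orthonormal regularization. It improves the performance of large language models on downstream tasks by optimizing the condition number of the weight matrices, and significantly outperforms methods such as LoRA and PiSSA on benchmarks such as GSM-8K.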
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models

Published: 2 Jun, 2025 at 11:33 AM

85.98 🤔

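This position paper argues that reinforcement fine-tuning (RFT) significantly improves the reasoning capability of multimodal large language models (MLLMs) through reinforcement learning algorithms. It summarizes the community's progress across modalities, tasks, and domains and proposes five future research directions, but it lacks concrete methodological innovation and experimental validation.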
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search

Published: 4 Jun, 2025 at 11:26 AM

85.95 🤔

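This paper proposes the Satori model, which uses the Chain-of-Action-Thought (COAT) reasoning framework and two-stage training (small-scale format tuning followed by large-scale reinforcement learning) to significantly improve the autoregressive search and reasoning abilities of a single 7B large language model on mathematical reasoning and out-of-domain tasks.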
It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs

Published: 4 Jun, 2025 at 11:59 AM

85.94 🤔

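This paper proposes an LLM optimization framework based on the generalized Gaussian distribution (GGD). Through GG initialization, DeepShape post-processing, and the RF8 floating-point format, it improves compression ratio, accuracy, and hardware efficiency across the full pipeline from initialization to deployment. Experiments show significant compression gains with controllable accuracy loss.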
