Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation
Through mixture-of-Gaussians simulations and large language model experiments, this paper shows how knowledge distillation improves sample quality in generative models: the teacher model's entropy controls the student model's precision-recall trade-off.
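A minimal toy sketch of that intuition (illustrative only, not the paper's code): a temperature parameter scales the entropy of a Gaussian-mixture "teacher", and a lower-entropy teacher concentrates the student's training samples on dominant modes (higher precision, lower mode coverage), while a higher-entropy teacher does the reverse.

```python
# Toy sketch: a 1-D Gaussian-mixture teacher whose sampling temperature
# controls its entropy. Lower temperature -> samples concentrate on the
# dominant mode (precision up, coverage down); higher temperature -> broader
# mode coverage (recall up).
import numpy as np

rng = np.random.default_rng(0)
means = np.array([-4.0, 0.0, 4.0])    # mixture component means
weights = np.array([0.6, 0.3, 0.1])   # true mixture weights

def sample_teacher(n, temperature):
    """Sample from the mixture with temperature-scaled component weights."""
    logits = np.log(weights) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    comps = rng.choice(len(means), size=n, p=probs)
    return means[comps] + rng.normal(scale=1.0, size=n), comps

for temp in (0.3, 1.0, 3.0):
    _, comps = sample_teacher(10_000, temp)
    coverage = len(np.unique(comps)) / len(means)   # crude "recall" proxy
    top_mode_mass = np.mean(comps == 0)             # crude "precision" proxy
    print(f"T={temp}: mode coverage={coverage:.2f}, mass on top mode={top_mode_mass:.2f}")
```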
-
Behavior Injection: Preparing Language Models for Reinforcement Learning
This paper proposes BRIDGE, which injects exploration and exploitation behaviors during the SFT stage to improve large language models' readiness for RL, yielding significant RFT gains on mathematical and logical reasoning tasks.
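A hypothetical sketch of what behavior injection into SFT data could look like (not the BRIDGE implementation): wrap a base solution with an explicit exploration segment that branches between candidate approaches and an exploitation segment that commits to one and verifies it.

```python
# Hypothetical behavior-injection sketch for SFT targets (illustrative only):
# prepend an exploration segment (considering alternatives) and follow with an
# exploitation segment (committing to and verifying the chosen approach).
def inject_behaviors(question: str, base_solution: str, alt_approach: str) -> dict:
    exploration = (
        "Let me consider two approaches before committing.\n"
        f"Approach A: {alt_approach}\n"
        "Approach B: work directly from the given quantities.\n"
    )
    exploitation = (
        "Approach B looks simpler, so I will follow it and verify each step.\n"
        f"{base_solution}\n"
        "Check: the result is consistent with the original conditions."
    )
    return {"prompt": question, "response": exploration + exploitation}

example = inject_behaviors(
    question="A train travels 120 km in 2 hours. What is its average speed?",
    base_solution="Average speed = 120 km / 2 h = 60 km/h.",
    alt_approach="set up a distance-rate-time table",
)
print(example["response"])
```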
-
RARE: Retrieval-Augmented Reasoning Modeling
RARE proposes a new paradigm that externalizes domain knowledge storage while optimizing reasoning capability, enabling lightweight models to achieve state-of-the-art performance on multi-domain benchmarks and surpass retrieval-augmented GPT-4 and DeepSeek-R1.
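A minimal sketch of the retrieval-augmented reasoning pattern (hypothetical names and toy retriever, not RARE's code): domain knowledge lives in an external store and is retrieved at inference time, so the lightweight model's prompt asks it to reason over the passages rather than recall facts from its parameters.

```python
# Toy retrieval-augmented reasoning sketch: knowledge is stored externally and
# retrieved per query; the model is prompted to reason over it, not memorize it.
from typing import List

KNOWLEDGE_STORE = [
    "Aspirin irreversibly inhibits COX-1 and COX-2 enzymes.",
    "Ibuprofen is a reversible, non-selective COX inhibitor.",
]

def retrieve(query: str, store: List[str], k: int = 2) -> List[str]:
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(store, key=lambda p: -len(q_words & set(p.lower().split())))
    return scored[:k]

def build_reasoning_prompt(query: str) -> str:
    passages = "\n".join(f"- {p}" for p in retrieve(query, KNOWLEDGE_STORE))
    return (
        f"Knowledge (retrieved):\n{passages}\n\n"
        f"Question: {query}\n"
        "Using only the retrieved knowledge, reason step by step to the answer."
    )

print(build_reasoning_prompt("How does aspirin differ from ibuprofen in COX inhibition?"))
```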
-
ThinkSwitcher: When to Think Hard, When to Think Fast
ThinkSwitcher is a lightweight adaptive framework that lets a single large reasoning model dynamically switch between long and short chain-of-thought modes based on task complexity, cutting computation costs by 20-30% on math reasoning benchmarks while maintaining high accuracy on complex tasks.
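A minimal routing sketch of the switching idea (hypothetical difficulty estimator, not ThinkSwitcher's implementation): a lightweight switcher scores query difficulty and selects a short or long chain-of-thought mode for the same underlying model.

```python
# Hypothetical long/short chain-of-thought router: a real switcher would use a
# small learned head on the query representation; here a stand-in heuristic
# scores difficulty and picks the reasoning mode.
from dataclasses import dataclass

@dataclass
class SwitchDecision:
    mode: str     # "short" or "long"
    score: float  # estimated difficulty in [0, 1]

def estimate_difficulty(query: str) -> float:
    """Stand-in difficulty estimator based on length and hard-problem markers."""
    hard_markers = ("prove", "integral", "olympiad", "show that")
    score = 0.3 + 0.2 * min(len(query) / 400, 1.0)
    score += 0.3 * any(m in query.lower() for m in hard_markers)
    return min(score, 1.0)

def switch(query: str, threshold: float = 0.5) -> SwitchDecision:
    score = estimate_difficulty(query)
    return SwitchDecision(mode="long" if score >= threshold else "short", score=score)

for q in ("What is 17 + 25?", "Prove that the sum of two odd integers is even."):
    d = switch(q)
    print(f"{d.mode:5s} (score={d.score:.2f})  <- {q}")
```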
-
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
This paper investigates zero RL training on diverse open base models, achieving significant accuracy and response length improvements while identifying key factors like reward design and data difficulty that influence the emergence of reasoning behaviors.