Tag: Large Language Model

All articles with the tag "Large Language Model".

Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

Published: 22 May, 2025 at 11:22 AM

87.53 🤔

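This paper shows that in reinforcement learning, low-probability tokens disproportionately dominate model updates. It proposes two methods, Advantage Reweighting and Lopti, that balance update weights across tokens. Both significantly improve the performance of large language models trained with GRPO, with gains of up to 46.2% on the K&K Logic Puzzle task.
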
Distilling the Implicit Multi-Branch Structure in LLMs' Reasoning via Reinforcement Learning

Published: 26 May, 2025 at 11:25 AM

87.52 🤔

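This paper proposes RLKD, a reinforcement-learning-based knowledge distillation framework. It uses a Generative Structure Reward Model (GSRM) to transfer the implicit multi-branch structure of the teacher model's reasoning to the student model. Experiments show that it significantly outperforms SFT and conventional RL methods on math and question-answering tasks.
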
SEAL: Steerable Reasoning Calibration of Large Language Models for Free

Published: 8 May, 2025 at 06:16 PM

87.52 🤔

SEAL, a training-free method, calibrates the reasoning process of Large Language Models by steering latent representations to reduce redundant thoughts, achieving up to 14.1% accuracy improvement and 50.4% token reduction across diverse benchmarks.

More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives

Published: 2 Jun, 2025 at 01:15 PM

87.51 🤔

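This paper proposes DrICL, which improves the performance of large language models in many-shot in-context learning through differentiated learning and advantage-based reweighting. Its stability and effectiveness across a variety of tasks are validated on ICL-50, a dataset the authors built themselves.
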
Zero-Shot Vision Encoder Grafting via LLM Surrogates

Published: 2 Jun, 2025 at 11:23 AM

87.49 🤔

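This paper proposes training a vision encoder on a small surrogate model and then grafting it zero-shot onto a large LLM (e.g., Llama-70B). This preserves visual understanding while reducing VLM training cost by about 45%.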
