Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
This paper demonstrates that a 1.5B parameter reasoning model (Distill-R1) outperforms larger non-reasoning LLMs as a discriminator in a text-to-SQL planning framework by leveraging a novel soft score extraction method from chain-of-thought outputs, though it struggles significantly as a generator.
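To make the soft-score idea concrete, here is a minimal sketch assuming a HuggingFace causal LM; the checkpoint name, prompt format, and Yes/No answer convention are illustrative, not the paper's exact setup. Instead of parsing a hard verdict, the judge first generates its chain of thought, and the probability mass on "Yes" versus "No" at the answer position then serves as a soft score in [0, 1].

```python
# Sketch only: model name, prompt, and answer convention are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

def soft_score(question: str, candidate_sql: str) -> float:
    """Let the judge reason first, then read P(Yes) vs P(No) at the answer slot."""
    prompt = (
        f"Question: {question}\nCandidate SQL: {candidate_sql}\n"
        "Think step by step, then answer Yes or No.\n"
    )
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        cot = model.generate(ids, max_new_tokens=256, do_sample=False)
    # Append an explicit answer cue and read the next-token logits.
    text = tok.decode(cot[0], skip_special_tokens=True) + "\nFinal answer:"
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    yes = logits[tok.encode(" Yes", add_special_tokens=False)[0]]
    no = logits[tok.encode(" No", add_special_tokens=False)[0]]
    return torch.softmax(torch.stack([yes, no]), dim=0)[0].item()
```

The soft score lets the planner rank many candidate SQL programs by discriminator confidence rather than by a brittle parsed yes/no.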
-
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
This paper proposes the StarPO framework and the RAGEN system for training LLM agents with multi-turn, trajectory-level reinforcement learning. It surfaces two core challenges, training instability (e.g., the Echo Trap) and weak reasoning, and shows that StarPO-S improves stability and generalization, though reasoning ability still depends on fine-grained reward design.
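A minimal sketch of the trajectory-level credit assignment involved, with the grouping scheme and filtering fraction as my own assumptions: StarPO-S-style filtering drops low-variance prompt groups, which carry little learning signal and tend to feed collapse patterns like the Echo Trap.

```python
# Sketch only: grouping and keep_frac are illustrative assumptions.
import numpy as np

def filter_and_advantage(groups: dict, keep_frac: float = 0.25) -> dict:
    """groups: {prompt_id: [trajectory_reward, ...]} over rollouts per prompt."""
    # Rank prompts by reward std; keep only the most "uncertain" ones.
    stds = {p: np.std(rs) for p, rs in groups.items()}
    n_keep = max(1, int(keep_frac * len(stds)))
    kept = sorted(stds, key=stds.get, reverse=True)[:n_keep]
    advantages = {}
    for p in kept:
        rs = np.asarray(groups[p], dtype=float)
        # One normalized advantage per whole multi-turn trajectory,
        # not per turn or per token.
        advantages[p] = (rs - rs.mean()) / (rs.std() + 1e-8)
    return advantages
```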
-
RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
RADLADS introduces a cost-effective three-step distillation protocol to convert softmax attention transformers into linear attention models using only 350-700M tokens, achieving near-teacher performance on benchmarks and setting a new state-of-the-art for pure RNNs with models up to 72B parameters.
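As a rough sketch of the first conversion step, under assumed layer interfaces (the full protocol also distills whole-model logits and then fine-tunes): each student linear-attention layer is trained to reproduce the frozen teacher's softmax-attention output on the same hidden states.

```python
# Sketch only: layer classes and shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def alignment_loss(teacher_attn, student_attn, hidden_states):
    """L2-match a linear-attention layer to a frozen softmax-attention layer."""
    with torch.no_grad():
        target = teacher_attn(hidden_states)   # frozen teacher output
    pred = student_attn(hidden_states)         # trainable student output
    return F.mse_loss(pred, target)
```

Because the alignment is per-layer on teacher hidden states, it converges with far fewer tokens than retraining the student end to end, which is what makes the 350-700M-token budget plausible.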
-
Communicating Activations Between Language Model Agents
This paper introduces Activation Communication (AC), a novel method for inter-LLM communication using intermediate activations instead of natural language, achieving up to 27% performance improvement over traditional methods with significantly reduced compute across coordination games and reasoning benchmarks.
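A sketch of the core mechanism using PyTorch forward hooks, assuming Llama-style module paths, matching hidden sizes, and equal sequence lengths; direct substitution is the simplest variant (the paper also considers learned mappings between activation spaces):

```python
# Sketch only: layer indices, module paths, and direct substitution
# are illustrative assumptions.
import torch

def communicate(model_a, model_b, ids_a, ids_b, layer_a=20, layer_b=20):
    """Run model_a, capture one layer's activations, inject them into model_b."""
    captured = {}

    def capture(_, __, out):
        captured["h"] = out[0] if isinstance(out, tuple) else out

    def inject(_, __, out):
        h = captured["h"]  # assumes shapes match model_b's layer output
        return (h,) + out[1:] if isinstance(out, tuple) else h

    handle_a = model_a.model.layers[layer_a].register_forward_hook(capture)
    with torch.no_grad():
        model_a(ids_a)
    handle_a.remove()

    handle_b = model_b.model.layers[layer_b].register_forward_hook(inject)
    with torch.no_grad():
        logits = model_b(ids_b).logits
    handle_b.remove()
    return logits
```

The compute saving comes from skipping autoregressive decoding of a natural-language message: one forward pass of each model replaces a full generate-then-read exchange.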
-
CREAM: Consistency Regularized Self-Rewarding Language Models
This paper proposes CREAM (Consistency Regularized Self-Rewarding Language Models), which regularizes preference training by measuring how consistently models from different iterations of the self-rewarding process rank the same responses, thereby mitigating reward bias and improving both alignment performance and training stability for smaller language models.
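A minimal sketch of the consistency signal, with Kendall's tau and the soft-label form as illustrative assumptions: rankings of the same candidate responses from consecutive iterations are compared, and disagreement softens the preference label toward 0.5 so that noisy pairs stop pushing the model.

```python
# Sketch only: the tau-based weight and soft-label DPO form are assumptions.
import torch
import torch.nn.functional as F
from scipy.stats import kendalltau

def consistency_weight(scores_curr, scores_prev) -> float:
    """Rank agreement between iterations, mapped from [-1, 1] to [0, 1]."""
    tau, _ = kendalltau(scores_curr, scores_prev)
    return 0.5 * (tau + 1.0)

def consistency_regularized_dpo(logit_margin, weight):
    """Soft-label DPO loss.

    logit_margin: beta * (policy/ref log-ratio of chosen minus rejected).
    weight=1 recovers standard DPO; weight=0.5 neutralizes a noisy pair.
    """
    target = torch.full_like(logit_margin, weight)
    return F.binary_cross_entropy_with_logits(logit_margin, target)
```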