Posts
All the articles I've posted.
-
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
R1-Searcher++ uses a two-stage training strategy (SFT and RL), combined with a reward mechanism and a memory module, to let large language models adaptively balance internal knowledge against external retrieval, significantly improving both accuracy and retrieval efficiency on multi-hop question answering tasks.
-
LIFEBench: Evaluating Length Instruction Following in Large Language Models
This paper introduces the LIFEBench benchmark to systematically evaluate 26 large language models on length-instruction following, finding that they generally perform poorly under long-length constraints and fall far short of their vendors' claimed maximum output lengths, revealing fundamental limitations in length perception and long-text generation.
-
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
This paper introduces Temperature Scaling (TS) and Trace Length Control for Dynamic Reasoning (TLDR) to improve token efficiency in small language models, achieving up to a 50% reduction in response length with minimal accuracy loss across multiple reasoning benchmarks.
-
Skywork Open Reasoner 1 Technical Report
Skywork-OR1 proposes the MAGIC framework, a reinforcement learning approach with multi-stage training and adaptive entropy control that significantly improves long chain-of-thought reasoning models on math and coding tasks, surpassing DeepSeek-R1 and Qwen3-32B on the AIME24 and AIME25 benchmarks.
-
Llama See, Llama Do: A Mechanistic Perspective on Contextual Entrainment and Distraction in LLMs
This paper identifies the phenomenon of Contextual Entrainment, revealing a mechanistic preference in language models for tokens that appear in the prompt, and uses a differentiable masking method to locate the responsible entrainment heads, offering a new perspective on understanding and mitigating distraction.