Posts

All the articles I've posted.

SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild

Published: 8 May, 2025 at 06:17 PM

86.55 🤔

This paper investigates zero RL training on diverse open base models, achieving significant accuracy and response length improvements while identifying key factors like reward design and data difficulty that influence the emergence of reasoning behaviors.

Putting It All into Context: Simplifying Agents with LCLMs

Published: 19 May, 2025 at 11:19 AM

86.55 🤔

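This paper proposes a "state-in-context" agent design based on long-context language models (LCLMs). By placing the entire environment state in the context, it simplifies agent architectures for software engineering tasks and achieves performance on SWE-bench Verified comparable to complex scaffolding methods (Gemini-2.5-Pro reaches 50.8% pass@1).

Unifying Attention Heads and Task Vectors via Hidden State Geometry in In-Context Learning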

Published: 31 May, 2025 at 11:19 AM

86.54 🤔

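This paper proposes a unified framework based on the geometric properties of hidden states (separability and alignment). It reveals a two-stage mechanism of in-context learning (ICL) in classification tasks: early layers enhance separability through PTH, and later layers refine alignment through IH. The framework also explains why task vectors are effective.

Understanding Cross-Lingual Inconsistency in Large Language Models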

Published: 26 May, 2025 at 11:22 AM

86.50 🤔

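This paper uses the *logit lens* to analyze cross-lingual inconsistency in large language models (LLMs). It finds that large models tend to operate in individual language subspaces rather than a shared semantic space, and proposes a cross-lingual activation steering method to improve multilingual reasoning performance and knowledge transfer in small models.

Exploring the Potential of Offline RL for Reasoning in LLMs: A Preliminary Study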

Published: 7 May, 2025 at 08:41 AM

86.49 🤔

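This paper explores offline reinforcement learning methods (LD-DPO), achieving an average 3.3% improvement in reasoning performance on the DeepDistill-32B model, with a 10.1% gain on the Arena-Hard benchmark in particular, and emphasizes the importance of balancing reasoning length against semantic richness.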
