Posts

All the articles I've posted.

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

Published: 20 May, 2025 at 11:10 AM

92.09 🤔

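This paper proposes *AutoThink*, which uses an ellipsis prompt and a multi-stage reinforcement learning framework to let R1-style large reasoning models adaptively decide, based on problem complexity, whether to reason explicitly, achieving a superior accuracy-efficiency trade-off on five math benchmarks.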
Temporal Sampling for Forgotten Reasoning in LLMs

Published: 28 May, 2025 at 11:20 AM

92.01 🤔

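This paper reveals the "Temporal Forgetting" phenomenon in LLM fine-tuning and proposes "Temporal Sampling", which samples answers from multiple training checkpoints to significantly improve reasoning performance (Pass@k gains of 4-19 percentage points) and uses LoRA adaptation to reduce storage cost.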
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective

Published: 2 Jun, 2025 at 11:24 AM

91.96 🤔

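Sentinel proposes a lightweight sentence-level context compression framework that probes the attention signals of a 0.5B proxy model to achieve up to 5x compression while matching the QA performance of 7B-scale systems on the LongBench benchmark.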
From Words to Worlds: Compositionality for Cognitive Architectures

Published: 25 May, 2025 at 11:24 AM

91.89 🤔

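This paper designs three tasks to evaluate the compositional abilities of large language models (LLMs), finding that scaling up model size generally improves compositional performance while instruction tuning has inconsistent effects, which suggests that compositionality has limited power to explain performance gains.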
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

Published: 17 May, 2025 at 11:02 AM

91.74 🤔

This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant accuracy and token efficiency gains on math reasoning benchmarks.
