Posts
All the articles I've posted.
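-   Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
    This paper proposes *AutoThink*, which combines an ellipsis prompt with a multi-stage reinforcement learning framework so that R1-style large reasoning models decide adaptively, based on problem complexity, whether to reason explicitly. It achieves a superior trade-off between accuracy and efficiency on five math benchmarks.
-   Temporal Sampling for Forgotten Reasoning in LLMs
    This paper identifies the "Temporal Forgetting" phenomenon in LLM fine-tuning and proposes "Temporal Sampling", which samples answers from multiple training checkpoints to significantly improve reasoning performance (Pass@k gains of 4 to 19 percentage points) and uses LoRA adaptation to reduce storage cost.
-   Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective
    Sentinel proposes a lightweight sentence-level context compression framework that probes the attention signals of a 0.5B proxy model, achieving up to 5x compression while matching the QA performance of 7B-scale systems on the LongBench benchmark.
-   From Words to Worlds: Compositionality for Cognitive Architectures
    This paper designs three tasks to evaluate the compositional abilities of large language models (LLMs). It finds that scaling up model size generally improves compositional performance while instruction tuning has inconsistent effects, suggesting that compositionality has limited power to explain performance gains.
-   Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
    This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant accuracy and token efficiency gains on math reasoning benchmarks.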