Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
This paper proposes *AutoThink*, which uses ellipsis prompts and a multi-stage reinforcement learning framework to let R1-style large reasoning models adaptively decide, based on problem complexity, whether to perform explicit reasoning, achieving a superior trade-off between accuracy and efficiency on five mathematical benchmarks.
-
Temporal Sampling for Forgotten Reasoning in LLMs
This paper reveals the 'Temporal Forgetting' phenomenon in large language model fine-tuning and proposes 'Temporal Sampling', which samples answers from multiple training checkpoints to substantially improve reasoning performance (Pass@k gains of 4-19 percentage points) while reducing storage costs through LoRA adaptation.
-
Sentinel: Attention Probing of Proxy Models for LLM Context Compression with an Understanding Perspective
Sentinel proposes a lightweight sentence-level context compression framework that probes the attention signals of a 0.5B proxy model to achieve up to 5x compression, matching the QA performance of 7B-scale systems on the LongBench benchmark.
-
From Words to Worlds: Compositionality for Cognitive Architectures
This paper evaluates the compositionality of large language models (LLMs) through three designed tasks, finding that scaling up model size generally improves compositional performance while instruction tuning has inconsistent effects, suggesting that compositionality has limited explanatory power for performance gains.
-
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
This paper introduces Learning to Think (L2T), an information-theoretic reinforcement fine-tuning framework for LLMs that uses a universal dense process reward to optimize reasoning effectiveness and efficiency, achieving significant gains in accuracy and token efficiency on math reasoning benchmarks.