Tag: Test Time

All the articles with the tag "Test Time".

Scaling Reasoning can Improve Factuality in Large Language Models

Published: 20 May, 2025 at 11:09 AM

87.44 🤔

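This paper fine-tunes Qwen2.5-series models on reasoning traces distilled from advanced models and enriched with knowledge graphs. On complex open-domain question answering, it shows that scaling test-time compute (parallel sampling and budget forcing) improves factual accuracy by 2-8%, with especially large gains for smaller models.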
SoftCoT++: Test-Time Scaling with Soft Chain-of-Thought Reasoning

Published: 21 May, 2025 at 11:23 AM

87.10 🤔

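SoftCoT++ achieves test-time scaling in the continuous latent space by introducing diverse initial tokens and contrastive learning. It significantly improves large language model performance on several reasoning tasks and shows a synergistic effect with conventional discrete-space scaling methods.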
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

Published: 10 May, 2025 at 10:59 AM

86.42 🤔

This paper introduces ModelSwitch, a multi-LLM repeated sampling strategy that uses answer consistency to decide when to switch models dynamically. Across diverse datasets, it outperforms single-LLM self-consistency and achieves a 34% improvement in sample efficiency over it.
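A minimal sketch of the switching idea, assuming a very simple rule. The rule is that if one model's sampled answers are unanimous, they are accepted and no further models are queried; otherwise the next model is sampled. The `Sampler` type, the unanimity threshold and the plain (unweighted) final majority vote are all illustrative assumptions, not ModelSwitch's actual code or parameters.

```python
import itertools
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" is any callable that returns one sampled answer.
Sampler = Callable[[str], str]


def model_switch(question: str, models: Sequence[Sampler], samples_per_model: int) -> str:
    """Consistency-driven model switching (illustrative sketch, not the paper's code)."""
    collected: List[str] = []
    for sample in models:
        answers = [sample(question) for _ in range(samples_per_model)]
        collected.extend(answers)
        # Assumed stopping rule: if this model's answers are unanimous, trust them
        # and skip the remaining models entirely.
        if len(set(answers)) == 1:
            return answers[0]
    # No model was unanimous: majority vote over all answers collected so far
    # (plain, unweighted voting is an assumption of this sketch).
    return Counter(collected).most_common(1)[0][0]


# Usage with stand-in models: model_a alternates between two answers,
# so the sketch switches to model_b, which answers consistently.
answers_a = itertools.cycle(["4", "5"])
model_a = lambda q: next(answers_a)
model_b = lambda q: "4"
print(model_switch("What is 2 + 2?", [model_a, model_b], samples_per_model=2))  # prints 4
```

The early return is where the sample savings come from in this sketch. An easy question that the first model answers consistently never consumes samples from the other models.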
Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models

Published: 21 May, 2025 at 11:29 AM

86.09 🤔

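This paper proposes two test-time compute scaling algorithms, a knockout-style one and a league-style one. Both generate multiple candidate solutions and compare them pairwise. The paper proves that their failure probability decreases exponentially or following a power law as compute increases, and it validates the performance gains across multiple datasets and models.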
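The two selection schemes can be sketched as follows, assuming a generic `compare(a, b)` callable that stands in for an LLM judging which of two candidates is better. The comparator, the handling of byes and ties, and the "most total wins" league scoring are illustrative assumptions rather than the paper's exact procedures.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def knockout(candidates: List[T], compare: Callable[[T, T], bool], k: int) -> T:
    """Single elimination: each pair is compared k times and the majority winner advances."""
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        next_round: List[T] = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            wins_a = sum(compare(a, b) for _ in range(k))
            # Ties (possible when k is even) go to the first candidate.
            next_round.append(a if 2 * wins_a >= k else b)
        # An odd candidate out gets a bye into the next round (assumption).
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]


def league(candidates: List[T], compare: Callable[[T, T], bool], k: int) -> T:
    """Round robin: every candidate meets every other k times; most total wins is kept."""
    if not candidates:
        raise ValueError("need at least one candidate")
    wins = [
        sum(compare(a, b) for j, b in enumerate(candidates) if j != i for _ in range(k))
        for i, a in enumerate(candidates)
    ]
    # max() returns the first index among equally scored candidates.
    best = max(range(len(candidates)), key=lambda i: wins[i])
    return candidates[best]


# Usage with numbers as "solutions" and a perfect judge that prefers larger values.
print(knockout([3, 9, 1, 7, 5], lambda a, b: a > b, k=3))  # prints 9
print(league([3, 9, 1, 7, 5], lambda a, b: a > b, k=1))  # prints 9
```

With a noisy real judge, raising `k` makes each pairwise decision more reliable. That is the knob whose effect on failure probability the paper analyzes.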
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

Published: 20 May, 2025 at 11:10 AM

85.99 🤔

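This paper proposes S-GRPO, which regulates the intermediate reasoning of large language models through serial group generation and a decaying reward strategy. Across multiple benchmark datasets, it reduces reasoning length by 35.4% to 61.1% while improving accuracy by 0.72% to 6.08%, substantially improving reasoning efficiency.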
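A rough sketch of how a decaying reward can favor early exits, assuming a particular setup. In this setup, one reasoning path is cut off at several exit points (shortest first) and an answer is forced at each. A correct answer at exit position `i` earns `decay ** i`, and rewards are then standardized within the group as in GRPO. The halving schedule and function names are hypothetical choices for illustration, not the paper's exact formulation.

```python
from statistics import mean, pstdev
from typing import List


def decaying_rewards(correct: List[bool], decay: float = 0.5) -> List[float]:
    """Rewards for one serial group; exit points are ordered from shortest to longest.

    Assumed schedule: a correct answer at exit position i earns decay**i, so earlier
    correct exits earn more; incorrect answers earn 0.
    """
    return [decay ** i if ok else 0.0 for i, ok in enumerate(correct)]


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style group-relative advantage: standardize rewards within the group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example group: the earliest exit is wrong, the later three are right.
rewards = decaying_rewards([False, True, True, True])
print(rewards)  # [0.0, 0.5, 0.25, 0.125]
print(group_advantages(rewards))
```

Under this schedule, the shortest correct exit receives the largest positive advantage. This is the pressure toward shorter reasoning that the summary describes.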
