Tag: Reasoning

All the articles with the tag "Reasoning".

Pretraining Language Models to Ponder in Continuous Space

Published: 30 May, 2025 at 11:18 AM

86.39 🤔

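This paper proposes the Pondering Language Model, which introduces a self-supervised continuous-space pondering mechanism during pretraining and significantly improves performance on language modeling and downstream tasks; PonderingPythia-1B approaches the performance of TinyLlama-1.1B.

RLAE: Reinforcement Learning-Assisted Ensemble for LLMs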

Published: 4 Jun, 2025 at 11:27 AM

86.33 🤔

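RLAE proposes a framework that uses reinforcement learning to dynamically adjust the ensemble weights of large language models, modeling the ensemble process as a Markov decision process; it achieves performance gains of up to 3.3% across multiple tasks and demonstrates cross-task generalization and computational efficiency.

Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games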

Published: 24 May, 2025 at 11:11 AM

86.31 🤔

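This paper proposes the Divide-Fuse-Conquer framework, which improves the generalization of large language models in multi-scenario games through grouped training, parameter fusion, and continued optimization; experiments on 18 TextArena games show that Qwen2.5-32B-Align performs close to Claude3.5, though its performance in complex scenarios remains limited.

Fractured Chain-of-Thought Reasoning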

Published: 23 May, 2025 at 11:11 AM

86.28 🤔

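This paper proposes Fractured Sampling, which optimizes sampling along three dimensions (number of reasoning trajectories, solution diversity, and reasoning depth) and significantly improves the cost-performance trade-off of large language models on long chain-of-thought reasoning tasks.

Self-Interpretability: LLMs Can Describe Complex Internal Processes that Drive Their Decisions, and Improve with Training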

Published: 30 May, 2025 at 11:15 AM

86.28 🤔

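By fine-tuning GPT-4o and GPT-4o-mini, this paper shows that large language models can quantitatively report their internal decision processes (such as attribute weights), that introspection training significantly improves the accuracy of these reports, and that this ability generalizes to native preferences, offering a new path for AI interpretability and safety.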
