Posts
All the articles I've posted.
-
Scalable Complexity Control Facilitates Reasoning Ability of LLMs
This paper controls the complexity of large language models by adjusting the initialization rate and the weight-decay coefficient, significantly improving reasoning ability, with especially strong gains on mathematical tasks and better performance under scaling laws.
-
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
This paper introduces Adaptive Difficulty Curriculum Learning (ADCL) and Expert-Guided Self-Reformulation (EGSR) to enhance LLM reasoning by dynamically adjusting training curricula and guiding models to reformulate expert solutions, achieving significant performance improvements over standard RL baselines on mathematical reasoning benchmarks.
-
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
ReMA separates meta-thinking from the reasoning process via multi-agent reinforcement learning, improving large language model performance on mathematical reasoning and LLM-as-a-Judge tasks, with particularly strong out-of-distribution generalization, though it is sensitive to hyperparameters and faces stability challenges in multi-turn settings.
-
This paper proposes the Reasoning CPT method, which incorporates synthetic hidden-thought data into continued pre-training, significantly improving large language models' cross-domain reasoning, hard-problem solving, and reasoning efficiency, achieving up to a 3.3% overall gain on the MMLU benchmark and roughly an 8% improvement on hard problems.
-
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
By analyzing how LLM output distributions change before and after alignment, this paper shows that alignment reduces distributional pluralism but achieves Overton pluralism through longer responses, and that base models can effectively mimic aligned-model behavior via in-context learning, supporting the superficial alignment hypothesis.