Posts
All the articles I've posted.
-
Scalable Complexity Control Facilitates Reasoning Ability of LLMs
This paper controls the complexity of large language models by adjusting the initialization rate and the weight-decay coefficient, significantly improving reasoning ability, with especially strong gains on mathematical tasks and better performance under scaling laws.
-
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
This paper introduces Adaptive Difficulty Curriculum Learning (ADCL) and Expert-Guided Self-Reformulation (EGSR) to enhance LLM reasoning by dynamically adjusting training curricula and guiding models to reformulate expert solutions, achieving significant performance improvements over standard RL baselines on mathematical reasoning benchmarks.
-
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
ReMA separates meta-thinking from the reasoning process via multi-agent reinforcement learning, improving large language model performance on mathematical reasoning and LLM-as-a-Judge tasks, with particularly strong out-of-distribution generalization, though it is sensitive to hyperparameters and faces stability challenges in multi-turn settings.
-
This paper proposes the Reasoning CPT method, which incorporates synthetic hidden-thought data into continued pre-training, significantly improving large language models' cross-domain reasoning, hard-problem solving, and reasoning efficiency, achieving up to a 3.3% overall gain on the MMLU benchmark and roughly an 8% improvement on hard problems.
-
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
By analyzing how LLM output distributions change before and after alignment, this paper shows that alignment reduces distributional pluralism but achieves Overton pluralism through longer responses, and that base models can effectively mimic aligned-model behavior via in-context learning, supporting the superficial alignment hypothesis.