Posts
All the articles I've posted.
-
Pretraining Language Models to Ponder in Continuous Space
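This paper proposes the Pondering Language Model, which introduces a self-supervised mechanism for pondering in continuous space during pretraining. It significantly improves performance on language modeling and downstream tasks, with PonderingPythia-1B approaching TinyLlama-1.1B.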
-
RLAE: Reinforcement Learning-Assisted Ensemble for LLMs
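RLAE is a framework that uses reinforcement learning to dynamically adjust the ensemble weights of large language models. It models the ensemble process as a Markov decision process, achieves gains of up to 3.3% on multiple tasks, and shows cross-task generalization and computational efficiency.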
-
Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning
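This paper proposes the Perturb-and-Merge (P&M) framework, which applies task-vector perturbation during training and merges models by convex combination at inference. Combined with LoRA for parameter-efficient continual learning, it significantly mitigates catastrophic forgetting and improves performance on multiple benchmark datasets.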
-
An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits
This paper demonstrates that fine-tuning large language models to 1.58-bit ternary weights with extra RMSNorm layers and a gradual quantization schedule achieves superior cross-entropy loss and preserves reasoning performance. This enables deployment on commodity hardware without relying on complex knowledge distillation.
-
Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games
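This paper proposes the Divide-Fuse-Conquer framework, which improves the generalization of large language models in multi-scenario games through grouped training, parameter fusion, and continued optimization. Experiments on 18 TextArena games show Qwen2.5-32B-Align performing close to Claude3.5, although performance in complex scenarios remains limited.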