Tag: Efficiency
All the articles with the tag "Efficiency".
-
An Extra RMSNorm is All You Need for Fine Tuning to 1.58 Bits
This paper demonstrates that fine-tuning large language models to 1.58-bit ternary weights, using extra RMSNorm layers and a gradual quantization schedule, achieves lower cross-entropy loss and preserves reasoning performance, enabling deployment on commodity hardware without relying on complex knowledge distillation.
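A minimal sketch of the core mechanism, assuming a BitNet-style absmean quantizer to {-1, 0, +1} and an extra RMSNorm placed before each quantized linear layer; the module and parameter names are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square norm: no mean subtraction, learnable scale."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

def ternary_quantize(w):
    """Absmean quantization to {-1, 0, +1} (~1.58 bits per weight)."""
    scale = w.abs().mean().clamp(min=1e-8)
    return (w / scale).round().clamp(-1, 1), scale

class TernaryLinear(nn.Module):
    """Linear layer with ternary weights and an extra RMSNorm on its input."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.norm = RMSNorm(in_features)  # the "extra" normalization
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x):
        x = self.norm(x)
        w_q, scale = ternary_quantize(self.weight)
        # Straight-through estimator: quantized weights on the forward pass,
        # full-precision gradients on the backward pass.
        w = self.weight + (w_q * scale - self.weight).detach()
        return nn.functional.linear(x, w)

layer = TernaryLinear(64, 32)
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 32])
```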
-
RepCali: High Efficient Fine-tuning Via Representation Calibration in Latent Space for Pre-trained Language Models
This paper proposes RepCali, a fine-tuning method that calibrates the encoder output of pre-trained language models in latent space, significantly improving the performance of 25 models across 8 downstream tasks while adding only 0-0.8% extra parameters.
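The summary does not spell out the calibration block, so the sketch below is only an assumption of what a small, residual latent-space calibrator applied to encoder outputs could look like; the class name and bottleneck size are hypothetical:

```python
import torch
import torch.nn as nn

class LatentCalibrator(nn.Module):
    """Illustrative calibration block applied to encoder hidden states before the decoder/head."""
    def __init__(self, hidden_dim):
        super().__init__()
        # Small bottleneck keeps the extra parameter count tiny relative to the backbone.
        self.calib = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 16),
            nn.GELU(),
            nn.Linear(hidden_dim // 16, hidden_dim),
        )

    def forward(self, encoder_hidden):
        # Residual correction: keep the original representation, add a learned shift.
        return encoder_hidden + self.calib(encoder_hidden)

hidden = torch.randn(2, 10, 768)  # (batch, seq, hidden) from a pre-trained encoder
print(LatentCalibrator(768)(hidden).shape)  # torch.Size([2, 10, 768])
```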
-
Fractured Chain-of-Thought Reasoning
This paper proposes Fractured Sampling, which optimizes sampling along three dimensions, the number of reasoning trajectories, solution diversity, and reasoning depth, to significantly improve the cost-performance trade-off of large language models on long chain-of-thought reasoning tasks.
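A schematic of sampling along those three axes, with stand-in functions in place of real LLM calls; all function and parameter names are hypothetical:

```python
import random

def generate_cot(prompt, rng):
    """Stand-in for an LLM call that returns a list of reasoning steps."""
    return [f"step {i}" for i in range(rng.randint(3, 8))]

def answer_from_prefix(steps, rng):
    """Stand-in for producing a final answer from a (possibly truncated) trace."""
    return f"answer after {len(steps)} steps ({rng.random():.2f})"

def fractured_sampling(prompt, n_traces=4, n_depths=3, n_solutions=2, seed=0):
    """Sample along three axes: number of traces, truncation depth, answers per prefix."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_traces):                              # axis 1: independent reasoning traces
        steps = generate_cot(prompt, rng)
        depths = sorted(rng.sample(range(1, len(steps) + 1),
                                   min(n_depths, len(steps))))
        for d in depths:                                   # axis 2: intermediate reasoning depths
            prefix = steps[:d]
            for _ in range(n_solutions):                   # axis 3: diverse answers per prefix
                candidates.append(answer_from_prefix(prefix, rng))
    return candidates  # downstream: aggregate, e.g. by majority vote

print(len(fractured_sampling("What is 17 * 23?")))
```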
-
Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data?
This paper proposes and validates the 'shallow preference signals' phenomenon: reward models and DPO models trained on truncated preference datasets (keeping only the first 40%-50% of tokens) perform on par with, or even better than, those trained on the full datasets, exposing the limitation that current alignment methods focus too heavily on early tokens.
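The truncation step itself is simple to sketch; the whitespace "tokens" and field layout below are placeholders, a real pipeline would use the model tokenizer on a preference dataset:

```python
def truncate_pair(chosen_tokens, rejected_tokens, keep_frac=0.5):
    """Keep only the leading fraction of each response in a preference pair."""
    keep_c = max(1, int(len(chosen_tokens) * keep_frac))
    keep_r = max(1, int(len(rejected_tokens) * keep_frac))
    return chosen_tokens[:keep_c], rejected_tokens[:keep_r]

# Toy example with whitespace tokenization.
chosen = "The capital of France is Paris because it has long been the seat of government".split()
rejected = "The capital of France is Lyon which is a large city in the southeast".split()
print(truncate_pair(chosen, rejected, keep_frac=0.4))
```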
-
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
This paper proposes the reinforcement-learning-based LASER family of methods (LASER, LASER-D, LASER-DE), which use dynamic, difficulty-aware length-based reward shaping to significantly improve token efficiency while preserving the performance of large reasoning models, achieving Pareto-optimal accuracy-efficiency trade-offs on multiple mathematical reasoning benchmarks.
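A minimal sketch of the general idea behind length-based reward shaping; the exact LASER/LASER-D reward formulas are not reproduced here, and the threshold-plus-bonus logic is an illustrative assumption:

```python
def shaped_reward(is_correct, num_tokens, target_len, bonus=0.5):
    """Reward correctness, with an extra bonus for correct answers under a length budget.

    Making target_len per-problem (e.g. larger budgets for harder problems) gives the
    dynamic, difficulty-aware flavor described for LASER-D.
    """
    base = 1.0 if is_correct else 0.0
    if is_correct and num_tokens <= target_len:
        base += bonus
    return base

print(shaped_reward(True, 350, target_len=512))   # 1.5: correct and concise
print(shaped_reward(True, 900, target_len=512))   # 1.0: correct but long
print(shaped_reward(False, 120, target_len=512))  # 0.0: wrong
```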