Tag: Instruction Tuning
All the articles with the tag "Instruction Tuning".
-
Exploring the Trade-Offs: Quantization Methods, Task Difficulty, and Model Size in Large Language Models From Edge to Giant
This paper comprehensively evaluates four quantization methods (GPTQ, AWQ, SmoothQuant, FP8) on instruction-tuned LLMs and SLMs ranging from 1B to 405B parameters across 13 datasets. It finds that quantized models often outperform smaller full-precision baselines but struggle with instruction following and hallucination detection; FP8 proves the most robust method, and task difficulty does not always correlate with accuracy loss.
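As a rough illustration of the kind of post-training quantization the paper benchmarks, the sketch below loads an instruction-tuned model with 4-bit GPTQ via Hugging Face Transformers. The model ID and calibration settings are illustrative placeholders, not the paper's evaluation setup.

```python
# Minimal sketch: 4-bit GPTQ post-training quantization with Hugging Face Transformers.
# Assumes `optimum` and `auto-gptq` are installed; the model ID and calibration
# dataset are placeholders, not the configuration evaluated in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder instruction-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Calibrate on a small text corpus ("c4") and quantize weights to 4 bits.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=gptq_config,
    device_map="auto",
)

# The quantized model is then used like any other causal LM.
prompt = "List three risks of deploying quantized LLMs on edge devices."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```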
-
Reverse Preference Optimization for Complex Instruction Following
This paper proposes Reverse Preference Optimization (RPO), which removes noise from preference pairs by dynamically reversing the constraints in an instruction that a response fails to satisfy. RPO significantly outperforms DPO baselines on multi-turn complex instruction-following tasks and surpasses GPT-4o at the 70B scale.
-
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
This paper proposes the 'Trajectory Policy Gradient Theorem', which proves that in online reinforcement learning for LLMs, response-level rewards alone suffice to obtain an unbiased estimate of the policy gradient for token-level rewards. Building on this result, the authors design TRePO, an algorithm that simplifies PPO's design while retaining token-level modeling capability.
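For context, the standard REINFORCE-style policy gradient already uses only a response-level (trajectory-level) reward; a minimal statement of that identity, in notation chosen here rather than taken from the paper, is:

```latex
% Trajectory-level policy gradient with a response-level reward R(x, y):
% x is the prompt, y = (y_1, ..., y_T) the sampled response, \pi_\theta the policy.
\nabla_\theta J(\theta)
  = \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
    \Big[ R(x, y) \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(y_t \mid x, y_{<t}) \Big]
```

The paper's contribution, as summarized above, is to relate this response-level formulation to the gradient of token-level rewards; the identity here is only the standard starting point, not the theorem itself.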
-
Unveiling the Mechanisms of Explicit CoT Training: How CoT Enhances Reasoning Generalization
Through controlled experiments, analysis of internal mechanisms, and theoretical derivation, this paper shows that explicit chain-of-thought (CoT) training forms a two-stage generalization circuit that substantially improves both in-distribution (ID) and out-of-distribution (OOD) reasoning generalization in large language models, and verifies its robustness under noisy training data.
-
What do Language Model Probabilities Represent? From Distribution Estimation to Response Prediction
Through theoretical analysis, this paper distinguishes three interpretations of a language model's output probabilities (the completion distribution, the response distribution, and the event distribution), reveals how existing work conflates and misinterprets these distributions, and calls for careful interpretation of model probabilities to guide LLM development and application.