Tag: Large Language Model
All the articles with the tag "Large Language Model".
-
Better Estimation of the KL Divergence Between Language Models
This paper introduces a Rao-Blackwellized Monte Carlo estimator for KL divergence between language models, achieving unbiased estimates with provably lower variance than standard Monte Carlo methods, and demonstrates improved stability and performance in RLHF fine-tuning for sentiment-controlled generation.
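To make the distinction concrete, here is a minimal sketch (not the paper's code) contrasting the standard Monte Carlo estimator with a Rao-Blackwellized one on toy autoregressive models: instead of a single sampled log-ratio per step, the RB estimator takes the exact expectation over the next-token distribution at each sampled prefix. The toy softmax tables, vocabulary size, and sequence length are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 5, 4  # toy vocabulary size and sequence length (assumptions)

def next_token_probs(logits_table, prefix):
    # Toy "language model": a fixed softmax table indexed by prefix length.
    logits = logits_table[len(prefix) % T]
    e = np.exp(logits - logits.max())
    return e / e.sum()

p_table = rng.normal(size=(T, V))  # stand-in for model p
q_table = rng.normal(size=(T, V))  # stand-in for model q

def mc_kl(n_samples):
    """Standard MC estimate: average of log p(x) - log q(x) over x ~ p."""
    total = 0.0
    for _ in range(n_samples):
        prefix, log_ratio = [], 0.0
        for _ in range(T):
            p = next_token_probs(p_table, prefix)
            q = next_token_probs(q_table, prefix)
            tok = rng.choice(V, p=p)
            log_ratio += np.log(p[tok]) - np.log(q[tok])
            prefix.append(tok)
        total += log_ratio
    return total / n_samples

def rb_kl(n_samples):
    """Rao-Blackwellized estimate: at each sampled prefix, take the exact
    expectation over the next token instead of a single-sample log-ratio."""
    total = 0.0
    for _ in range(n_samples):
        prefix, acc = [], 0.0
        for _ in range(T):
            p = next_token_probs(p_table, prefix)
            q = next_token_probs(q_table, prefix)
            acc += np.sum(p * (np.log(p) - np.log(q)))  # exact per-step KL
            prefix.append(rng.choice(V, p=p))            # prefix is still sampled
        total += acc
    return total / n_samples

print(mc_kl(200), rb_kl(200))  # both unbiased; rb_kl shows much lower variance
```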
-
Toward Efficient Exploration by Large Language Model Agents
By using LLMs to explicitly implement a posterior sampling RL algorithm, this paper significantly improves the exploration efficiency of LLM agents in natural language environments while retaining the statistical performance advantages of the classical algorithm.
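A rough sketch of the posterior-sampling loop this refers to, with the LLM calls replaced by hypothetical stubs (`llm_sample_hypothesis` and `llm_act` are placeholders, not functions from the paper): sample one hypothesis about the environment per episode, act as if it were true, and condition the next sample on the growing interaction history.

```python
import random

def llm_sample_hypothesis(history):
    """Hypothetical call: ask the LLM for one plausible model of the environment
    given the interaction history (an approximate posterior sample)."""
    return {"best_arm": random.randrange(3)}   # toy hypothesis

def llm_act(hypothesis, state):
    """Hypothetical call: ask the LLM to act optimally under the sampled hypothesis."""
    return hypothesis["best_arm"]

def env_step(action):
    # Toy bandit-style environment standing in for a natural-language task.
    true_best = 2
    return 1.0 if action == true_best and random.random() < 0.8 else 0.0

history = []
for episode in range(20):
    hyp = llm_sample_hypothesis(history)   # one posterior sample per episode (PSRL-style)
    action = llm_act(hyp, state=None)
    reward = env_step(action)
    history.append((action, reward))       # history conditions the next sample
print(history[-5:])
```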
-
Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards
REWARD-SQL introduces a framework for Text-to-SQL by decomposing queries into Chain-of-CTEs and using Process Reward Models (PRMs) with GRPO and Best-of-N sampling, achieving a state-of-the-art 68.9% execution accuracy on the BIRD dataset with a 7B model.
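A hedged sketch of the Best-of-N step: score each candidate Chain-of-CTEs with a process reward model and keep the best-scoring candidate. The `score_step_with_prm` stub and the mean aggregation rule are assumptions for illustration; the actual PRM is a trained model and the paper's selection rule may differ.

```python
import random

def score_step_with_prm(question, step_sql):
    """Stand-in for a Process Reward Model: returns a score in [0, 1] for one
    reasoning step (one CTE). A real PRM would be a trained model."""
    return random.random()

def best_of_n(question, candidates):
    """Pick the candidate Chain-of-CTEs whose aggregated step scores are highest.
    Mean aggregation is an assumption made for this sketch."""
    def aggregate(cand):
        return sum(score_step_with_prm(question, step) for step in cand) / len(cand)
    return max(candidates, key=aggregate)

# Hypothetical candidates: each is a list of CTE steps ending in a final SELECT.
cands = [
    ["WITH t AS (SELECT id FROM users WHERE age > 30)", "SELECT COUNT(*) FROM t"],
    ["WITH t AS (SELECT * FROM users)", "SELECT COUNT(*) FROM t WHERE age > 30"],
]
print(best_of_n("How many users are older than 30?", cands))
```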
-
HSI: Head-Specific Intervention Can Induce Misaligned AI Coordination in Large Language Models
This paper proposes Head-Specific Intervention (HSI), which applies activation interventions to specific attention heads to induce Llama 2 to bypass its safety alignment on AI coordination behaviors, outperforming supervised fine-tuning and other intervention strategies.
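The core mechanics of a head-specific activation intervention can be sketched with a PyTorch forward pre-hook on an attention output projection: only the slice of the activation belonging to one chosen head is shifted along a steering direction. The toy `o_proj` layer, head index, steering vector, and strength below are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Toy stand-in for one transformer layer's attention output projection,
# whose input is the concatenation of all head outputs.
n_heads, head_dim = 8, 16
o_proj = nn.Linear(n_heads * head_dim, n_heads * head_dim, bias=False)

target_head = 3                # head chosen as influential (placeholder)
steer = torch.randn(head_dim)  # steering direction (placeholder)
alpha = 2.0                    # intervention strength (placeholder)

def head_specific_hook(module, inputs):
    """Shift only the slice of the pre-projection activation that belongs to
    the target head, leaving every other head untouched."""
    (hidden,) = inputs                       # (batch, seq, n_heads * head_dim)
    hidden = hidden.clone()
    lo, hi = target_head * head_dim, (target_head + 1) * head_dim
    hidden[..., lo:hi] += alpha * steer
    return (hidden,)

handle = o_proj.register_forward_pre_hook(head_specific_hook)
out = o_proj(torch.randn(1, 5, n_heads * head_dim))  # hook fires on this call
handle.remove()
```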
-
RWKVQuant: Quantizing the RWKV Family with Proxy Guided Hybrid of Scalar and Vector Quantization
RWKVQuant introduces a tailored Post Training Quantization framework for RWKV models, using a coarse-to-fine proxy to hybridize scalar and vector quantization and optimizing codebooks for element-wise operations, achieving ~3-bit quantization with minimal accuracy loss and significant memory and speed improvements.
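A simplified sketch of the routing idea: compute a cheap proxy per weight matrix and send it either to uniform scalar quantization or to a small k-means codebook (vector quantization). The kurtosis-style proxy and the threshold below are stand-ins for the paper's coarse-to-fine proxy, which is more elaborate.

```python
import numpy as np

def scalar_quantize(w, n_bits=3):
    """Uniform scalar quantization to 2**n_bits levels."""
    lo, hi = w.min(), w.max()
    step = (hi - lo) / (2 ** n_bits - 1)
    return np.round((w - lo) / step) * step + lo

def vector_quantize(w, n_centroids=8, dim=4, iters=10):
    """Tiny k-means codebook over length-`dim` sub-vectors (vector quantization)."""
    vecs = w.reshape(-1, dim)
    centroids = vecs[np.random.choice(len(vecs), n_centroids, replace=False)]
    for _ in range(iters):
        dists = ((vecs[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for k in range(n_centroids):
            if (assign == k).any():
                centroids[k] = vecs[assign == k].mean(0)
    return centroids[assign].reshape(w.shape)

def coarse_proxy(w):
    """Stand-in proxy: kurtosis-like spread of the weights, used only to
    illustrate how a proxy can route matrices between quantizers."""
    z = (w - w.mean()) / (w.std() + 1e-8)
    return float((z ** 4).mean())

def quantize_matrix(w, threshold=4.0):
    # Route "uniform-looking" matrices to scalar quantization and
    # heavier-tailed ones to the vector-quantization codebook.
    return scalar_quantize(w) if coarse_proxy(w) < threshold else vector_quantize(w)

w = np.random.randn(64, 64).astype(np.float32)
print(np.abs(w - quantize_matrix(w)).mean())  # mean quantization error
```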