Posts

All the articles I've posted.

Better Estimation of the KL Divergence Between Language Models

Published: 12 May, 2025 at 11:21 AM

71.02 🤔

This paper introduces a Rao-Blackwellized Monte Carlo estimator for KL divergence between language models, achieving unbiased estimates with provably lower variance than standard Monte Carlo methods, and demonstrates improved stability and performance in RLHF fine-tuning for sentiment-controlled generation.

CCSK:Cognitive Convection of Self-Knowledge Based Retrieval Augmentation for Large Language Models

Published: 7 May, 2025 at 08:43 AM

70.69 🤔

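This paper proposes the CCSK framework, which uses a Siamese Network and a Response Quality Model to dynamically fuse query similarity and response quality, optimizing the retrieval decisions of large language models and significantly improving F1 score and accuracy on multiple question-answering datasets.
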
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism

Published: 4 May, 2025 at 04:30 PM

70.65 🤔

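This paper presents MegaScale-Infer, a system that improves inference efficiency for large-scale MoE models through a parallelism strategy that disaggregates the attention and FFN modules, combined with an efficient M2N communication library, achieving up to a 1.90x increase in throughput.
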
LLM-Independent Adaptive RAG: Let the Question Speak for Itself

Published: 13 May, 2025 at 11:09 AM

70.54 🤔

This paper introduces LLM-independent adaptive retrieval using 27 external information features across 7 groups, achieving comparable QA performance to LLM-based methods on 6 datasets while significantly improving efficiency by eliminating additional LLM calls during inference.

Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards

Published: 12 May, 2025 at 11:15 AM

70.53 🤔

REWARD-SQL introduces a framework for Text-to-SQL by decomposing queries into Chain-of-CTEs and using Process Reward Models (PRMs) with GRPO and Best-of-N sampling, achieving a state-of-the-art 68.9% execution accuracy on the BIRD dataset with a 7B model.
