Tag: Efficiency
All the articles with the tag "Efficiency".
-
Better Estimation of the KL Divergence Between Language Models
This paper introduces a Rao-Blackwellized Monte Carlo estimator for KL divergence between language models, achieving unbiased estimates with provably lower variance than standard Monte Carlo methods, and demonstrates improved stability and performance in RLHF fine-tuning for sentiment-controlled generation.
-
MOOSComp: Improving Lightweight Long-Context Compressor via Mitigating Over-Smoothing and Incorporating Outlier Scores
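This paper proposes MOOSComp, which mitigates over-smoothing by adding an inter-class cosine similarity loss during training and retains key tokens by incorporating outlier scores during compression, significantly improving the performance and generalization of task-agnostic long-context compression.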
-
CCSK: Cognitive Convection of Self-Knowledge Based Retrieval Augmentation for Large Language Models
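This paper proposes the CCSK framework, which uses a Siamese Network and a Response Quality Model to dynamically fuse query similarity and response quality, optimizing the information retrieval decisions of large language models and significantly improving F1 scores and accuracy on multiple question-answering datasets.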
-
LLM-Independent Adaptive RAG: Let the Question Speak for Itself
This paper introduces LLM-independent adaptive retrieval using 27 external information features across 7 groups, achieving QA performance comparable to LLM-based methods on 6 datasets while significantly improving efficiency by eliminating additional LLM calls during inference.
-
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
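This paper presents the first systematic survey of progress in efficient reasoning for large language models, categorizing approaches as model-based, output-based, and prompt-based, and examines strategies for reducing the "overthinking" phenomenon to improve computational efficiency while preserving reasoning ability.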