This paper introduces LLM-independent adaptive retrieval based on 27 external information features across 7 groups, achieving QA performance comparable to LLM-based methods on 6 datasets while significantly improving efficiency by eliminating additional LLM calls during inference.
Retrieval-Augmented Generation, Efficiency, Question Answering, External Information, Adaptive Systems
Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii
Skoltech, AIRI, HSE University, MTS AI, MIPT
Generated by grok-3
Background Problem
Large Language Models (LLMs) often suffer from hallucinations in question answering (QA) tasks. Retrieval-Augmented Generation (RAG) mitigates this by incorporating external information, albeit at high computational cost and with the risk of introducing misinformation. Adaptive retrieval aims to balance efficiency and accuracy by retrieving information only when necessary, but existing methods rely on LLM-based uncertainty estimation, which introduces significant computational overhead. This paper addresses that inefficiency by proposing LLM-independent adaptive retrieval methods that use external information to decide when retrieval is needed, aiming to maintain QA performance while reducing computational cost.
Method
The core idea is to base the retrieval decision on external information features, eliminating the need for LLM-based uncertainty estimation. The authors introduce 27 features across 7 groups: Graph (entity relationships in knowledge graphs), Popularity (Wikipedia page views of entities), Frequency (entity occurrence in reference texts), Knowledgability (pre-computed scores of LLM uncertainty about entities), Question Type (categorization into types such as ordinal or multihop), Question Complexity (reasoning steps required), and Context Relevance (probability that the context is relevant to the question). These features are used to train lightweight classifiers that predict whether retrieval is necessary for a given question; hybrid combinations of the feature groups are also explored to assess their combined effectiveness. Because all features are pre-computed, no LLM calls are made at decision time, ensuring efficiency during inference. The approach is tested with LLaMA 3.1-8B-Instruct as the generator and BM25 as the retriever.
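To make the decision step concrete, here is a minimal sketch of such a lightweight classifier over pre-computed features. It assumes scikit-learn is available; the feature names, values, and labeling scheme are illustrative stand-ins, not the paper's exact 27-feature pipeline.

```python
"""Minimal sketch of an LLM-independent adaptive-retrieval classifier.

Assumes scikit-learn; the features below are illustrative stand-ins
for the paper's 27 pre-computed external features.
"""
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each row holds pre-computed external features for one question, e.g.
# [log entity popularity, KG degree, question-complexity score, context relevance].
X_train = np.array([
    [5.2, 14, 0.1, 0.9],   # popular entity, simple question
    [0.3,  1, 0.8, 0.2],   # rare entity, multi-hop question
    [4.1,  9, 0.2, 0.7],
    [0.9,  2, 0.7, 0.3],
])
# Label 1 = the model's parametric knowledge is insufficient, so retrieval is needed.
y_train = np.array([0, 1, 0, 1])

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

def should_retrieve(features, threshold=0.5):
    """Decide retrieval from pre-computed features alone -- no LLM call is made."""
    proba = clf.predict_proba(np.asarray(features).reshape(1, -1))[0, 1]
    return proba >= threshold

if __name__ == "__main__":
    print(should_retrieve([0.5, 2, 0.75, 0.25]))  # rare entity -> likely True
```

Since every feature is computed offline, the decision costs one forward pass of a small linear model per question, which is what allows the method to keep LM Calls at 1.0 in the experiments below.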
Experiment
The experiments were conducted on 6 QA datasets (Natural Questions, SQuAD v1.1, TriviaQA, HotpotQA, 2WikiMultiHopQA, and MuSiQue) using 500-question subsets, covering both single-hop and multi-hop queries. The setup compares the proposed external features against baselines such as ‘Always RAG’, ‘Never RAG’, and LLM-based adaptive retrieval methods (e.g., FLARE, DRAGIN), using In-Accuracy for QA performance and Retrieval Calls (RC) and LM Calls (LMC) for efficiency. The design aims to assess whether external features can replace or complement uncertainty-based methods while improving efficiency.

Results show that external features match or slightly outperform uncertainty-based methods in QA performance (e.g., up to 49.8% In-Accuracy on Natural Questions with Popularity features, compared to 51.2% with EigValLaplacian), particularly on complex questions such as those in MuSiQue (12.2% In-Accuracy with HybridExternal). Efficiency gains are significant: LMC is often 1.0 (no additional LLM calls) compared to up to 29.5 for methods like RowenCM, though RC increases (up to 1.0 vs. 0.58 for some uncertainty-based methods). The results partially meet the expectation of maintaining performance while improving efficiency, but the conservative retrieval behavior (higher RC) and the lack of complementarity with uncertainty features suggest the approach cannot yet fully replace existing methods.
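For reference, here is a sketch of how the three reported metrics can be computed from per-question logs. It assumes the common definition of In-Accuracy (a gold answer string appears in the generation) and averages RC and LMC over questions; the record layout is illustrative, not the paper's evaluation code.

```python
"""Sketch of the evaluation metrics, under assumed standard definitions:
In-Accuracy = fraction of questions whose generation contains a gold answer;
RC / LMC = mean retrieval / LM calls per question."""
from dataclasses import dataclass

@dataclass
class QARecord:
    generation: str
    gold_answers: list   # accepted answer strings for this question
    retrieval_calls: int # retriever invocations for this question
    lm_calls: int        # LLM invocations for this question

def evaluate(records):
    n = len(records)
    in_acc = sum(
        any(g.lower() in r.generation.lower() for g in r.gold_answers)
        for r in records
    ) / n
    rc = sum(r.retrieval_calls for r in records) / n
    lmc = sum(r.lm_calls for r in records) / n
    return {"In-Accuracy": in_acc, "RC": rc, "LMC": lmc}

# A purely external-feature method makes exactly one LM call per question:
records = [
    QARecord("Paris is the capital.", ["Paris"], retrieval_calls=1, lm_calls=1),
    QARecord("It was Einstein.", ["Albert Einstein", "Einstein"], 0, 1),
]
print(evaluate(records))  # {'In-Accuracy': 1.0, 'RC': 0.5, 'LMC': 1.0}
```

Under these definitions, iterative uncertainty-based methods accumulate LMC well above 1.0 because each retrieval decision itself consumes LLM calls, which is the overhead the external-feature approach avoids.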
Further Thoughts
The focus on external information for adaptive retrieval opens up interesting avenues for reducing dependency on computationally expensive LLM calls, which could be particularly impactful in resource-constrained environments like edge devices. However, the strong correlation between external features and uncertainty-based features, as highlighted in the paper’s heatmap analysis, suggests potential redundancy rather than a groundbreaking shift in methodology. This raises a question: could a more selective subset of external features, perhaps combined with minimal uncertainty estimation, achieve a better balance of efficiency and accuracy? Additionally, the pre-computation of features like ‘Knowledgability’ might not scale well with rapidly evolving knowledge bases or LLMs, pointing to a need for dynamic updating mechanisms. Relating this to broader AI trends, the approach aligns with efforts in federated learning to minimize central computation, but it could benefit from exploring cross-domain applications, such as in AI for Science, where domain-specific external features (e.g., citation networks) might enhance retrieval decisions. Future work could also investigate how these external features interact with emergent abilities in larger foundation models, potentially uncovering new efficiency-accuracy trade-offs.