This repository aims to provide a comprehensive overview of Retrieval-Augmented Generation (RAG) by curating closely related resources, including representative papers, workshops, tutorials, evaluation tracks, and open-source projects. Given the rapid evolution of this field, we will continue to update the repository on a regular basis.
- Survey Papers
- Workshops & Tutorials
- Evaluation Campaigns
- Open-source Projects
- Papers
-
⭐ Retrieval-Enhanced Machine Learning: Synthesis and Opportunities (link, 2024)
- Highlight: This is the survey manuscript used in the Retrieval-Enhanced Machine Learning tutorials at SIGIR 2025 and SIGIR-AP 2024. The tutorial introduces core REML concepts and synthesizes the literature from various domains in machine learning (ML), including but reaching beyond NLP. What is unique to this approach is the use of consistent notation, providing researchers with a unified and expandable framework.
-
A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models (paper, RAG-Meets-LLMs Tutorial at KDD'24)
- Highlight: This tutorial comprehensively reviews existing research on retrieval-augmented large language models (RA-LLMs), covering three primary technical perspectives: architectures, training strategies, and applications. As preliminary knowledge, the authors briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, the authors categorize mainstream relevant work by application area, detailing the challenges of each and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, the authors discuss current limitations and several promising directions for future research. The accompanying survey paper and slides are publicly available.
-
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions (paper, ACM Transactions on Information Systems-2025)
- Highlight: The discussion includes representative methodologies for mitigating LLM hallucinations. Additionally, the authors delve into the current limitations faced by retrieval-augmented LLMs in combating hallucinations, offering insights for developing more robust IR systems.
-
A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions (paper, 2024)
- Highlight: The study explores the basic architecture of RAG, focusing on how retrieval and generation are integrated to handle knowledge-intensive tasks. A detailed review of the significant technological advancements in RAG is provided, including key innovations in retrieval-augmented language models and applications across various domains such as question-answering, summarization, and knowledge-based tasks. Recent research breakthroughs are discussed, highlighting novel methods for improving retrieval efficiency. Furthermore, the paper examines ongoing challenges such as scalability, bias, and ethical concerns in deployment.
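The retrieve-then-generate architecture described above can be sketched in a few lines. This is a minimal, self-contained illustration, not any specific system's implementation: the bag-of-words "embedding", the toy corpus, and the prompt template are all stand-ins for a trained encoder, a real index, and an LLM call.

```python
import math
from collections import Counter

# Toy corpus standing in for an indexed knowledge source (hypothetical data).
CORPUS = [
    "RAG combines a retriever with a generator.",
    "Dense retrieval encodes queries and passages as vectors.",
    "Summarization condenses long documents into short text.",
]

def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a trained encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=2):
    """Score every passage against the query and keep the top-k."""
    q = embed(query)
    scored = sorted(CORPUS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return scored[:k]

def rag_answer(query):
    """Retrieve top-k passages, then condition generation on them."""
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return prompt  # a real system would pass this prompt to an LLM

print(rag_answer("What does RAG combine?"))
```

The key design point this sketch reflects is the loose coupling between the two stages: the retriever only needs to return text, so either component can be swapped (sparse vs. dense retrieval, different generators) without changing the other.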
-
Retrieval-Augmented Generation for AI-Generated Content: A Survey (paper, 2024)
- Highlight: The authors first classify RAG foundations according to how the retriever augments the generator, distilling the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that can guide future progress. The authors also summarize additional enhancement methods for RAG, facilitating effective engineering and implementation of RAG systems. From another view, the authors survey practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, the authors introduce benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research.
-
Information Retrieval for Artificial General Intelligence: A New Perspective on Information Retrieval Research (Perspective paper@SIGIR-2025)
- Highlight: The author presents a new perspective on IR research in which the users of an IR system are intelligent agents instead of human users. Extending the current work on retrieval-augmented generation (RAG), the author identifies five novel IR tasks that an intelligent agent must be able to perform in order to achieve Human-Level Artificial Intelligence, or Artificial General Intelligence (AGI), including 1) External Information Retrieval (EIR) to access new information unseen by the agent, 2) Provenance Information Retrieval (PIR) to trace the provenance of information, 3) Curriculum Information Retrieval (CIR) to actively acquire the most useful new data and information for lifelong learning, 4) Rule Information Retrieval (RIR) to perform reasoning and problem solving, and 5) Scenario Information Retrieval (SIR) to leverage past scenarios for problem solving and decision making.
-
Synergizing RAG and Reasoning: A Systematic Review (paper, 2025)
- Highlight: This survey paper presents a systematic review of the collaborative interplay between RAG and reasoning, clearly defining "reasoning" within the RAG context. It constructs a comprehensive taxonomy encompassing multi-dimensional collaborative objectives, representative paradigms, and technical implementations, and analyzes the bidirectional synergy methods.
-
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely (paper, 2024)
- Highlight: This survey paper presents a systematic review of RAG by introducing a novel classification of user queries into four levels, based on the type of external data required and the task's primary focus: explicit fact queries, implicit fact queries, interpretable rationale queries, and hidden rationale queries.
-
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey (paper, 2024)
- Highlight: In this survey, the authors propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy. Within this framework, the authors thoroughly review the existing literature on each dimension. Additionally, the authors construct an evaluation benchmark covering the six dimensions and conduct comprehensive evaluations of a variety of proprietary and open-source models. Finally, the authors identify potential challenges for future research based on their investigation. Through this work, the authors aim to lay a structured foundation for future investigations and provide practical insights for enhancing the trustworthiness of RAG systems in real-world applications.
-
TREC RAG Track (site, 2024, 2025)
- Highlight: The TREC RAG Track is an ongoing, multi-year effort. It aims to bring the research community together around a unified benchmark to evaluate the end-to-end performance of systems that combine retrieval and generation. By structuring participation through distinct but complementary tasks, the track enables deeper analysis of individual system components and their interactions. Several other TREC tracks (the BioGEN 2025, DRAGUN 2025, IKAT 2025, and RAGTIME 2025 Tracks) also support RAG-based approaches; please refer to the corresponding websites for more information.
-
SIGIR 2025 LiveRAG Challenge track (site)
- Highlight: The goal of the SIGIR 2025 LiveRAG Challenge is to allow research teams across academia and industry to advance their RAG research and compare the performance of their solutions against those of other teams, on a fixed corpus (derived from the publicly available FineWeb) and a fixed open-source LLM, Falcon3-10B-Instruct.
-
Meta Comprehensive RAG Benchmark: KDD Cup 2024 (site)
- Highlight: The Meta Comprehensive RAG Challenge (CRAG) aims to provide a solid benchmark with clear metrics and evaluation protocols to enable rigorous assessment of RAG systems, drive innovation, and advance solutions.
-
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research (paper, site)
- Highlight: FlashRAG is a Python toolkit for the reproduction and development of Retrieval-Augmented Generation (RAG) research. The toolkit includes about 36 pre-processed benchmark RAG datasets and around 17 state-of-the-art RAG algorithms.
-
LightRAG: Simple and Fast Retrieval-Augmented Generation (paper, site)
- Highlight: LightRAG focuses on the effective combination of LLM-generated knowledge graphs and graph machine learning; closely related work includes GraphRAG by Microsoft Research.
-
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation (paper, site)
- Highlight: A library designed to benchmark RAG systems with a focus on question-answering (QA). It addresses the challenge of inconsistent benchmarking in comparing approaches and understanding the impact of each component in a RAG pipeline.
-
Retrieval-Enhanced Machine Learning (site)
- Highlight: This is an ongoing, multi-year effort that now includes tutorials at SIGIR 2025 and SIGIR-AP 2024, as well as a workshop held in conjunction with SIGIR 2023. Meanwhile, the survey manuscript titled “Retrieval-Enhanced Machine Learning: Synthesis and Opportunities” (detailed in Survey Papers) and the accompanying slides (e.g., REML@SIGIR-2025 slides) are publicly available.
-
⭐ Tutorial: Retrieval-based Language Models and Applications
- Highlight: This tutorial at ACL-2023 (site and paper) offers a detailed survey. The slides and reference papers are also available on the website. The subsequent position paper by Akari Asai et al. (Reliable, Adaptable, and Attributable Language Models with Retrieval) is also highly insightful and thought-provoking.
-
R3AG: Workshop on Refined and Reliable Retrieval Augmented Generation (site)
-
BREV-RAG (Beyond Relevance-based EValuation of RAG systems)
- Highlight: The BREV-RAG workshop at SIGIR-AP 2025, to be held in December 2025, focuses on the evaluation perspective and is currently calling for papers.
-
- [CLL24] Unified Active Retrieval for Retrieval-Augmented Generation. EMNLP Findings.
- [JBC24] Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity. NAACL.
- [JXG23] Active Retrieval-Augmented Generation. EMNLP.
- [STA24] DRAGIN: Dynamic Retrieval-Augmented Generation Based on the Real-Time Information Needs of Large Language Models. ACL.
- [AWW24] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. ICLR.
- [CQL25] PAIRS: Parametric-Verified Adaptive Information Retrieval and Selection for Efficient RAG. arXiv.
-
- [PYZ23] Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models. ICLR.
- [SKZ24] Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation. SIGIR.
- [KD24] LTRR: Learning To Rank Retrievers for LLMs. SIGIR-2025-LiveRAG.
- [GKP25] Efficient Federated Search for Retrieval-Augmented Generation. EuroMLSys.
- [WWL25] MultiRAG: A Knowledge-guided Framework for Mitigating Hallucination in Multi-source Retrieval Augmented Generation. ICDE.
-
- [NZC19] Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering. NAACL.
- [BMH22] Improving Language Models by Retrieving from Trillions of Tokens (RETRO). ICML.
- [XLI21] Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval. ICLR.
- [LLL22] Query Expansion Using Contextual Clue Sampling with Language Models. arXiv.
- [MHL21] Generation-Augmented Retrieval for Open-Domain Question Answering. ACL.
- [GPT22] Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering. CVPR.
- [PLY20] Unsupervised Question Decomposition for Question Answering. EMNLP.
- [ZRY22] Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts. EMNLP.
- [CZC25] When and How to Augment Your Input: Question Routing Helps Balance the Accuracy and Efficiency of Large Language Models. NAACL Findings.
- [ZLH25] Query Routing for Retrieval-Augmented Language Models. arXiv.
-
- Sparse Retrieval
- [FLP22] From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective. SIGIR.
- [MKT21] Learning Passage Impacts for Inverted Indexes. SIGIR.
- [GDC21] COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List. NAACL.
- [LM21] A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques. arXiv.
- Dense Retrieval
- [KOM20] Dense Passage Retrieval for Open-Domain Question Answering. EMNLP.
- [RKH21] Learning Transferable Visual Models From Natural Language Supervision. ICML.
- [LLX22] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML.
- [LLS23] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML.
- [XSA25] BLIP-3: A Family of Open Large Multimodal Models. arXiv.
- [KZ20] ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. SIGIR.
- [MT24] A Reproducibility Study of PLAID. SIGIR.
- [DHJ24] MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encoding. NeurIPS.
- [XXL21] Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. ICLR.
- [MTO21] On Single and Multiple Representations in Dense Passage Retrieval. IIR.
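The dense retrieval papers above span two scoring styles: single-vector models (e.g., DPR), which compare one query vector against one passage vector, and multi-vector late-interaction models (e.g., ColBERT), which keep per-token vectors and sum each query token's best match. A minimal sketch of the two scoring functions, using hand-picked toy 3-d vectors in place of learned embeddings (not ColBERT's actual implementation):

```python
# Single-vector (DPR-style) scoring vs. multi-vector late interaction
# (ColBERT-style MaxSim), on toy 3-d vectors standing in for learned embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def single_vector_score(q_vec, d_vec):
    """DPR-style: one vector per query and per passage; a single dot product."""
    return dot(q_vec, d_vec)

def late_interaction_score(q_tokens, d_tokens):
    """ColBERT-style MaxSim: each query token is matched to its best document
    token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)

# Toy embeddings (hypothetical numbers, for illustration only).
query_vec = [1.0, 0.0, 1.0]
doc_vec = [0.5, 0.5, 0.5]
query_tok = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
doc_tok = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]

print(single_vector_score(query_vec, doc_vec))   # 1.0
print(late_interaction_score(query_tok, doc_tok))
```

The trade-off the sketch makes visible: single-vector scoring supports cheap nearest-neighbor indexes over one vector per passage, while late interaction preserves token-level matching signals at the cost of storing many vectors per passage.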
- Re-ranking, especially with LLMs
- [NC20] Passage Re-ranking with BERT. arXiv.
- [SZ25] Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization. ICTIR.
- [SZ24] Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models. SIGIR.
- [SYM23] Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents. EMNLP.
- [SYX25] BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval. ICLR.
- [YYR25] Rank-K: Test-Time Reasoning for Listwise Reranking. arXiv.
- [ZMK25] Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning. arXiv.
- [SPS25] RankLLM: A Python Package for Reranking with LLMs. SIGIR.
- [ZZK24] A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models. SIGIR.
- Generative Retrieval
-
-
Getting familiar with the evolution of model architectures in information retrieval (IR) and representative retrieval methods.
- [ZMH25] A Survey of Model Architectures in Information Retrieval. arXiv.
-
For an up-to-date overview of sparse retrieval
-
For a detailed overview of dense retrieval
-
⭐ For a detailed overview of Retrieval and Ranking with LLMs
- Highlight: Generative LLMs like GPT, Gemini, and Llama are transforming information retrieval, enabling new and more effective approaches to document retrieval and ranking. The switch from previous-generation pre-trained language model backbones (e.g., BERT, T5) to new generative LLM backbones has required the field to adapt its training processes; it has also provided unprecedented capabilities and opportunities, stimulating research into zero-shot approaches, reasoning approaches, reinforcement-learning-based training, and multilingual and multimodal applications. This tutorial at SIGIR-2025 (site and slides) provides a systematic overview of LLM-based retrievers and rankers, covering fundamental architectures, training paradigms, real-world deployment considerations, and open challenges and research directions. At the end of the tutorial, a number of helpful tools for research on LLM-based retrievers are listed.
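One common zero-shot pattern covered by work in this area is listwise reranking: the LLM is shown numbered candidate passages and asked to emit an ordering such as `[2] > [1] > [3]`, which is then parsed back into a permutation. The sketch below illustrates that prompt-and-parse loop only; the prompt wording is illustrative, and the canned model response stands in for a real LLM call (no specific system's prompt or API is reproduced here).

```python
import re

def build_listwise_prompt(query, passages):
    """Build a listwise reranking prompt: numbered passages plus an ordering
    request. The wording is a hypothetical example, not any tutorial's prompt."""
    lines = [f"[{i + 1}] {p}" for i, p in enumerate(passages)]
    return (
        "Rank the following passages by relevance to the query.\n"
        f"Query: {query}\n" + "\n".join(lines) +
        "\nAnswer with identifiers only, e.g. [2] > [1] > [3]."
    )

def parse_ranking(llm_output, num_passages):
    """Parse identifiers like '[2] > [1]' into 0-based passage indices,
    appending any passages the model omitted in their original order."""
    seen = []
    for m in re.findall(r"\[(\d+)\]", llm_output):
        idx = int(m) - 1
        if 0 <= idx < num_passages and idx not in seen:
            seen.append(idx)
    seen += [i for i in range(num_passages) if i not in seen]
    return seen

# A canned response stands in for a real LLM call (hypothetical output).
passages = [
    "Paris is in France.",
    "The Eiffel Tower is in Paris.",
    "Berlin is in Germany.",
]
order = parse_ranking("[2] > [1] > [3]", len(passages))
reranked = [passages[i] for i in order]
print(reranked[0])  # "The Eiffel Tower is in Paris."
```

The defensive parsing step (deduplicating and back-filling omitted identifiers) matters in practice because generative rankers can emit incomplete or repeated permutations.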
-
-
- [SMY24] REPLUG: Retrieval-Augmented Black-Box Language Models. NAACL.
-
- [YGZ24] Evaluation of Retrieval-Augmented Generation: A Survey. arXiv.
- [D24] A Workbench for Autograding Retrieve/Generate Systems. SIGIR (best paper award).
- [SZ24] Evaluating Retrieval Quality in Retrieval-Augmented Generation. SIGIR (best short paper award).
- [FKP24] ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems. NAACL.
-
- [LJX25] A Survey of Personalization: From RAG to Agent. arXiv.
-
- [AAH25] Evaluation of Attribution Bias in Retrieval-Augmented Large Language Models. ACL.
- [LHS24] TruthReader: Towards Trustworthy Document Assistant Chatbot with Reliable Attribution. EMNLP.
- [BFW24] CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity. EMNLP.
- [MTM22] Teaching Language Models to Support Answers with Verified Quotes. arXiv.
- [XWL25] Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation. NAACL.
- [XQC25] CiteEval: Principle-Driven Citation Evaluation for Source Attribution. ACL.