ii-research/RAG_Overview

RAG_Overview

This repository aims to provide a comprehensive overview of Retrieval-Augmented Generation (RAG) by curating closely related resources, including representative papers, workshops, tutorials, evaluation tracks, and open-source projects. Given the rapid evolution of this field, we will continue to update the repository on a regular basis.

  • ⭐ Retrieval-Enhanced Machine Learning: Synthesis and Opportunities (link, 2024)

    • Highlight: This is the survey manuscript used in the tutorials (Retrieval-Enhanced Machine Learning) at SIGIR 2025 and SIGIR-AP 2024. The tutorial introduces core REML concepts and synthesizes the literature from various domains in machine learning (ML), including but not limited to NLP. What is unique to the approach is its use of consistent notation, which provides researchers with a unified and expandable framework.
  • A Survey on RAG Meeting LLMs Towards Retrieval-Augmented Large Language Models (paper, RAG-Meets-LLMs Tutorial at KDD'24)

    • Highlight: This tutorial comprehensively reviews existing research on retrieval-augmented large language models (RA-LLMs), covering three primary technical perspectives: architectures, training strategies, and applications. As background, the authors briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, the authors categorize mainstream relevant work by application area, detailing the challenges of each and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, the authors discuss current limitations and several promising directions for future research. The accompanying survey paper and slides are publicly available.
  • A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions (paper, ACM Transactions on Information Systems, 2025)

    • Highlight: The discussion includes representative methodologies for mitigating LLM hallucinations. Additionally, the authors delve into the current limitations faced by retrieval-augmented LLMs in combating hallucinations, offering insights for developing more robust IR systems.
  • A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions (paper, 2024)

    • Highlight: The study explores the basic architecture of RAG, focusing on how retrieval and generation are integrated to handle knowledge-intensive tasks. A detailed review of the significant technological advancements in RAG is provided, including key innovations in retrieval-augmented language models and applications across various domains such as question-answering, summarization, and knowledge-based tasks. Recent research breakthroughs are discussed, highlighting novel methods for improving retrieval efficiency. Furthermore, the paper examines ongoing challenges such as scalability, bias, and ethical concerns in deployment.
  • Retrieval-Augmented Generation for AI-Generated Content: A Survey (paper, 2024)

    • Highlight: The authors first classify RAG foundations according to how the retriever augments the generator. They distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. The authors also summarize additional enhancement methods for RAG, facilitating effective engineering and implementation of RAG systems. From another perspective, the authors survey practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, the authors introduce benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research.
  • Information Retrieval for Artificial General Intelligence: A New Perspective on Information Retrieval Research (Perspective paper@SIGIR-2025)

    • Highlight: The author presents a new perspective on IR research in which the users of an IR system are intelligent agents instead of human users. Extending the current work on retrieval-augmented generation (RAG), the author identifies five novel IR tasks that an intelligent agent must be able to perform in order to achieve Human-Level Artificial Intelligence, or Artificial General Intelligence (AGI), including 1) External Information Retrieval (EIR) to access new information unseen by the agent, 2) Provenance Information Retrieval (PIR) to trace the provenance of information, 3) Curriculum Information Retrieval (CIR) to actively acquire the most useful new data and information for lifelong learning, 4) Rule Information Retrieval (RIR) to perform reasoning and problem solving, and 5) Scenario Information Retrieval (SIR) to leverage past scenarios for problem solving and decision making.
  • Synergizing RAG and Reasoning: A Systematic Review (paper, 2025)

    • Highlight: This survey paper presents a systematic review of the collaborative interplay between RAG and reasoning, clearly defining "reasoning" within the RAG context. It constructs a comprehensive taxonomy encompassing multi-dimensional collaborative objectives, representative paradigms, and technical implementations, and analyzes the bidirectional synergy methods.
  • Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely (paper, 2024)

    • Highlight: This survey paper presents a systematic review of RAG by classifying user queries into four levels based on the type of external data required and the task's primary focus: explicit fact queries, implicit fact queries, interpretable rationale queries, and hidden rationale queries.
  • Trustworthiness in Retrieval-Augmented Generation Systems: A Survey (paper, 2024)

    • Highlight: In this survey, the authors propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy. Within this framework, the authors thoroughly review the existing literature on each dimension. Additionally, the authors create an evaluation benchmark covering the six dimensions and conduct comprehensive evaluations of a variety of proprietary and open-source models. Finally, the authors identify potential challenges for future research based on their investigation results. Through this work, the authors aim to lay a structured foundation for future investigations and provide practical insights for enhancing the trustworthiness of RAG systems in real-world applications.
  • TREC RAG Track (site, 2024, 2025)

    • Highlight: The TREC RAG Track is an ongoing, multi-year effort. It aims to bring the research community together around a unified benchmark to evaluate the end-to-end performance of systems that combine retrieval and generation. By structuring participation through distinct but complementary tasks, the track enables deeper analysis of individual system components and their interactions. Several other TREC tracks (BioGEN 2025 Track, DRAGUN 2025 Track, IKAT 2025 Track, RAGTIME 2025 Track) also support RAG-based approaches; please refer to the corresponding websites for more information.
  • SIGIR 2025 LiveRAG Challenge track (site)

    • Highlight: The goal of the SIGIR'2025 LiveRAG Challenge is to allow research teams across academia and industry to advance their RAG research and compare the performance of their solutions with other teams, on a fixed corpus (derived from the publicly available FineWeb) and a fixed open-source LLM, Falcon3-10B-Instruct.
  • Meta Comprehensive RAG Benchmark: KDD Cup 2024 (site)

    • Highlight: The Meta Comprehensive RAG Challenge (CRAG) aims to provide a good benchmark with clear metrics and evaluation protocols, to enable rigorous assessment of the RAG systems, drive innovations, and advance the solutions.
  • FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research (paper, site)

    • Highlight: FlashRAG is a Python toolkit for the reproduction and development of Retrieval Augmented Generation (RAG) research. Our toolkit includes about 36 pre-processed benchmark RAG datasets and around 17 state-of-the-art RAG algorithms.
  • LightRAG: Simple and Fast Retrieval-Augmented Generation (paper, site)

    • Highlight: LightRAG focuses on the effective combination of LLM-generated knowledge graphs and graph machine learning. Other closely related work includes GraphRAG by Microsoft Research.
  • BERGEN: A Benchmarking Library for Retrieval-Augmented Generation (paper, site)

    • Highlight: A library designed to benchmark RAG systems with a focus on question-answering (QA). It addresses the challenge of inconsistent benchmarking in comparing approaches and understanding the impact of each component in a RAG pipeline.
  • Retrieval-Enhanced Machine Learning (site)

    • Highlight: This is an ongoing, multi-year effort that now includes tutorials at SIGIR 2025 and SIGIR-AP 2024, as well as a workshop held in conjunction with SIGIR 2023. The survey manuscript titled "Retrieval-Enhanced Machine Learning: Synthesis and Opportunities" (detailed in Survey Papers) and the accompanying slides (e.g., REML@SIGIR-2025 slides) are publicly available.
  • Tutorial: Retrieval-based Language Models and Applications

  • R3AG: Workshop on Refined and Reliable Retrieval Augmented Generation (site)

    • Highlight: This is an ongoing, multi-year effort that now includes the workshops R3AG@SIGIR-AP 2024 (site and paper) and R3AG@SIGIR-AP 2025 (site, currently calling for papers).
  • BREV-RAG (Beyond Relevance-based EValuation of RAG systems)

    • Highlight: The BREV-RAG workshop at SIGIR-AP 2025 (currently calling for papers), which will be held in December 2025, focuses on the evaluation of RAG systems.
