
[SC-9395] Add RAG benchmarking demo notebook #346

Merged
juanmleng merged 8 commits into main from juan/sc-9395/extend-llm-testing-in-rag-use-case
Apr 9, 2025

Conversation

@juanmleng
Contributor

Internal Notes for Reviewers

Added titles to the RAGAS metric figures: when running comparison tests, the comparison logic rewrites each figure's title, so figures without one displayed "None".
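A minimal sketch of the kind of change involved, assuming the tests build Plotly figures (the score values and column name here are illustrative, not the actual test code):

```python
import pandas as pd
import plotly.express as px

# Illustrative scores; the real tests compute these via RAGAS.
scores = pd.DataFrame({"faithfulness": [0.91, 0.78, 0.85, 0.62]})

fig = px.histogram(scores, x="faithfulness")

# Without an explicit title, comparison logic that rewrites the figure
# title ends up rendering the string "None"; setting one avoids that.
fig.update_layout(title="Faithfulness")
```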

External Release Notes

Added a new notebook, rag_benchmarking_demo.ipynb, to demonstrate benchmarking of the RAG RFP use case by comparing multiple configurations at each stage of the pipeline. It evaluates two embedding models (OpenAI's text-embedding-3-small and text-embedding-3-large), two retrieval configurations with different k parameters (5 and 10), and two LLM generators (gpt-3.5-turbo and gpt-4o), creating a total of four complete RAG pipelines. The notebook runs the same tests included in rag_documentation_demo.ipynb, such as RAGAS metrics (Context Precision, Faithfulness, Answer Correctness), generation quality metrics (ROUGE, BLEU, BERT Score), and bias/toxicity evaluations, on all configurations.
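The exact pairing of stage variants lives in the notebook; as a hypothetical sketch, the four pipelines could be enumerated along these lines (the RAGConfig structure and the specific pairings are illustrative, not the notebook's code):

```python
from dataclasses import dataclass

@dataclass
class RAGConfig:
    embedding_model: str  # OpenAI embedding model name
    retrieval_k: int      # number of chunks retrieved per query
    generator: str        # chat model used for answer generation

# Illustrative: one possible way to pair the stage variants into the
# four complete pipelines the release notes describe.
configs = [
    RAGConfig("text-embedding-3-small", 5, "gpt-3.5-turbo"),
    RAGConfig("text-embedding-3-small", 10, "gpt-3.5-turbo"),
    RAGConfig("text-embedding-3-large", 5, "gpt-4o"),
    RAGConfig("text-embedding-3-large", 10, "gpt-4o"),
]

for cfg in configs:
    print(f"pipeline: embed={cfg.embedding_model}, k={cfg.retrieval_k}, llm={cfg.generator}")
```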

@juanmleng added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on Apr 3, 2025
@juanmleng self-assigned this on Apr 3, 2025
Contributor

@johnwalz97 left a comment


lgtm

@github-actions
Contributor

github-actions bot commented Apr 9, 2025

PR Summary

This pull request introduces a comprehensive notebook for benchmarking Retrieval-Augmented Generation (RAG) models using the ValidMind library. The notebook demonstrates setting up a RAG pipeline with multiple embedding, retrieval, and generation models, including OpenAI's GPT-3.5 and GPT-4o. It showcases how to initialize models, load datasets, perform data validation, and run various tests to assess model performance. Additionally, the PR enhances visualization in several RAGAS tests by adding titles to histograms and box plots, improving the clarity of the visual output.
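For orientation only: the generation quality metrics named above can be reproduced standalone with the Hugging Face evaluate library, although the notebook itself runs them through ValidMind's built-in tests. The example strings are made up:

```python
import evaluate  # pip install evaluate rouge_score

predictions = ["The contract term is three years."]
references = ["The agreement runs for a three-year term."]

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

# Each compute() call returns a dict of scores for the batch.
print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=references))
```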

Test Suggestions

  • Run the notebook end-to-end to ensure all cells execute without errors (see the sketch after this list).
  • Verify that the RAG pipeline correctly initializes and processes the dataset.
  • Check the output of the data validation tests for expected results.
  • Ensure that the embedding, retrieval, and generation models produce expected outputs.
  • Validate that the visualizations in the RAGAS tests display correctly with the new titles.
  • Test the notebook with different datasets to ensure robustness.
  • Check the compatibility of the notebook with different versions of the ValidMind library.
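One way to automate the end-to-end check above is papermill, which executes every cell in order and raises on the first failing cell; the notebook path here is assumed:

```python
import papermill as pm  # pip install papermill

# Runs all cells top to bottom; an exception in any cell fails the run,
# making this suitable as a CI smoke test.
pm.execute_notebook(
    "rag_benchmarking_demo.ipynb",              # path assumed
    "/tmp/rag_benchmarking_demo_output.ipynb",  # executed copy
)
```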

@juanmleng merged commit 70edeb8 into main on Apr 9, 2025
6 checks passed
@johnwalz97 deleted the juan/sc-9395/extend-llm-testing-in-rag-use-case branch on August 20, 2025

Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)


3 participants