[SC-9395] Add RAG benchmarking demo notebook #346
Merged
Conversation
rmcmen-vm (Contributor) reviewed on Apr 3, 2025
PR Summary

This pull request introduces a comprehensive notebook for benchmarking Retrieval-Augmented Generation (RAG) models using the ValidMind library. The notebook demonstrates setting up a RAG pipeline with multiple embedding, retrieval, and generation models, including OpenAI's GPT-3.5 and GPT-4o. It showcases how to initialize models, load datasets, perform data validation, and run various tests to assess model performance. Additionally, the PR enhances visualization in several RAGAS tests by adding titles to histograms and box plots, improving the clarity of the visual output.
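A minimal sketch of the kind of setup the summary describes, assuming ValidMind's function-model pattern (vm.init_model with a predict_fn); the file name, column names, and input_id values here are illustrative, not taken from the notebook:

```python
# Illustrative sketch only, not the notebook's actual code. It assumes
# ValidMind's function-model pattern (vm.init_model with a predict_fn);
# the dataset file, column names, and input_id values are hypothetical.
import pandas as pd
import validmind as vm
from openai import OpenAI

client = OpenAI()

# Connect to the ValidMind Platform (credentials are typically supplied
# via environment variables)
vm.init()

def embed_small(input):
    """Embed the incoming question with text-embedding-3-small."""
    response = client.embeddings.create(
        model="text-embedding-3-small", input=input["question"]
    )
    return response.data[0].embedding

# Each pipeline stage is registered as a ValidMind model so tests can
# reference it by input_id
embedder = vm.init_model(predict_fn=embed_small, input_id="embedding_small")

# The benchmarking dataset of questions and ground-truth answers
df = pd.read_csv("rag_rfp_dataset.csv")  # hypothetical file name
vm_dataset = vm.init_dataset(dataset=df, input_id="rag_test_dataset")
```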
Test Suggestions
Internal Notes for Reviewers
Added titles to the RAGAS metric figures: when these tests are run as comparison tests, figures without titles displayed "None" once the comparison logic updated the title.
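A simplified illustration of the failure mode and the fix; the prefixing shown here is a stand-in for ValidMind's actual comparison logic:

```python
# Simplified illustration of why untitled figures rendered "None" in
# comparison tests; this is not ValidMind's actual comparison code.
import plotly.graph_objects as go

fig = go.Figure(data=go.Histogram(x=[0.2, 0.4, 0.4, 0.7, 0.9]))

# Comparison logic rewrites the existing title to include the compared
# inputs. With no title set, fig.layout.title.text is None, so the
# result contains the literal string "None".
prefix = "dataset_a vs dataset_b"
fig.update_layout(title_text=f"{prefix}: {fig.layout.title.text}")
print(fig.layout.title.text)  # "dataset_a vs dataset_b: None"

# The fix: give every histogram and box plot an explicit title when it
# is created, so the comparison logic has real text to work with.
fig = go.Figure(data=go.Histogram(x=[0.2, 0.4, 0.4, 0.7, 0.9]))
fig.update_layout(title_text="Faithfulness Scores")
```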
External Release Notes
Added a new notebook, rag_benchmarking_demo.ipynb, to demonstrate benchmarking of the RAG RFP use case by comparing multiple configurations at each stage of the pipeline. It evaluates two embedding models (OpenAI's text-embedding-3-small and text-embedding-3-large), two retrieval models with different k parameters (5 and 10), and two LLM generators (gpt-3.5-turbo and gpt-4o), creating a total of four complete RAG pipelines. The notebook runs the same tests included in rag_documentation_demo.ipynb, such as RAGAS metrics (Context Precision, Faithfulness, Answer Correctness), generation quality metrics (ROUGE, BLEU, BERT Score), and bias/toxicity evaluations, on all configurations.
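A hedged sketch of how one such test might be run across all four configurations, assuming ValidMind's input_grid comparison mechanism; the input_id values are hypothetical:

```python
# Hedged sketch: run one RAGAS test across the outputs of several RAG
# pipeline configurations using ValidMind's input_grid comparison
# mechanism. The input_id values below are hypothetical; the test ID
# follows ValidMind's ragas test naming.
import validmind as vm

result = vm.tests.run_test(
    "validmind.model_validation.ragas.Faithfulness",
    input_grid={
        # one evaluated dataset per RAG pipeline configuration
        "dataset": [
            "rag_outputs_small_k5_gpt35",
            "rag_outputs_small_k10_gpt4o",
            "rag_outputs_large_k5_gpt35",
            "rag_outputs_large_k10_gpt4o",
        ],
    },
)
result.log()  # push the comparison figures and tables to the platform
```

Running a test with input_grid produces a single comparison result covering every configuration, which is where the figure titles added in this PR appear.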