mcmenamin/Byo judge #386

Merged
rmcmen-vm merged 4 commits into main from byo_judge on Jun 26, 2025
Conversation

@rmcmen-vm
Contributor

Pull Request Description

What and why?

Currently the RAGAS and Prompt tests only support using OpenAI. This change adds two ways to point those tests at a different LLM/embedding model.

This includes:

  • Giving the user ability to set LLM/Embedding as a parameter when running test
  • Giving user ability to set a default using a new function called set_judge_config()
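The two configuration paths above can be sketched as follows. Only the `set_judge_config` name and the `judge_llm` parameter come from this PR; the stub class, the `run_test` signature, and the fallback logic are illustrative stand-ins for the real LangChain models and ValidMind test runner.

```python
# Sketch of the two ways to set a judge LLM: per-test parameter vs. a
# registered default. StubChatModel stands in for a LangChain chat model.

_default_judge_llm = None  # module-level default, set by set_judge_config


class StubChatModel:
    """Stand-in for a LangChain-compatible chat model."""

    def __init__(self, name):
        self.name = name


def set_judge_config(judge_llm):
    """Way 2: register a default judge LLM used by subsequent tests."""
    global _default_judge_llm
    _default_judge_llm = judge_llm


def run_test(judge_llm=None):
    """Way 1: a per-test judge_llm parameter overrides the default."""
    model = judge_llm if judge_llm is not None else _default_judge_llm
    return model.name


set_judge_config(StubChatModel("default-judge"))
print(run_test())                                     # default-judge
print(run_test(judge_llm=StubChatModel("override")))  # override
```

The per-test parameter always wins, so a notebook can register one default once and still swap in a different judge for a single test.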

How to test

Notebook used to test can be found in the Solutions Architect repo: https://github.com/validmind/solutions_architects_repo/blob/main/notebooks/byo_llm_for_judge.ipynb

The notebook includes an example of one RAGAS test and one Prompt test to confirm the feature works, along with sample syntax for the three ways a judge LLM can be set.

What needs special review?

Nothing of note

Dependencies, breaking changes, and deployment notes

None. Notebooks using the existing syntax to run tests will continue to work.

Release notes

Adds the ability to bring your own LLM. All existing tests that use an LLM can now accept a user-defined LLM/embedding model (assuming it is compatible with the LangChain Chat/Embeddings framework). This is currently restricted to the RAGAS and Prompt validation tests.

Checklist

  • [x] What and why
  • [x] Screenshots or videos (Frontend)
  • [x] How to test
  • [x] What needs special review
  • [x] Dependencies, breaking changes, and deployment notes
  • [x] Labels applied
  • [ ] PR linked to Shortcut
  • [ ] Unit tests added (Backend)
  • [x] Tested locally
  • [ ] Documentation updated (if required)
  • [ ] Environment variable additions/changes documented (if required)

@rmcmen-vm rmcmen-vm added the enhancement New feature or request label Jun 25, 2025
Contributor

@johnwalz97 johnwalz97 left a comment


Left a couple comments, but looks good to me so far.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Robert McMenamin does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions
Contributor

PR Summary

This PR introduces significant changes to support customizable judge LLM and embeddings across ValidMind's AI and prompt validation tests. The enhancements include:

  1. Adding new global variables (__judge_llm and __judge_embeddings) and a constant for the default embeddings model.

  2. Implementing two new utility functions in utils.py:

    • get_judge_config: Handles loading of the LLM and embeddings, supports both direct instances and Functional models, and issues descriptive errors if the provided models are not compatible with LangChain.
    • set_judge_config: Allows users to override the judge configuration with their own instances and converts Functional models if needed.
  3. Propagating the new judge_llm and judge_embeddings parameters to various testing functions in model_validation and prompt_validation modules. This ensures that all evaluation metrics and prompt tests can use either the default judge configuration or a custom one provided at runtime.

  4. Streamlining the configuration of the ragas tests by updating the get_ragas_config function to accept these new parameters and retrieving the judge configuration accordingly.

  5. Minor corrections in comments and documentation for clarity.

Overall, these changes allow flexibility in specifying the judge LLM during testing and provide detailed validation for compatibility, ensuring that downstream testing functions receive the correct configurations.
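The compatibility validation described above can be sketched as follows. The real `get_judge_config` lives in `utils.py`; the stand-in base classes and the exact error wording here are assumptions, not the shipped implementation.

```python
# Sketch of get_judge_config's validation step: accept LangChain-compatible
# instances, raise a descriptive ValueError otherwise. The two classes below
# stand in for LangChain's chat-model and embeddings base classes.

class BaseChatModel:
    """Stand-in for LangChain's chat-model base class."""


class Embeddings:
    """Stand-in for LangChain's embeddings base class."""


def get_judge_config(judge_llm=None, judge_embeddings=None):
    """Return (llm, embeddings), raising a descriptive error on bad input."""
    if judge_llm is not None and not isinstance(judge_llm, BaseChatModel):
        raise ValueError(
            "judge_llm is not compatible with the LangChain chat framework"
        )
    if judge_embeddings is not None and not isinstance(judge_embeddings, Embeddings):
        raise ValueError(
            "judge_embeddings is not compatible with the LangChain embeddings framework"
        )
    return judge_llm, judge_embeddings
```

When both arguments are `None`, the real function falls back to the configured default (or the client-provided model), which is what keeps existing notebooks working unchanged.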

Test Suggestions

  • Write unit tests for get_judge_config to verify correct behavior when judge_llm and judge_embeddings are provided as compatible LangChain instances.
  • Add tests that pass invalid or incompatible objects for judge_llm/judge_embeddings to ensure that ValueError is raised with the expected error message.
  • Test the default behavior when no judge_llm or judge_embeddings is provided and check that the configuration is correctly fetched from the client using get_client_and_model.
  • Create integration tests that simulate end-to-end execution of model validation tests (e.g., AnswerCorrectness, AspectCritic) using both default and custom judge configurations.
  • Validate that the API key and base URL are correctly set in the environment when using the default configuration.
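The first two suggestions could be sketched as plain unit tests. The local stub below mirrors only the validation step under test; the real `get_judge_config` import path and error text may differ.

```python
# Hypothetical unit tests for get_judge_config's validation behavior,
# written against a local stub so the sketch is self-contained.

class BaseChatModel:
    """Stand-in for LangChain's chat-model base class."""


def get_judge_config(judge_llm=None, judge_embeddings=None):
    # Stub mirroring only the compatibility check.
    if judge_llm is not None and not isinstance(judge_llm, BaseChatModel):
        raise ValueError("judge_llm is not compatible with LangChain")
    return judge_llm, judge_embeddings


def test_incompatible_judge_llm_raises():
    try:
        get_judge_config(judge_llm=object())
    except ValueError as exc:
        assert "not compatible" in str(exc)
    else:
        raise AssertionError("expected ValueError")


def test_compatible_judge_llm_passes_through():
    llm = BaseChatModel()
    assert get_judge_config(judge_llm=llm) == (llm, None)


test_incompatible_judge_llm_raises()
test_compatible_judge_llm_passes_through()
```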

Contributor

@johnwalz97 johnwalz97 left a comment


lgtm

@rmcmen-vm rmcmen-vm merged commit 3b2cb38 into main Jun 26, 2025
5 of 7 checks passed
@rmcmen-vm rmcmen-vm deleted the byo_judge branch June 26, 2025 18:02