mcmenamin/Byo judge #386

Merged
rmcmen-vm merged 4 commits into main from byo_judge on Jun 26, 2025
Conversation

@rmcmen-vm
Contributor

Pull Request Description

What and why?

Currently the RAGAS and Prompt tests only support using OpenAI. This change adds two ways to point those tests at a different LLM/embedding model.

This includes:

  • Giving the user ability to set LLM/Embedding as a parameter when running test
  • Giving user ability to set a default using a new function called set_judge_config()
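The two configuration paths above can be sketched as follows. Only the `set_judge_config` name and the `judge_llm` parameter come from this PR; the stub class, the `run_test` signature, and the fallback logic are illustrative stand-ins for the real LangChain models and ValidMind test runner.

```python
# Sketch of the two ways to set a judge LLM: per-test parameter vs. a
# registered default. StubChatModel stands in for a LangChain chat model.

_default_judge_llm = None  # module-level default, set by set_judge_config


class StubChatModel:
    """Stand-in for a LangChain-compatible chat model."""

    def __init__(self, name):
        self.name = name


def set_judge_config(judge_llm):
    """Way 2: register a default judge LLM used by subsequent tests."""
    global _default_judge_llm
    _default_judge_llm = judge_llm


def run_test(judge_llm=None):
    """Way 1: a per-test judge_llm parameter overrides the default."""
    model = judge_llm if judge_llm is not None else _default_judge_llm
    return model.name


set_judge_config(StubChatModel("default-judge"))
print(run_test())                                     # default-judge
print(run_test(judge_llm=StubChatModel("override")))  # override
```

The per-test parameter always wins, so a notebook can register one default once and still swap in a different judge for a single test.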

How to test

Notebook used to test can be found in the Solutions Architect repo: https://github.com/validmind/solutions_architects_repo/blob/main/notebooks/byo_llm_for_judge.ipynb

The notebook includes an example of one RAGAS test and one Prompt test to confirm the feature works, along with sample syntax for the three ways a judge LLM can be set.

What needs special review?

Nothing of note

Dependencies, breaking changes, and deployment notes

None. Notebooks using the existing syntax to run tests will continue to work.

Release notes

Adds the ability to bring your own LLM. All existing tests that use an LLM can now accept a user-defined LLM/embedding model (assuming it is compatible with the LangChain Chat/Embeddings framework). This is currently restricted to the RAGAS and Prompt validation tests.

Checklist

  • [x] What and why
  • [x] Screenshots or videos (Frontend)
  • [x] How to test
  • [x] What needs special review
  • [x] Dependencies, breaking changes, and deployment notes
  • [x] Labels applied
  • [ ] PR linked to Shortcut
  • [ ] Unit tests added (Backend)
  • [x] Tested locally
  • [ ] Documentation updated (if required)
  • [ ] Environment variable additions/changes documented (if required)

@rmcmen-vm rmcmen-vm added the enhancement New feature or request label Jun 25, 2025
Contributor

@johnwalz97 johnwalz97 left a comment


Left a couple comments, but looks good to me so far.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Robert McMenamin does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions
Contributor

PR Summary

This PR introduces significant changes to support customizable judge LLM and embeddings across ValidMind's AI and prompt validation tests. The enhancements include:

  1. Adding new global variables (__judge_llm and __judge_embeddings) and a constant for the default embeddings model.

  2. Implementing two new utility functions in utils.py:

    • get_judge_config: Handles loading of the LLM and embeddings, supports both direct instances and Functional models, and issues descriptive errors if the provided models are not compatible with LangChain.
    • set_judge_config: Allows users to override the judge configuration with their own instances and converts Functional models if needed.
  3. Propagating the new judge_llm and judge_embeddings parameters to various testing functions in model_validation and prompt_validation modules. This ensures that all evaluation metrics and prompt tests can use either the default judge configuration or a custom one provided at runtime.

  4. Streamlining the configuration of the ragas tests by updating the get_ragas_config function to accept these new parameters and retrieving the judge configuration accordingly.

  5. Minor corrections in comments and documentation for clarity.

Overall, these changes allow flexibility in specifying the judge LLM during testing and provide detailed validation for compatibility, ensuring that downstream testing functions receive the correct configurations.
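The compatibility validation described above can be sketched as follows. The real `get_judge_config` lives in `utils.py`; the stand-in base classes and the exact error wording here are assumptions, not the shipped implementation.

```python
# Sketch of get_judge_config's validation step: accept LangChain-compatible
# instances, raise a descriptive ValueError otherwise. The two classes below
# stand in for LangChain's chat-model and embeddings base classes.

class BaseChatModel:
    """Stand-in for LangChain's chat-model base class."""


class Embeddings:
    """Stand-in for LangChain's embeddings base class."""


def get_judge_config(judge_llm=None, judge_embeddings=None):
    """Return (llm, embeddings), raising a descriptive error on bad input."""
    if judge_llm is not None and not isinstance(judge_llm, BaseChatModel):
        raise ValueError(
            "judge_llm is not compatible with the LangChain chat framework"
        )
    if judge_embeddings is not None and not isinstance(judge_embeddings, Embeddings):
        raise ValueError(
            "judge_embeddings is not compatible with the LangChain embeddings framework"
        )
    return judge_llm, judge_embeddings
```

When both arguments are `None`, the real function falls back to the configured default (or the client-provided model), which is what keeps existing notebooks working unchanged.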

Test Suggestions

  • Write unit tests for get_judge_config to verify correct behavior when judge_llm and judge_embeddings are provided as compatible LangChain instances.
  • Add tests that pass invalid or incompatible objects for judge_llm/judge_embeddings to ensure that ValueError is raised with the expected error message.
  • Test the default behavior when no judge_llm or judge_embeddings is provided and check that the configuration is correctly fetched from the client using get_client_and_model.
  • Create integration tests that simulate end-to-end execution of model validation tests (e.g., AnswerCorrectness, AspectCritic) using both default and custom judge configurations.
  • Validate that the API key and base URL are correctly set in the environment when using the default configuration.
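The first two suggestions could be sketched as plain unit tests. The local stub below mirrors only the validation step under test; the real `get_judge_config` import path and error text may differ.

```python
# Hypothetical unit tests for get_judge_config's validation behavior,
# written against a local stub so the sketch is self-contained.

class BaseChatModel:
    """Stand-in for LangChain's chat-model base class."""


def get_judge_config(judge_llm=None, judge_embeddings=None):
    # Stub mirroring only the compatibility check.
    if judge_llm is not None and not isinstance(judge_llm, BaseChatModel):
        raise ValueError("judge_llm is not compatible with LangChain")
    return judge_llm, judge_embeddings


def test_incompatible_judge_llm_raises():
    try:
        get_judge_config(judge_llm=object())
    except ValueError as exc:
        assert "not compatible" in str(exc)
    else:
        raise AssertionError("expected ValueError")


def test_compatible_judge_llm_passes_through():
    llm = BaseChatModel()
    assert get_judge_config(judge_llm=llm) == (llm, None)


test_incompatible_judge_llm_raises()
test_compatible_judge_llm_passes_through()
```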

Contributor

@johnwalz97 johnwalz97 left a comment


lgtm

@rmcmen-vm rmcmen-vm merged commit 3b2cb38 into main Jun 26, 2025
5 of 7 checks passed
@rmcmen-vm rmcmen-vm deleted the byo_judge branch June 26, 2025 18:02