
New validation notebook: Validate an application scorecard model #338

Merged
validbeck merged 91 commits into main from beck/sc-9082/edit-validation-credit-risk-notebook
Mar 28, 2025

Conversation


validbeck (Collaborator) commented Mar 14, 2025

Internal Notes for Reviewers

sc-9082

Validate an application scorecard model

I edited Michael's draft notebook into a comprehensive model validation experience with the ValidMind Library: Validate an application scorecard model

Major stuff I modified in addition to the storytelling:

  • Conceptual content is now validator-focused only; removed a lot of the basics that the introductory WIP series will cover instead
  • Focused on registering the model as a validator & ensuring you have the correct permissions so things are returned & logged correctly
  • Cleaned up some of the code around identifying tests & added some more context
  • Added running individual tests as examples under data quality and performance to introduce users to the concept of running tests before batching them
  • Added some more information on the custom test implementation
  • Cleaned up the test_config code; we were calling functions we weren't using, xgb_model should have been xgb_model_developer_champion, etc. (see the input-mapping sketch after this list)
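To illustrate the test_config fix, here's a minimal sketch of the input mapping I mean. The input IDs, test ID, and dataset/model variables below are illustrative, and the validmind calls are based on the library's public API rather than copied from the notebook:

```python
import validmind as vm

# Register the validation dataset and the developer's champion model under
# explicit input IDs so later test runs reference the right objects.
# (IDs and variable names are illustrative, not the notebook's exact ones.)
vm_test_ds = vm.init_dataset(
    dataset=test_df, input_id="test_dataset", target_column="loan_status"
)
vm_champion = vm.init_model(xgb_model, input_id="xgb_model_developer_champion")

# test_config maps each test to the inputs it needs; some entries previously
# still pointed at the stale "xgb_model" identifier.
test_config = {
    "validmind.model_validation.sklearn.ClassifierPerformance": {
        "inputs": {"dataset": vm_test_ds, "model": vm_champion},
    },
}

for test_id, config in test_config.items():
    result = vm.tests.run_test(test_id, **config)
    result.log()  # push each result to the ValidMind Platform
```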

Other notes

.gitignore

Had to add the model pickle file here as an exception so that users can access it:

# Sample application scorecard model for validation notebook — do not remove!
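For reference, the exception relies on gitignore's negation syntax; the path below is a hypothetical placeholder rather than the repo's actual file:

```gitignore
# Sample application scorecard model for validation notebook — do not remove!
# Hypothetical path; the "!" prefix re-includes a file that a broader rule
# such as *.pkl would otherwise ignore.
!notebooks/code_samples/credit_risk/xgb_model_developer_champion.pkl
```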

Add context to LLM-generated test descriptions

Just added a quick adjustment to include wording about session-locked custom context:

"This is a global setting that will affect all tests for your linked model for the duration of your ValidMind Library session:"

New to ValidMind?

I generalized the language in this templated section for all of our notebooks, as the "Developer" portal now contains much more than that; hopefully this new wording is more inclusive and better describes what we hope that section will be:

"If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.\n",

ValidMind for model validation

This got moved to a new Story: sc-9378

This notebook series will be a simpler version of what we cover in this extensive notebook, focusing on validating the customer churn/binary classification model instead, and will include some more basics like previewing the template, adding the results to your documentation, etc.

External Release Notes

Learn how to independently assess an application scorecard model as a validator with our new Validate an application scorecard model Jupyter Notebook. You'll use ValidMind to evaluate the development of a model by conducting thorough testing and analysis, including the use of challenger models to benchmark performance.

This interactive notebook provides a step-by-step guide for:

  • Verifying the data quality steps performed by the model development team
  • Independently replicating the champion model's results and conducting additional tests to assess performance, stability, and robustness
  • Setting up test inputs and challenger models for comparative analysis (see the sketch after this list)
  • Running validation tests, analyzing results, and logging findings to ValidMind
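To give a flavor of the challenger setup described above, here's a minimal sketch. The model choices match the release notes (Random Forest and Logistic Regression), but the variable names, input IDs, and validmind calls are assumptions based on the library's public API, not excerpts from the notebook:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

import validmind as vm

# Train two challenger models on the same features the champion used.
# (X_train, y_train, xgb_model, and vm_test_ds are assumed to exist already.)
rf_challenger = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
lr_challenger = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Register the champion and challengers so tests can compare them side by side.
vm_champion = vm.init_model(xgb_model, input_id="xgb_model_developer_champion")
vm_rf = vm.init_model(rf_challenger, input_id="rf_model_challenger")
vm_lr = vm.init_model(lr_challenger, input_id="lr_model_challenger")

# Run the same performance test once per model for a comparative view,
# then log each result to ValidMind as validation evidence.
for vm_model in (vm_champion, vm_rf, vm_lr):
    result = vm.tests.run_test(
        "validmind.model_validation.sklearn.ClassifierPerformance",
        inputs={"dataset": vm_test_ds, "model": vm_model},
    )
    result.log()
```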


validbeck (Collaborator, Author) commented Mar 26, 2025

@MichaelIngvarRoenning Sorry for the delay, I've been heads down on updating some other training stuff & redesign for our homepage. To keep you updated while I make some changes:

1a. In notebook 111 you are talking about Key Concepts: Model Documentation is not relevant for validation; it's Validation Report Documentation that we are focusing on here. Same with the documentation template concept: we are not using test suites from a template in these notebooks.

Agreed, I will give these introductory concepts another edit to make them more tailored to model validation.

EDIT: Done!

[Screenshot: 2025-03-26 at 2:20:48 PM]

1b. In general the language needs to be changed. I see instances where we say "was used to develop our dummy champion model"; we need the language to say that the Model Developer has submitted their model to the validation team, and that the champion model from the developer is subject to testing by the Validation Team.

Will do, I will massage the messaging around this.

EDIT: Done!

[Screenshot: 2025-03-26 at 2:24:29 PM]

1c. When we are pre-processing and doing feature engineering on the dataset, there needs to be emphasis in the description that the Model Validator is not just replicating what the Model Developer did for dataset adjustments; we replicate these steps because we want to test whether the transformations and processing of the data were appropriate.

Got it, we are reproducing the steps the model developer WOULD have taken to see if they were done correctly, not just mimicking, correct?

EDIT: Done!

[Screenshot: 2025-03-26 at 2:25:01 PM]

Made sure any sections referencing why we do this are clear etc.:

[Screenshot: 2025-03-26 at 2:25:34 PM]

2a. Notebook 112 - I'm not sure why we are again importing a champion model when the section is about developing challenger models. The notebook should be clear that we are ALSO registering the champion model in the framework.
3a. Aren't 111 and 112 redundant, since we are doing exactly the same in 113? What's the point of doing 111, 112, and then 113?

Same comment here as mentioned in Slack — unfortunately each notebook is an "isolated" environment, so certain functions/inputs/outputs need to be rerun in order to be accessed. If you comment out those sections it won't work — very frustrating, I know, but I ran into this in the last series. It's just a quirk of Jupyter Notebooks unfortunately.

2b. I think it makes more sense to merge notebooks 111 and 112 into one piece, to be honest; that makes more sense in this setup.
I recommend that we start with 113 as the basis, include what is needed from 111 and 112, and then end with 114.

I would like to keep it as is — the reason is that we are structuring our updated training around this breakdown, and it also matches the development introductory experience, e.g.:

[Screenshot: validator-fundamentals, 2025-03-26 at 1:43:37 PM]

I think that once I clean up 111 to be more validator focused it will make more sense like this though!

2c. I think the last part of Enable use-case context should come when we actually run a test where it becomes relevant. These notebooks should not be an "intro for validators"; they should be more related to validation of a credit risk application, where we are assuming knowledgeable users.

  • I thought it was out of place too; can you recommend where you would actually introduce this? Should we just remove the first instance and leave the second one, where we modify the description for the comparison tests?
  • We are actually trying to develop an introduction to the validation experience first (to work with the training), so I think we should focus on the basics for this initiative and then follow up with a specific, more complex use case notebook. With that in mind, how do you think we can approach this differently in your recommendation?

3b. When running the list of tests, I actually recommend we run ONE test first to see the output, and then run the loop of all tests. This is relevant for every loop we have here. The rest is great.

Can you provide an example here of how you might structure it? I broke them down as best I could but I don't know enough about which tests are valuable to make a call.
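For what it's worth, one possible structure, sketched only; I don't know which test best serves as the standalone example, so the test IDs and inputs below are placeholders based on the library's public test catalog:

```python
import validmind as vm

# 1. Run a single data quality test first so readers see what an individual
#    result looks like before anything is batched.
#    (vm_train_ds is assumed to be an already-initialized ValidMind dataset.)
single_result = vm.tests.run_test(
    "validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_train_ds},
)
single_result.log()

# 2. Then batch the remaining tests in a loop using the same pattern.
data_quality_tests = [
    "validmind.data_validation.MissingValues",
    "validmind.data_validation.HighCardinality",
    "validmind.data_validation.Duplicates",
]
for test_id in data_quality_tests:
    vm.tests.run_test(test_id, inputs={"dataset": vm_train_ds}).log()
```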

validbeck changed the title from "New notebook series: ValidMind for model validation" to "New validation notebook: Validate an application scorecard model" on Mar 28, 2025
MichaelIngvarRoenning left a comment


Everything looks excellent!

@github-actions (Contributor) commented:

PR Summary

This pull request introduces a new Jupyter notebook for validating an application scorecard model using the ValidMind Library. The notebook provides a comprehensive guide for assessing the model's development through testing and analysis, including the use of challenger models for benchmarking performance. Key features of the notebook include:

  • Instructions for setting up the environment and initializing the ValidMind Library.
  • Steps for importing and preprocessing a sample dataset.
  • Training of potential challenger models (Random Forest and Logistic Regression) for comparison.
  • Running various data quality and performance tests using the ValidMind Library.
  • Implementing a custom test for score-to-odds analysis (see the sketch after this list).
  • Logging and verifying test results within the ValidMind Platform.
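As a rough illustration of the score-to-odds custom test mentioned above, here's a sketch using the library's custom-test decorator; the test namespace, binning approach, and column names are illustrative, not the notebook's actual implementation:

```python
import numpy as np
import pandas as pd
import validmind as vm


@vm.test("my_custom_tests.ScoreToOddsAnalysis")
def score_to_odds_analysis(dataset, score_column="score", n_bins=10):
    """Compare observed good-to-bad odds across score bands."""
    df = dataset.df.copy()
    df["score_band"] = pd.qcut(df[score_column], q=n_bins, duplicates="drop")

    summary = df.groupby("score_band", observed=True)[dataset.target_column].agg(
        total="count", defaults="sum"
    )
    summary["goods"] = summary["total"] - summary["defaults"]
    # Higher score bands should show higher good-to-bad odds if the scorecard ranks risk well.
    summary["odds"] = summary["goods"] / summary["defaults"].replace(0, np.nan)
    return summary.reset_index()
```

Once decorated, such a test could be run and logged like any built-in test, e.g. via vm.tests.run_test("my_custom_tests.ScoreToOddsAnalysis", inputs={"dataset": vm_test_ds}).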

Additionally, the PR updates multiple documentation references across various notebooks to include information about running tests and accessing the Python Library API reference. It also updates the .gitignore file to ensure a specific model file is not ignored.

The version of the ValidMind Library is incremented from 2.8.13 to 2.8.14.

Test Suggestions

  • Run the new model validation notebook to ensure all steps execute without errors.
  • Verify that the custom test for score-to-odds analysis produces the expected output.
  • Check that the updated documentation references are correct and lead to the intended resources.
  • Ensure that the .gitignore changes correctly include the specified model file.

validbeck merged commit ad38ff0 into main on Mar 28, 2025
6 checks passed
validbeck deleted the beck/sc-9082/edit-validation-credit-risk-notebook branch on March 28, 2025 at 17:18
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)