
New validation notebook: Validate an application scorecard model #338

Merged
validbeck merged 91 commits into main from beck/sc-9082/edit-validation-credit-risk-notebook
Mar 28, 2025

Conversation


validbeck (Collaborator) commented Mar 14, 2025

Internal Notes for Reviewers

sc-9082

Validate an application scorecard model

I edited Michael's draft notebook into a comprehensive model validation experience with the ValidMind Library: Validate an application scorecard model

Major stuff I modified in addition to the storytelling:

  • Conceptual content is now validator-focused only; removed a lot of the basics that the introductory WIP series will cover instead
  • Focused on registering the model as a validator & ensuring you have the correct permissions so things are returned & logged correctly
  • Cleaned up some of the code around identifying tests & added some more context
  • Added running individual tests as examples under data quality and performance to introduce users to the concept of running tests before batching them
  • Added some more information on the custom test implementation
  • Cleaned up the test_config code; we were calling functions we weren't using, xgb_model should have been xgb_model_developer_champion, etc. (see the input-mapping sketch after this list)
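To illustrate the test_config fix, here's a minimal sketch of the input mapping I mean. The input IDs, test ID, and dataset/model variables below are illustrative, and the validmind calls are based on the library's public API rather than copied from the notebook:

```python
import validmind as vm

# Register the validation dataset and the developer's champion model under
# explicit input IDs so later test runs reference the right objects.
# (IDs and variable names are illustrative, not the notebook's exact ones.)
vm_test_ds = vm.init_dataset(
    dataset=test_df, input_id="test_dataset", target_column="loan_status"
)
vm_champion = vm.init_model(xgb_model, input_id="xgb_model_developer_champion")

# test_config maps each test to the inputs it needs; some entries previously
# still pointed at the stale "xgb_model" identifier.
test_config = {
    "validmind.model_validation.sklearn.ClassifierPerformance": {
        "inputs": {"dataset": vm_test_ds, "model": vm_champion},
    },
}

for test_id, config in test_config.items():
    result = vm.tests.run_test(test_id, **config)
    result.log()  # push each result to the ValidMind Platform
```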

Other notes

.gitignore

Had to add the model pickle file here as an exception so that users can access it:

# Sample application scorecard model for validation notebook — do not remove!
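For reference, the exception relies on gitignore's negation syntax; the path below is a hypothetical placeholder rather than the repo's actual file:

```gitignore
# Sample application scorecard model for validation notebook — do not remove!
# Hypothetical path; the "!" prefix re-includes a file that a broader rule
# such as *.pkl would otherwise ignore.
!notebooks/code_samples/credit_risk/xgb_model_developer_champion.pkl
```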

Add context to LLM-generated test descriptions

Just added a quick adjustment to include wording about session-locked custom context:

"This is a global setting that will affect all tests for your linked model for the duration of your ValidMind Library session:"

New to ValidMind?

I generalized the language in this templated section for all of our notebooks, as the "Developer" portal now contains much more than that; hopefully this new wording is more inclusive and better describes what we hope that section will be:

"If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.\n",

ValidMind for model validation

This got moved to a new Story: sc-9378

This notebook series will be a simpler version of what we cover in this extensive notebook, focusing on validating the customer churn/binary classification model instead, and will include some more basics like previewing the template, adding the results to your documentation, etc.

External Release Notes

Learn how to independently assess an application scorecard model as a validator with our new Validate an application scorecard model Jupyter Notebook. You'll use ValidMind to evaluate the development of a model by conducting thorough testing and analysis, including the use of challenger models to benchmark performance.

This interactive notebook provides a step-by-step guide for:

  • Verifying the data quality steps performed by the model development team
  • Independently replicating the champion model's results and conducting additional tests to assess performance, stability, and robustness
  • Setting up test inputs and challenger models for comparative analysis (see the sketch after this list)
  • Running validation tests, analyzing results, and logging findings to ValidMind
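To give a flavor of the challenger setup described above, here's a minimal sketch. The model choices match the release notes (Random Forest and Logistic Regression), but the variable names, input IDs, and validmind calls are assumptions based on the library's public API, not excerpts from the notebook:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

import validmind as vm

# Train two challenger models on the same features the champion used.
# (X_train, y_train, xgb_model, and vm_test_ds are assumed to exist already.)
rf_challenger = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
lr_challenger = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Register the champion and challengers so tests can compare them side by side.
vm_champion = vm.init_model(xgb_model, input_id="xgb_model_developer_champion")
vm_rf = vm.init_model(rf_challenger, input_id="rf_model_challenger")
vm_lr = vm.init_model(lr_challenger, input_id="lr_model_challenger")

# Run the same performance test once per model for a comparative view,
# then log each result to ValidMind as validation evidence.
for vm_model in (vm_champion, vm_rf, vm_lr):
    result = vm.tests.run_test(
        "validmind.model_validation.sklearn.ClassifierPerformance",
        inputs={"dataset": vm_test_ds, "model": vm_model},
    )
    result.log()
```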


validbeck (Collaborator, Author) commented Mar 26, 2025

@MichaelIngvarRoenning Sorry for the delay, I've been heads down on updating some other training stuff & redesign for our homepage. To keep you updated while I make some changes:

1a. In notebook 111 you are talking about Key Concepts: Model Documentation is not relevant for validation; it's Validation Report Documentation that we are focusing on here. Same with the documentation template concept: we are not using test suites from a template in these notebooks.

Agreed, I will give these introductory concepts another edit to make them more tailored to model validation.

EDIT: Done!

[Screenshot: 2025-03-26 at 2:20:48 PM]

1b. In general the language needs to be changed. I see instances where we say "was used to develop our dummy champion model"; we need the language to say that the Model Developer has submitted their model to the validation team, and that the champion model from the developer is subject to testing by the Validation Team.

Will do, I will massage the messaging around this.

EDIT: Done!

[Screenshot: 2025-03-26 at 2:24:29 PM]

1c. When we are pre-processing and doing feature engineering on the dataset, there needs to be emphasis in the description that the Model Validator is not just replicating what the Model Developer did for dataset adjustments; we replicate these steps because we want to test whether the transformations and processing of the data were appropriate.

Got it, we are reproducing the steps the model developer WOULD have taken to see if they were done correctly, not just mimicking, correct?

EDIT: Done!

[Screenshot: 2025-03-26 at 2:25:01 PM]

Made sure any sections referencing why we do this are clear etc.:

[Screenshot: 2025-03-26 at 2:25:34 PM]

2a. Notebook 112 - I'm not sure why we are again importing a champion model when the section is about developing challenger models. The notebook should be clear that we are ALSO registering the champion model in the framework.
3a. Aren't 111 and 112 redundant, since we are doing exactly the same in 113? What's the point of doing 111, 112, and then 113?

Same comment here as mentioned in Slack — unfortunately each notebook is an "isolated" environment, so certain functions/inputs/outputs need to be rerun in order to be accessed. If you comment out those sections it won't work — very frustrating, I know, but I ran into this in the last series. It's just a quirk of Jupyter Notebooks unfortunately.

2b. I think it makes more sense to merge notebooks 111 and 112 into one piece, to be honest; that makes more sense in this setup.
I recommend that we start with 113 as the basis, include what is needed from 111 and 112, and then end with 114.

I would like to keep it as is — the reason is that we are structuring our updated training around this breakdown, and it also matches the development introductory experience, e.g.:

[Screenshot: validator-fundamentals, 2025-03-26 at 1:43:37 PM]

I think that once I clean up 111 to be more validator focused it will make more sense like this though!

2c. I think the last part of Enable use-case context should come when we actually run a test where it becomes relevant. These notebooks should not be an "intro for validators"; they should be more related to validation of a credit risk application, where we are assuming knowledgeable users.

  • I thought it was out of place too; can you recommend where you would actually introduce this? Should we just remove the first instance and leave the second one, where we modify the description for the comparison tests?
  • We are actually trying to develop an introduction to the validation experience first (to work with the training), so I think we should focus on the basics for this initiative and then follow up with a specific, more complex use case notebook. With that in mind, how do you think we can approach this differently in your recommendation?

3b. When running the list of tests, I actually recommend we run ONE test first to see the output, and then run the loop of all tests. This is relevant for every loop we have here. The rest is great.

Can you provide an example here of how you might structure it? I broke them down as best I could but I don't know enough about which tests are valuable to make a call.
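For what it's worth, one possible structure, sketched only; I don't know which test best serves as the standalone example, so the test IDs and inputs below are placeholders based on the library's public test catalog:

```python
import validmind as vm

# 1. Run a single data quality test first so readers see what an individual
#    result looks like before anything is batched.
#    (vm_train_ds is assumed to be an already-initialized ValidMind dataset.)
single_result = vm.tests.run_test(
    "validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_train_ds},
)
single_result.log()

# 2. Then batch the remaining tests in a loop using the same pattern.
data_quality_tests = [
    "validmind.data_validation.MissingValues",
    "validmind.data_validation.HighCardinality",
    "validmind.data_validation.Duplicates",
]
for test_id in data_quality_tests:
    vm.tests.run_test(test_id, inputs={"dataset": vm_train_ds}).log()
```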

validbeck changed the title from "New notebook series: ValidMind for model validation" to "New validation notebook: Validate an application scorecard model" on Mar 28, 2025
MichaelIngvarRoenning left a comment


Everything looks excellent!

@github-actions (Contributor) commented:

PR Summary

This pull request introduces a new Jupyter notebook for validating an application scorecard model using the ValidMind Library. The notebook provides a comprehensive guide for assessing the model's development through testing and analysis, including the use of challenger models for benchmarking performance. Key features of the notebook include:

  • Instructions for setting up the environment and initializing the ValidMind Library.
  • Steps for importing and preprocessing a sample dataset.
  • Training of potential challenger models (Random Forest and Logistic Regression) for comparison.
  • Running various data quality and performance tests using the ValidMind Library.
  • Implementing a custom test for score-to-odds analysis (see the sketch after this list).
  • Logging and verifying test results within the ValidMind Platform.
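As a rough illustration of the score-to-odds custom test mentioned above, here's a sketch using the library's custom-test decorator; the test namespace, binning approach, and column names are illustrative, not the notebook's actual implementation:

```python
import numpy as np
import pandas as pd
import validmind as vm


@vm.test("my_custom_tests.ScoreToOddsAnalysis")
def score_to_odds_analysis(dataset, score_column="score", n_bins=10):
    """Compare observed good-to-bad odds across score bands."""
    df = dataset.df.copy()
    df["score_band"] = pd.qcut(df[score_column], q=n_bins, duplicates="drop")

    summary = df.groupby("score_band", observed=True)[dataset.target_column].agg(
        total="count", defaults="sum"
    )
    summary["goods"] = summary["total"] - summary["defaults"]
    # Higher score bands should show higher good-to-bad odds if the scorecard ranks risk well.
    summary["odds"] = summary["goods"] / summary["defaults"].replace(0, np.nan)
    return summary.reset_index()
```

Once decorated, such a test could be run and logged like any built-in test, e.g. via vm.tests.run_test("my_custom_tests.ScoreToOddsAnalysis", inputs={"dataset": vm_test_ds}).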

Additionally, the PR updates multiple documentation references across various notebooks to include information about running tests and accessing the Python Library API reference. It also updates the .gitignore file to ensure a specific model file is not ignored.

The version of the ValidMind Library is incremented from 2.8.13 to 2.8.14.

Test Suggestions

  • Run the new model validation notebook to ensure all steps execute without errors.
  • Verify that the custom test for score-to-odds analysis produces the expected output.
  • Check that the updated documentation references are correct and lead to the intended resources.
  • Ensure that the .gitignore changes correctly include the specified model file.

validbeck merged commit ad38ff0 into main on Mar 28, 2025
6 checks passed
validbeck deleted the beck/sc-9082/edit-validation-credit-risk-notebook branch on March 28, 2025 at 17:18
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request)