notebooks: Quickstart for model validation #376

Merged
validbeck merged 44 commits into main from beck/sc-10339/create-code-samples-notebook-quickstart-for on May 28, 2025

Conversation

@validbeck
Collaborator

@validbeck validbeck commented May 22, 2025

Pull Request Description

sc-10339

What

There is a net-new notebook under notebooks/quickstart/: Quickstart for model validation

This notebook is a companion to our existing "Quickstart for model documentation." It covers the basics of getting started with model validation using the ValidMind Library, on the assumption that the model you're validating was created using the model documentation quickstart.

Why

Our validator resources are currently very sparse. This is also a step towards retooling our undeveloped "Get Started" section in the documentation.

How to Test

  1. Pull down this PR: gh pr checkout 376
  2. Open the "Quickstart for model validation" notebook: /notebooks/quickstart/quickstart_model_validation.ipynb
  3. Follow the instructions in the notebook to make sure everything runs correctly and the content is accurate as described.

Pull Request Dependencies

Tip

Refer to the deployment notes below.

External Release Notes

Want to get started with validating models with the ValidMind Library? Check out our brand new Quickstart for model validation notebook:

  • Learn the basics of using ValidMind to validate models as part of a model validation workflow.
  • Set up the ValidMind Library in your environment, and independently audit data quality adjustments and a proposed champion model using ValidMind tests for a binary classification model.

Deployment Notes

Once the changes in this PR are approved on the validmind-library side, the notebook changes will be cherry-picked into the documentation repo via this PR: validmind/documentation#731

Breaking Changes

n/a

Screenshots/Videos (Frontend Only)

n/a

Checklist

  • PR body describes what, why, and how to test
  • Release notes written
  • Deployment notes written
  • Breaking changes identified — N/A
  • Labels applied
  • PR linked to Shortcut
  • Screenshots/videos added (Frontend) — N/A
  • Unit tests added (Backend) — N/A
  • Tested locally
  • Documentation updated (if required)

Areas Needing Special Review

Important

I took some creative license with the following sections, so someone should check if the examples are relevant, accurate, and properly described:

  • Running data quality tests > Run data comparison tests
  • Running model evaluation tests > Run model performance tests, Run diagnostic tests, & Run feature importance tests

Run data comparison tests

Check if the explanatory comments on why we compare the two different sets of paired datasets are accurate, and if these two comparisons are in fact relevant and demonstrative.

Run model performance tests

Check if the lead-in text for why we use the testing dataset for our performance tests is relevant and accurate.

Run diagnostic tests

Check if the lead-in text for why we use the training and testing datasets for our diagnostic tests is relevant and accurate.
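For reviewers, here is a minimal sketch of one common reason a diagnostic check looks at both datasets: comparing performance on training data against performance on testing data can hint at overfitting. This is plain Python, not the ValidMind Library API; the function names, the labels, and the threshold are all hypothetical and only illustrate the idea.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def overfit_gap(train_true, train_pred, test_true, test_pred):
    """Training accuracy minus testing accuracy; a large positive gap hints at overfitting."""
    return accuracy(train_true, train_pred) - accuracy(test_true, test_pred)


# Made-up labels and predictions for a binary classifier (illustrative only)
train_true = [1, 0, 1, 1, 0, 0, 1, 0]
train_pred = [1, 0, 1, 1, 0, 0, 1, 0]  # perfect on the training data
test_true = [1, 0, 1, 0]
test_pred = [1, 1, 0, 0]               # only half right on unseen data

gap = overfit_gap(train_true, train_pred, test_true, test_pred)
THRESHOLD = 0.1  # hypothetical cut-off, not a ValidMind default
print(f"gap={gap:.2f}, possible overfit={gap > THRESHOLD}")
```

A check like this needs both datasets by construction, which is the point the lead-in text in the notebook should convey.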

Run feature importance tests

Check if the lead-in text for why we use the testing dataset for our feature importance tests is relevant and accurate.

Additional Notes

I also adjusted the following sections in these notebooks as I noticed they were incomplete/out of date:

Validate an application scorecard model

Validate an application scorecard model: Setting up > Assign validator credentials

This was missing the update where you also have to remove yourself as a model owner. Remedied:

Screenshot 2025-05-22 at 10 56 21 AM

Finalize testing and documentation (ValidMind for model development)

Finalize testing and documentation

The next steps section needed some TLC in comparison to the newer validation series, so I spruced it up:

Screenshot 2025-05-22 at 11 51 37 AM

Developing challenger models (ValidMind for model validation)

Developing a potential challenger model: Running model evaluation tests

Since I added more context to why we use certain datasets in the quickstart, I added the same explanations in this introductory notebook as well under the Running model evaluation tests sub-sections:

  • Run model performance tests
  • Run diagnostic tests
  • Run feature importance tests

Note

Refer also to the "Areas Needing Special Review" section above.

Finalize testing and reporting (ValidMind for model validation)

Finalize testing and reporting

Tidied up the Next steps section as well here:

Screenshot 2025-05-22 at 11 53 47 AM

@validbeck validbeck self-assigned this May 22, 2025
@validbeck validbeck added the highlight Feature to be curated in the release notes label May 22, 2025
@validbeck validbeck marked this pull request as ready for review May 22, 2025 18:47
@validbeck validbeck requested review from LoiAnsah and juanmleng May 22, 2025 18:48
validbeck and others added 6 commits May 27, 2025 10:01
Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
@github-actions
Contributor

PR Summary

This pull request introduces several enhancements and additions to the model validation documentation and quickstart guides within the project. Key changes include:

  1. .gitignore Update: Added a new entry to ensure that the xgboost_model_champion.pkl file in the notebooks/quickstart directory is not ignored, allowing it to be included in version control.

  2. Notebook Enhancements:

    • Model Validation Quickstart: A comprehensive new notebook (quickstart_model_validation.ipynb) has been added. This notebook provides a step-by-step guide for using the ValidMind Library to validate models, including setting up the environment, importing datasets, running data quality tests, and evaluating model performance.
    • Documentation Improvements: Several existing notebooks have been updated to improve clarity and guidance on using the ValidMind Platform. This includes more detailed instructions on removing oneself as a model owner and developer, and adding oneself as a validator.
    • Model Development and Validation Tutorials: Enhanced the documentation with additional guidance on running and logging tests, inserting test results, and collaborating with stakeholders using the ValidMind Platform.
  3. Code and Textual Corrections:

    • Corrected minor textual errors and improved the clarity of instructions across various notebooks.
    • Enhanced explanations of concepts such as overfitting, robustness, and stability in model evaluation.

These changes aim to improve the user experience and provide clearer guidance for users working with the ValidMind Library and Platform.

Test Suggestions

  • Run the new quickstart notebook to ensure all steps execute without errors.
  • Verify that the xgboost_model_champion.pkl file is correctly included in version control and accessible in the quickstart notebook.
  • Test the updated instructions for removing and adding roles in the model validation process to ensure they are clear and accurate.
  • Check the enhanced documentation for clarity and completeness, especially the new sections on collaboration and test result logging.
  • Ensure that all links to external resources and documentation are valid and lead to the correct pages.
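As a starting point for the link check suggested above, the following sketch collects the URLs that appear in a notebook's markdown cells so they can be reviewed or checked one by one. It assumes the notebook is stored as nbformat 4 JSON; the helper name `extract_links` and the sample notebook are hypothetical, and the sketch only gathers links rather than requesting them over the network.

```python
import json
import re

# Rough URL matcher; stops at whitespace and common closing delimiters
URL_PATTERN = re.compile(r"https?://[^\s)\"'>\]]+")


def extract_links(notebook_text):
    """Collect unique URLs found in the markdown cells of a notebook (nbformat 4 JSON)."""
    notebook = json.loads(notebook_text)
    links = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # "source" may be stored as a list of lines or as a single string
        if isinstance(source, list):
            source = "".join(source)
        for url in URL_PATTERN.findall(source):
            if url not in links:
                links.append(url)
    return links


# Hypothetical two-cell notebook used only to exercise the helper
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["See the [docs](https://example.com/docs) for details.\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["print('https://example.com/ignored')"]},
    ],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

for link in extract_links(sample):
    print(link)
```

To use it on a real file, read the `.ipynb` contents as text and pass them to `extract_links`; the resulting list can then be opened manually or fed to whatever link checker the team prefers.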

Co-authored-by: Lois Ansah <133300328+LoiAnsah@users.noreply.github.com>
@validbeck
Collaborator Author

@LoiAnsah For the record, this is what happens when the JSON is incorrect and the notebook is corrupted... ;)

Screenshot 2025-05-27 at 10 13 12 AM
Screenshot 2025-05-27 at 10 13 18 AM
Screenshot 2025-05-27 at 10 16 14 AM

If this were your notebook, how would you go about fixing it?

@LoiAnsah
Contributor

@LoiAnsah For the record, this is what happens when the JSON is incorrect and the notebook is corrupted... ;)

Screenshot 2025-05-27 at 10 13 12 AM Screenshot 2025-05-27 at 10 13 18 AM Screenshot 2025-05-27 at 10 16 14 AM

If this were your notebook, how would you go about fixing it?

@LoiAnsah LoiAnsah closed this May 27, 2025
@validbeck
Collaborator Author

@LoiAnsah, why did you close this PR? In general, you shouldn't be closing or merging other people's PRs on their behalf, especially without communication.

Screenshot 2025-05-27 at 10 33 15 AM

@validbeck validbeck reopened this May 27, 2025
@LoiAnsah
Contributor

@validbeck I’d check my logs and switch back to the one just before the current one.

@LoiAnsah
Contributor

@validbeck Apologies, I closed it by mistake. I was trying to reply to your comment and selected quote reply.

@validbeck
Collaborator Author

I’d check my logs and switch back to the one just before the current one.

In this case, this is not the right approach: you want to apply the suggested changes. The "roll-back" method is only for when you don't want to retain the changes and would rather revert to a known working version and start fresh. You will encounter many situations like this, where you will need to evaluate on a case-by-case basis how best to approach fixing things. Rolling back is not the only answer, especially if you want to retain later work.

What I did was:

  • Reopen the notebook in the text view
  • Locate the incorrect syntax lines (with the help of a formatter like I showed you yesterday)
  • Fix the incorrect syntax lines
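The "locate" step above can also be sketched in code, assuming the corruption is a JSON syntax error in the `.ipynb` file. This uses only Python's standard `json` module; the helper name `find_json_error` and the sample notebook text are hypothetical and not taken from this PR.

```python
import json


def find_json_error(text):
    """Return None if text is valid JSON, else (line, column, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions of the first syntax error
        return err.lineno, err.colno, err.msg
    return None


# Hypothetical notebook fragment with a missing comma after the first source string
broken = '''{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Title\\n"
    "Some text"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}'''

location = find_json_error(broken)
if location is not None:
    line, column, message = location
    print(f"line {line}, column {column}: {message}")
```

The parser stops at the first error, so after fixing the reported line in the text view, run the check again until it returns `None`.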

We can go over this together in a session, because I want you to see what Jupyter Notebooks look like under the hood and get used to working with them at that level. In preparation, please:

  • Pull down the latest version of this PR branch
  • Roll back to this commit, with the corruption: 4a8e33fbad34628b07bcb50c0d0897bbffe1f3d4

@validbeck validbeck requested review from LoiAnsah and juanmleng May 27, 2025 18:07
@validbeck
Collaborator Author

@juanmleng @LoiAnsah I've either committed the suggestions and edited the surrounding context to match the changes, or left explanations via a comment as to why the suggestions weren't applied. Can either of you please take another look, and approve if it looks good enough? 🙏🏻

@validbeck validbeck requested a review from juanmleng May 27, 2025 19:50
@github-actions
Contributor

PR Summary

This pull request introduces several enhancements and bug fixes to the model validation and documentation notebooks within the project. Key changes include:

  1. .gitignore Update: Added a new entry to ensure that the xgboost_model_champion.pkl file in the notebooks/quickstart directory is not ignored, allowing it to be tracked by Git.

  2. Notebook Enhancements:

    • Model Validation Notebooks: Added detailed instructions and clarifications on the steps involved in model validation, including setting up the environment, running tests, and logging results to the ValidMind Platform. The changes improve the clarity and usability of the notebooks for users new to the ValidMind Library.
    • Model Documentation Notebooks: Expanded the guidance on working with model documentation, including running additional tests, inserting test results, and collaborating with stakeholders. These enhancements aim to streamline the documentation process and improve collaboration.
  3. New Quickstart Notebook: Introduced a new notebook quickstart_model_validation.ipynb that provides a comprehensive guide to using the ValidMind Library for model validation. This notebook covers importing datasets, running data quality tests, importing and initializing models, and conducting various validation tests.

  4. Textual Improvements: Made several textual improvements across multiple notebooks to enhance readability and provide more context to the users. This includes rephrasing instructions, adding explanations for key concepts, and improving the flow of the content.

These changes collectively aim to improve the user experience and effectiveness of the model validation and documentation process using the ValidMind Library.

Test Suggestions

  • Verify that the new entry in .gitignore correctly tracks the xgboost_model_champion.pkl file.
  • Run the updated model validation notebooks to ensure all steps execute without errors.
  • Check that the new quickstart_model_validation.ipynb notebook provides a clear and comprehensive guide for new users.
  • Test the logging of test results to the ValidMind Platform to ensure it functions as expected.
  • Review the textual changes for clarity and accuracy in conveying the intended instructions.

@validbeck validbeck merged commit d596838 into main May 28, 2025
7 checks passed
@validbeck validbeck deleted the beck/sc-10339/create-code-samples-notebook-quickstart-for branch May 28, 2025 17:52
Labels

highlight Feature to be curated in the release notes

3 participants