diff --git a/.gitignore b/.gitignore index 9ab39fd08..bc3a104d3 100644 --- a/.gitignore +++ b/.gitignore @@ -197,6 +197,8 @@ lending_club_loan_data_*.csv !notebooks/code_samples/model_validation/xgb_model_champion.pkl # Sample logistic regression model for validation series — do not remove! !notebooks/tutorials/model_validation/lr_model_champion.pkl +# Sample XGBoost model for validation quickstart — do not remove! +!notebooks/quickstart/xgboost_model_champion.pkl notebooks/llm/datasets/*.jsonl diff --git a/notebooks/code_samples/model_validation/validate_application_scorecard.ipynb b/notebooks/code_samples/model_validation/validate_application_scorecard.ipynb index 2ac83e369..6dd025294 100644 --- a/notebooks/code_samples/model_validation/validate_application_scorecard.ipynb +++ b/notebooks/code_samples/model_validation/validate_application_scorecard.ipynb @@ -202,13 +202,19 @@ "\n", "In order to log tests as a validator instead of as a developer, on the model details page that appears after you've successfully registered your sample model:\n", "\n", - "1. Remove yourself as a developer: \n", + "1. Remove yourself as a model owner: \n", + "\n", + " - Click on the **OWNERS** tile.\n", + " - Click the **x** next to your name to remove yourself from that model's role.\n", + " - Click **Save** to apply your changes to that role.\n", + "\n", + "2. Remove yourself as a developer: \n", "\n", " - Click on the **DEVELOPERS** tile.\n", " - Click the **x** next to your name to remove yourself from that model's role.\n", " - Click **Save** to apply your changes to that role.\n", "\n", - "2. Add yourself as a validator: \n", + "3. Add yourself as a validator: \n", "\n", " - Click on the **VALIDATORS** tile.\n", " - Select your name from the drop-down menu.\n", @@ -1358,9 +1364,9 @@ "\n", "## Run diagnostic tests\n", "\n", - "Next we want to inspect the robustness and stability testing comparison between our champion and challenger model.\n", + "Next, we want to inspect the robustness and stability testing comparison between our champion and challenger model.\n", "\n", - "Use `list_tests()` to identify all the model diagnosis tests for classification:" + "Use `list_tests()` to list all available diagnosis tests applicable to classification tasks:" ] }, { diff --git a/notebooks/quickstart/quickstart_model_documentation.ipynb b/notebooks/quickstart/quickstart_model_documentation.ipynb index c42413519..67b3f1816 100644 --- a/notebooks/quickstart/quickstart_model_documentation.ipynb +++ b/notebooks/quickstart/quickstart_model_documentation.ipynb @@ -617,7 +617,7 @@ "\n", "### Assign predictions\n", "\n", - "Once the model has been registered you can assign model predictions to the training and testing datasets.\n", + "Once the model has been registered, you can assign model predictions to the training and testing datasets.\n", "\n", "- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.\n", "- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.\n", @@ -755,7 +755,7 @@ "\n", "2. In the left sidebar that appears for your model, click **Documentation**.\n", "\n", - "What you see is the full draft of your model documentation in a more easily consumable version. 
From here, you can make qualitative edits to model documentation, view guidelines, collaborate with validators, and submit your model documentation for approval when it's ready. [Learn more ...](https://docs.validmind.ai/guide/working-with-model-documentation.html)" + " What you see is the full draft of your model documentation in a more easily consumable version. From here, you can make qualitative edits to model documentation, view guidelines, collaborate with validators, and submit your model documentation for approval when it's ready. [Learn more ...](https://docs.validmind.ai/guide/working-with-model-documentation.html)" ] }, { diff --git a/notebooks/quickstart/quickstart_model_validation.ipynb b/notebooks/quickstart/quickstart_model_validation.ipynb new file mode 100644 index 000000000..4e233f963 --- /dev/null +++ b/notebooks/quickstart/quickstart_model_validation.ipynb @@ -0,0 +1,1174 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "926aad7c", + "metadata": {}, + "source": [ + "# Quickstart for model validation\n", + "\n", + "Learn the basics of using ValidMind to validate models as part of a model validation workflow. Set up the ValidMind Library in your environment, and generate a draft of a validation report using ValidMind tests for a binary classification model.\n", + "\n", + "To validate a model with the ValidMind Library, we'll:\n", + "\n", + "1. Import a sample dataset and preprocess it, then split the datasets and initialize them for use with ValidMind\n", + "2. Independently verify data quality tests performed on datasets by model development\n", + "3. Import a champion model for evaluation\n", + "4. Run model evaluation tests with the ValidMind Library, which will send the results of those tests to the ValidMind Platform" + ] + }, + { + "cell_type": "markdown", + "id": "dc56a98f", + "metadata": {}, + "source": [ + "::: {.content-hidden when-format=\"html\"}\n", + "## Contents \n", + "- [Introduction](#toc1_) \n", + "- [About ValidMind](#toc2_) \n", + " - [Before you begin](#toc2_1_) \n", + " - [New to ValidMind?](#toc2_2_) \n", + " - [Key concepts](#toc2_3_) \n", + "- [Setting up](#toc3_) \n", + " - [Register a sample model](#toc3_1_) \n", + " - [Assign validator credentials](#toc3_1_1_) \n", + " - [Install the ValidMind Library](#toc3_2_) \n", + " - [Initialize the ValidMind Library](#toc3_3_) \n", + " - [Get your code snippet](#toc3_3_1_) \n", + " - [Initialize the Python environment](#toc3_4_) \n", + "- [Getting to know ValidMind](#toc4_) \n", + " - [Preview the validation report template](#toc4_1_) \n", + " - [View validation report in the ValidMind Platform](#toc4_2_) \n", + "- [Importing the sample dataset](#toc5_) \n", + " - [Load the sample dataset](#toc5_1_) \n", + " - [Preprocess the raw dataset](#toc5_2_) \n", + " - [Split the dataset](#toc5_2_1_) \n", + " - [Separate features and targets](#toc5_2_2_) \n", + "- [Running data quality tests](#toc6_) \n", + " - [Identify qualitative tests](#toc6_1_) \n", + " - [Initialize the ValidMind datasets](#toc6_2_) \n", + " - [Run an individual data quality test](#toc6_3_) \n", + " - [Run data comparison tests](#toc6_4_) \n", + "- [Importing the champion model](#toc7_) \n", + " - [Initialize a model object](#toc7_1_) \n", + " - [Assign predictions](#toc7_2_) \n", + "- [Running model evaluation tests](#toc8_) \n", + " - [Run model performance tests](#toc8_1_) \n", + " - [Run diagnostic tests](#toc8_2_) \n", + " - [Run feature importance tests](#toc8_3_) \n", + "- [In summary](#toc9_) \n", + "- [Next steps](#toc10_) 
\n", + " - [Work with your validation report](#toc10_1_) \n", + " - [Discover more learning resources](#toc10_2_) \n", + "- [Upgrade ValidMind](#toc11_) \n", + "\n", + ":::\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "e3f7cf22", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Introduction\n", + "\n", + "Model validation aims to independently assess the compliance of *champion models* created by model developers with regulatory guidance by conducting thorough testing and analysis, potentially including the use of challenger models to benchmark performance. Assessments, presented in the form of a validation report, typically include *model findings* and recommendations to address those issues.\n", + "\n", + "A *binary classification model* is a type of predictive model used in churn analysis to identify customers who are likely to leave a service or subscription by analyzing various behavioral, transactional, and demographic factors.\n", + "\n", + "- This model helps businesses take proactive measures to retain at-risk customers by offering personalized incentives, improving customer service, or adjusting pricing strategies.\n", + "- Effective validation of a churn prediction model ensures that businesses can accurately identify potential churners, optimize retention efforts, and enhance overall customer satisfaction while minimizing revenue loss." + ] + }, + { + "cell_type": "markdown", + "id": "e5065e3c", + "metadata": {}, + "source": [ + "\n", + "\n", + "## About ValidMind\n", + "\n", + "ValidMind is a suite of tools for managing model risk, including risk associated with AI and statistical models.\n", + "\n", + "You use the ValidMind Library to automate comparison and other validation tests, and then use the ValidMind Platform to submit compliance assessments of champion models via comprehensive validation reports. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model developers." + ] + }, + { + "cell_type": "markdown", + "id": "541fadfb", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Before you begin\n", + "\n", + "This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook but we recommend further familiarizing yourself with the language. \n", + "\n", + "If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html)." + ] + }, + { + "cell_type": "markdown", + "id": "a196a66c", + "metadata": {}, + "source": [ + "\n", + "\n", + "### New to ValidMind?\n", + "\n", + "If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.\n", + "\n", + "
For access to all features available in this notebook, create a free ValidMind account.\n", + "

\n", + "Signing up is FREE — Register with ValidMind
" + ] + }, + { + "cell_type": "markdown", + "id": "29878854", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Key concepts\n", + "\n", + "**Validation report**: A comprehensive and structured assessment of a model’s development and performance, focusing on verifying its integrity, appropriateness, and alignment with its intended use. It includes analyses of model assumptions, data quality, performance metrics, outcomes of testing procedures, and risk considerations. The validation report supports transparency, regulatory compliance, and informed decision-making by documenting the validator’s independent review and conclusions.\n", + "\n", + "**Validation report template**: Serves as a standardized framework for conducting and documenting model validation activities. It outlines the required sections, recommended analyses, and expected validation tests, ensuring consistency and completeness across validation reports. The template helps guide validators through a systematic review process while promoting comparability and traceability of validation outcomes.\n", + "\n", + "**Tests**: A function contained in the ValidMind Library, designed to run a specific quantitative test on the dataset or model. Tests are the building blocks of ValidMind, used to evaluate and document models and datasets.\n", + "\n", + "**Metrics**: A subset of tests that do not have thresholds. In the context of this notebook, metrics and tests can be thought of as interchangeable concepts.\n", + "\n", + "**Custom metrics**: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with the ValidMind Library to be used in the ValidMind Platform.\n", + "\n", + "**Inputs**: Objects to be evaluated and documented in the ValidMind Library. They can be any of the following:\n", + "\n", + " - **model**: A single model that has been initialized in ValidMind with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model).\n", + " - **dataset**: Single dataset that has been initialized in ValidMind with [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset).\n", + " - **models**: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.\n", + " - **datasets**: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. (Learn more: [Run tests with multiple datasets](https://docs.validmind.ai/notebooks/how_to/run_tests_that_require_multiple_datasets.html))\n", + "\n", + "**Parameters**: Additional arguments that can be passed when running a ValidMind test, used to pass additional information to a metric, customize its behavior, or provide additional context.\n", + "\n", + "**Outputs**: Custom metrics can return elements like tables or plots. Tables may be a list of dictionaries (each representing a row) or a pandas DataFrame. Plots may be matplotlib or plotly figures." + ] + }, + { + "cell_type": "markdown", + "id": "58b32cff", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Setting up" + ] + }, + { + "cell_type": "markdown", + "id": "e9be1dd1", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Register a sample model\n", + "\n", + "In a usual model lifecycle, a champion model will have been independently registered in your model inventory and submitted to you for validation by your model development team as part of the effective challenge process. 
(**Learn more:** [Submit for approval](https://docs.validmind.ai/guide/model-documentation/submit-for-approval.html))\n", + "\n", + "For this notebook, we'll have you register a dummy model in the ValidMind Platform inventory and assign yourself as the validator to familiarize you with the ValidMind interface and circumvent the need for an existing model:\n", + "\n", + "1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).\n", + "\n", + "2. In the left sidebar, navigate to **Inventory** and click **+ Register Model**.\n", + "\n", + "3. Enter the model details and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))\n", + "\n", + " For example, to register a model for use with this notebook, select:\n", + "\n", + " - Documentation template: `Binary classification`\n", + " - Use case: `Marketing/Sales - Attrition/Churn Management`\n", + "\n", + " You can fill in other options according to your preference." + ] + }, + { + "cell_type": "markdown", + "id": "f7815e5f", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Assign validator credentials\n", + "\n", + "In order to log tests as a validator instead of as a developer, on the model details page that appears after you've successfully registered your sample model:\n", + "\n", + "1. Remove yourself as a model owner: \n", + "\n", + " - Click on the **OWNERS** tile.\n", + " - Click the **x** next to your name to remove yourself from that model's role.\n", + " - Click **Save** to apply your changes to that role.\n", + "\n", + "2. Remove yourself as a developer: \n", + "\n", + " - Click on the **DEVELOPERS** tile.\n", + " - Click the **x** next to your name to remove yourself from that model's role.\n", + " - Click **Save** to apply your changes to that role.\n", + "\n", + "3. Add yourself as a validator: \n", + "\n", + " - Click on the **VALIDATORS** tile.\n", + " - Select your name from the drop-down menu.\n", + " - Click **Save** to apply your changes to that role." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Install the ValidMind Library\n", + "\n", + "
Recommended Python versions\n", + "

\n", + "Python 3.8 <= x <= 3.11
\n", + "\n", + "To install the library:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "64eb485c", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -q validmind" + ] + }, + { + "cell_type": "markdown", + "id": "330c8f40", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Initialize the ValidMind Library\n", + "\n", + "ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the ValidMind Library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook." + ] + }, + { + "cell_type": "markdown", + "id": "3f04449a", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Get your code snippet\n", + "\n", + "1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).\n", + "\n", + "2. In the left sidebar, navigate to **Inventory** and select the model you registered for this notebook.\n", + "\n", + "3. Go to **Getting Started** and click **Copy snippet to clipboard**.\n", + "\n", + "Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9c6ce354", + "metadata": {}, + "outputs": [], + "source": [ + "# Load your model identifier credentials from an `.env` file\n", + "\n", + "%load_ext dotenv\n", + "%dotenv .env\n", + "\n", + "# Or replace with your code snippet\n", + "\n", + "import validmind as vm\n", + "\n", + "vm.init(\n", + " # api_host=\"...\",\n", + " # api_key=\"...\",\n", + " # api_secret=\"...\",\n", + " # model=\"...\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "b07758a3", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Initialize the Python environment\n", + "\n", + "Then, let's import the necessary libraries and set up your Python environment for data analysis by enabling **`matplotlib`**, a plotting library used for visualizing data.\n", + "\n", + "This ensures that any plots you generate will render inline in our notebook output rather than opening in a separate window:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1e53065d", + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "id": "ecc3d3a1", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Getting to know ValidMind" + ] + }, + { + "cell_type": "markdown", + "id": "ce709365", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Preview the validation report template\n", + "\n", + "Let's verify that you have connected the ValidMind Library to the ValidMind Platform and that the appropriate *template* is selected for model validation. A template predefines sections for your validation report and provides a general outline to follow, making the validation process much easier.\n", + "\n", + "You will attach evidence to this template in the form of risk assessment notes, findings, and test results later on. 
For now, **take a look at the default structure that the template provides with [the `vm.preview_template()` function](https://docs.validmind.ai/validmind/validmind.html#preview_template)** from the ValidMind library:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "be445598", + "metadata": {}, + "outputs": [], + "source": [ + "vm.preview_template()" + ] + }, + { + "cell_type": "markdown", + "id": "43966e46", + "metadata": {}, + "source": [ + "\n", + "\n", + "### View validation report in the ValidMind Platform\n", + "\n", + "Next, let's head to the ValidMind Platform to see the template in action:\n", + "\n", + "1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).\n", + "\n", + "2. In the left sidebar, navigate to **Inventory** and select the model you registered for this notebook.\n", + "\n", + "3. Click on the **Validation Report** for your model and note:\n", + "\n", + " - [x] The risk assessment compliance summary at the top of the report (screenshot below)\n", + " - [x] How the structure of the validation report reflects the previewed template\n", + "\n", + " \"Screenshot\n", + "

" + ] + }, + { + "cell_type": "markdown", + "id": "5cd4890e", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Importing the sample dataset" + ] + }, + { + "cell_type": "markdown", + "id": "6aeb700e", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Load the sample dataset\n", + "\n", + "First, let's import the public [Bank Customer Churn Prediction](https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction) dataset from Kaggle, which was used to develop the dummy champion model.\n", + "\n", + "We'll use this dataset to review steps that should have been conducted during the initial development and documentation of the model to ensure that the model was built correctly. By independently performing steps taken by the model development team, we can confirm whether the model was built using appropriate and properly processed data.\n", + "\n", + "In our below example, note that:\n", + "\n", + "- The target column, `Exited` has a value of `1` when a customer has churned and `0` otherwise.\n", + "- The ValidMind Library provides a wrapper to automatically load the dataset as a [Pandas DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) object. A Pandas Dataframe is a two-dimensional tabular data structure that makes use of rows and columns." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "73076ee3", + "metadata": {}, + "outputs": [], + "source": [ + "from validmind.datasets.classification import customer_churn\n", + "\n", + "print(\n", + " f\"Loaded demo dataset with: \\n\\n\\t• Target column: '{customer_churn.target_column}' \\n\\t• Class labels: {customer_churn.class_labels}\"\n", + ")\n", + "\n", + "raw_df = customer_churn.load_data()\n", + "raw_df.head()" + ] + }, + { + "cell_type": "markdown", + "id": "6c1e567a", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Preprocess the raw dataset\n", + "\n", + "Let's say that thanks to the documentation submitted by the model development team ([Learn more ...](quickstart_model_documentation.ipynb)), we know that the sample dataset was first preprocessed before being used to train the champion model.\n", + "\n", + "During model validation, we use the same data processing logic and training procedure to confirm that the model's results can be reproduced independently, so let's also start by preprocessing our imported dataset to verify that preprocessing was done correctly. This involves splitting the data and separating the features (inputs) from the targets (outputs)." + ] + }, + { + "cell_type": "markdown", + "id": "4b98b04a", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Split the dataset\n", + "\n", + "Splitting our dataset helps assess how well the model generalizes to unseen data.\n", + "\n", + "Use [`preprocess()`](https://docs.validmind.ai/validmind/validmind/datasets/classification/customer_churn.html#preprocess) to split our dataset into three subsets:\n", + "\n", + "1. **train_df** — Used to train the model.\n", + "2. **validation_df** — Used to evaluate the model's performance during training.\n", + "3. **test_df** — Used later on to asses the model's performance on new, unseen data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ee8cfaaf", + "metadata": {}, + "outputs": [], + "source": [ + "train_df, validation_df, test_df = customer_churn.preprocess(raw_df)" + ] + }, + { + "cell_type": "markdown", + "id": "f6549607", + "metadata": {}, + "source": [ + "\n", + "\n", + "#### Separate features and targets\n", + "\n", + "To train the model, we need to provide it with:\n", + "\n", + "1. **Inputs** — Features such as customer age, usage, etc.\n", + "2. **Outputs (Expected answers/labels)** — in our case, we would like to know whether the customer churned or not.\n", + "\n", + "Here, we'll use `x_train` to hold the input features, and `y_train` to hold the target variable — the values we want the model to predict:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6fe65be5", + "metadata": {}, + "outputs": [], + "source": [ + "x_train = train_df.drop(customer_churn.target_column, axis=1)\n", + "y_train = train_df[customer_churn.target_column]" + ] + }, + { + "cell_type": "markdown", + "id": "a467c9ce", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Running data quality tests\n", + "\n", + "With everything ready to go, let's explore some of ValidMind's available tests to help us assess the quality of our datasets. Using ValidMind’s repository of tests streamlines your validation testing, and helps you ensure that your models are being validated appropriately." + ] + }, + { + "cell_type": "markdown", + "id": "da09a71d", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Identify qualitative tests\n", + "\n", + "We want to narrow down the tests we want to run from the selection provided by ValidMind, so we'll use the [`vm.tests.list_tasks_and_tags()` function](https://docs.validmind.ai/validmind/validmind/tests.html#list_tasks_and_tags) to list which `tags` are associated with each `task` type:\n", + "\n", + "- **`tasks`** represent the kind of modeling task associated with a test. Here we'll focus on `classification` tasks.\n", + "- **`tags`** are free-form descriptions providing more details about the test, for example, what category the test falls into. Here we'll focus on the `data_quality` tag." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "85bc2f85", + "metadata": {}, + "outputs": [], + "source": [ + "vm.tests.list_tasks_and_tags()" + ] + }, + { + "cell_type": "markdown", + "id": "39f050b1", + "metadata": {}, + "source": [ + "Then we'll call [the `vm.tests.list_tests()` function](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to list all the data quality tests for classification:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31b31a51", + "metadata": {}, + "outputs": [], + "source": [ + "vm.tests.list_tests(\n", + " tags=[\"data_quality\"], task=\"classification\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "d433c949", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Initialize the ValidMind datasets\n", + "\n", + "Before you can run tests with your preprocessed datasets, you must first initialize a ValidMind `Dataset` object using the [`init_dataset`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) function from the ValidMind (`vm`) module. 
**This step is always necessary every time you want to connect a dataset to documentation and produce test results through ValidMind,** but you only need to do it once per dataset.\n", + "\n", + "For this example, we'll pass in the following arguments:\n", + "\n", + "- **`dataset`** — The raw dataset that you want to provide as input to tests.\n", + "- **`input_id`** — A unique identifier that allows tracking what inputs are used when running each individual test.\n", + "- **`target_column`** — A required argument if tests require access to true values. This is the name of the target column in the dataset.\n", + "- **`class_labels`** — An optional value to map predicted classes to class labels." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba677dd7", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize the raw dataset\n", + "vm_raw_dataset = vm.init_dataset(\n", + " dataset=raw_df,\n", + " input_id=\"raw_dataset\",\n", + " target_column=customer_churn.target_column,\n", + " class_labels=customer_churn.class_labels,\n", + ")\n", + "\n", + "# Initialize the training dataset\n", + "vm_train_ds = vm.init_dataset(\n", + " dataset=train_df,\n", + " input_id=\"train_dataset\",\n", + " target_column=customer_churn.target_column,\n", + ")\n", + "\n", + "# Initialize the validation dataset\n", + "vm_validation_ds = vm.init_dataset(\n", + " dataset=validation_df,\n", + " input_id=\"validation_dataset\",\n", + " target_column=customer_churn.target_column,\n", + ")\n", + "\n", + "# Initialize the testing dataset\n", + "vm_test_ds = vm.init_dataset(\n", + " dataset=test_df,\n", + " input_id=\"test_dataset\",\n", + " target_column=customer_churn.target_column\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "45ff3201", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Run an individual data quality test\n", + "\n", + "Next, we'll use our previously initialized raw dataset (`vm_raw_dataset`) as input to run an individual test, then log the result to the ValidMind Platform.\n", + "\n", + "- You run validation tests by calling [the `run_test` function](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) provided by the `validmind.tests` module.\n", + "- Every test result returned by the `run_test()` function has a [`.log()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#TestResult.log) that can be used to send the test results to the ValidMind Platform.\n", + "\n", + "Here, we'll use the [`ClassImbalance` test](https://docs.validmind.ai/tests/data_validation/ClassImbalance.html) as an example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dcb9b017", + "metadata": {}, + "outputs": [], + "source": [ + "vm.tests.run_test(\n", + " test_id=\"validmind.data_validation.ClassImbalance\",\n", + " inputs={\n", + " \"dataset\": vm_raw_dataset\n", + " }\n", + ").log()" + ] + }, + { + "cell_type": "markdown", + "id": "cc4829be", + "metadata": {}, + "source": [ + "
Note the output returned indicating that a test-driven block doesn't currently exist in your model's documentation for some test IDs. \n", + "

\n", + "That's expected, as when we run validations tests the results logged need to be manually added to your report as part of your compliance assessment process within the ValidMind Platform. You'll continue to see this message throughout this notebook as we run and log more tests.
" + ] + }, + { + "cell_type": "markdown", + "id": "43411ece", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Run data comparison tests\n", + "\n", + "\n", + "We can also use ValidMind to perform comparison tests between our datasets, again logging the results to the ValidMind Platform. Below, we'll perform two sets of comparison tests with a mix of our datasets and the same class imbalance test:\n", + "\n", + "- When running individual tests, you can use a custom **`result_id`** to tag the individual result with a unique identifier, appended to the `test_id` with a `:` separator.\n", + "- We can specify all the tests we'd ike to run in a dictionary called `test_config`, and we'll pass in an **`input_grid`** of individual test inputs to compare. In this case, we'll input our two datasets for comparison. Note here that the `input_grid` expects the `input_id` of the dataset as the value rather than the variable name we specified." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d53edde7", + "metadata": {}, + "outputs": [], + "source": [ + "# Individual test config with inputs specified\n", + "test_config = {\n", + " # Comparison between training and testing datasets to check if class balance is the same in both sets\n", + " \"validmind.data_validation.ClassImbalance:train_vs_validation\": {\n", + " \"input_grid\": {\"dataset\": [\"train_dataset\", \"validation_dataset\"]}\n", + " },\n", + " # Comparison between training and testing datasets to confirm that both sets have similar class distributions\n", + " \"validmind.data_validation.ClassImbalance:train_vs_test\": {\n", + " \"input_grid\": {\"dataset\": [\"train_dataset\", \"test_dataset\"]},\n", + " },\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "16132a57", + "metadata": {}, + "source": [ + "Then batch run and log our tests in `test_config`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1b97e404", + "metadata": {}, + "outputs": [], + "source": [ + "for t in test_config:\n", + " print(t)\n", + " try:\n", + " # Check if test has input_grid\n", + " if 'input_grid' in test_config[t]:\n", + " # For tests with input_grid, pass the input_grid configuration\n", + " if 'params' in test_config[t]:\n", + " vm.tests.run_test(t, input_grid=test_config[t]['input_grid'], params=test_config[t]['params']).log()\n", + " else:\n", + " vm.tests.run_test(t, input_grid=test_config[t]['input_grid']).log()\n", + " else:\n", + " # Original logic for regular inputs\n", + " if 'params' in test_config[t]:\n", + " vm.tests.run_test(t, inputs=test_config[t]['inputs'], params=test_config[t]['params']).log()\n", + " else:\n", + " vm.tests.run_test(t, inputs=test_config[t]['inputs']).log()\n", + " except Exception as e:\n", + " print(f\"Error running test {t}: {str(e)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "bd8d4805", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Importing the champion model\n", + "\n", + "With our raw dataset preprocessed, let's go ahead and import the champion model submitted by the model development team in the format of a `.pkl` file: **[xgboost_model_champion.pkl](xgboost_model_champion.pkl)**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7f18188e", + "metadata": {}, + "outputs": [], + "source": [ + "# Import the champion model\n", + "import joblib\n", + "\n", + "xgboost = joblib.load(\"xgboost_model_champion.pkl\")" + ] + }, + { + "cell_type": "markdown", + "id": "9f858689", + "metadata": {}, + "source": [ + "\n", + "\n", + "### 
Initialize a model object\n", + "\n", + "In addition to the initialized datasets, you'll also need to initialize a ValidMind model object (`vm_model`) that can be passed to other functions for analysis and tests on the data for our champion model.\n", + "\n", + "You simply initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a799cf2", + "metadata": {}, + "outputs": [], + "source": [ + "# Initialize the champion XGBoost model\n", + "vm_xgboost = vm.init_model(\n", + " xgboost,\n", + " input_id=\"xgboost_champion\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "20ef72d4", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Assign predictions\n", + "\n", + "Once the model has been registered, you can assign model predictions to the training and testing datasets.\n", + "\n", + "- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.\n", + "- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.\n", + "\n", + "If no prediction values are passed, the method will compute predictions automatically:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "71dd8e7b", + "metadata": {}, + "outputs": [], + "source": [ + "vm_train_ds.assign_predictions(\n", + " model=vm_xgboost,\n", + ")\n", + "\n", + "vm_test_ds.assign_predictions(\n", + " model=vm_xgboost,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "629964ef", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Running model evaluation tests\n", + "\n", + "With our setup complete, let's run the rest of our validation tests. Since we have already verified the data quality of the dataset used to train our champion model, we will now focus on evaluating the model's performance." + ] + }, + { + "cell_type": "markdown", + "id": "b3b40873", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Run model performance tests\n", + "\n", + "First, let's run some performance tests. 
Use [`vm.tests.list_tests()`](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to identify all the model performance tests for classification:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "202792e8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vm.tests.list_tests(tags=[\"model_performance\"], task=\"classification\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b49a782",
+   "metadata": {},
+   "source": [
+    "We'll isolate the specific tests we want to run in `mpt`, appending an identifier for our champion model to each `result_id` with a `:` separator, as we did earlier:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9fc18843",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mpt = [\n",
+    "    \"validmind.model_validation.sklearn.ClassifierPerformance:xgboost_champion\",\n",
+    "    \"validmind.model_validation.sklearn.ConfusionMatrix:xgboost_champion\",\n",
+    "    \"validmind.model_validation.sklearn.ROCCurve:xgboost_champion\"\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "78becfa1",
+   "metadata": {},
+   "source": [
+    "Now, let's run and log our batch of model performance tests using our testing dataset (`vm_test_ds`) for our champion model:\n",
+    "\n",
+    "- The test set serves as a proxy for real-world data, providing an unbiased estimate of model performance since it was not used during training or tuning.\n",
+    "- The test set also acts as protection against selection bias and model tweaking, giving a final, more unbiased checkpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6866b21c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "for test in mpt:\n",
+    "    vm.tests.run_test(\n",
+    "        test,\n",
+    "        inputs={\n",
+    "            \"dataset\": vm_test_ds, \"model\": vm_xgboost,\n",
+    "        },\n",
+    "    ).log()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2aae3040",
+   "metadata": {},
+   "source": [
+    "\n",
+    "\n",
+    "### Run diagnostic tests\n",
+    "\n",
+    "Next, we want to inspect the robustness and stability of our champion model. Use `list_tests()` to list all available diagnosis tests applicable to classification tasks:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c9b3caa4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "vm.tests.list_tests(tags=[\"model_diagnosis\"], task=\"classification\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ecefecac",
+   "metadata": {},
+   "source": [
+    "Let’s now assess the model for potential signs of *overfitting* and identify any sub-segments where performance may be inconsistent.\n",
+    "\n",
+    "Overfitting occurs when a model learns the training data too well, capturing not only the true pattern but also noise and random fluctuations, resulting in excellent performance on the training dataset but poor generalization to new, unseen data:\n",
+    "\n",
+    "- Since the training dataset (`vm_train_ds`) was used to fit the model, we use this set to establish a baseline performance for how well the model performs on data it has already seen.\n",
+    "- The testing dataset (`vm_test_ds`) was never seen during training, and here simulates real-world generalization, or how well the model performs on new, unseen data. 
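\n",
+    "\n",
+    "For a quick intuition of what this diagnosis measures, the headline train-test gap can be approximated directly. A minimal sketch (`x_test`/`y_test` are derived here just for illustration; the `OverfitDiagnosis` test below performs a more granular, per-segment analysis):\n",
+    "\n",
+    "```python\n",
+    "from sklearn.metrics import accuracy_score\n",
+    "\n",
+    "# Derive test features/targets the same way we did for the training data\n",
+    "x_test = test_df.drop(customer_churn.target_column, axis=1)\n",
+    "y_test = test_df[customer_churn.target_column]\n",
+    "\n",
+    "train_acc = accuracy_score(y_train, xgboost.predict(x_train))\n",
+    "test_acc = accuracy_score(y_test, xgboost.predict(x_test))\n",
+    "\n",
+    "# A large train-test gap is a classic symptom of overfitting\n",
+    "print(f\"Train accuracy: {train_acc:.3f} | Test accuracy: {test_acc:.3f}\")\n",
+    "```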
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "82f824f2", + "metadata": {}, + "outputs": [], + "source": [ + "vm.tests.run_test(\n", + " test_id=\"validmind.model_validation.sklearn.OverfitDiagnosis:xgboost_champion\",\n", + " input_grid={\n", + " \"datasets\": [[vm_train_ds,vm_test_ds]],\n", + " \"model\" : [vm_xgboost]\n", + " }\n", + ").log()" + ] + }, + { + "cell_type": "markdown", + "id": "c7f7988e", + "metadata": {}, + "source": [ + "Let's also conduct *robustness* and *stability* tests.\n", + "\n", + "- Robustness evaluates the model’s ability to maintain consistent performance under varying input conditions.\n", + "- Stability assesses whether the model produces consistent outputs across different data subsets or over time.\n", + "\n", + "Again, we'll use both the training and testing datasets to establish baseline performance and to simulate real-world generalization:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2676197", + "metadata": {}, + "outputs": [], + "source": [ + "vm.tests.run_test(\n", + " test_id=\"validmind.model_validation.sklearn.RobustnessDiagnosis:xgboost_champion\",\n", + " input_grid={\n", + " \"datasets\": [[vm_train_ds,vm_test_ds]],\n", + " \"model\" : [vm_xgboost]\n", + " },\n", + ").log()" + ] + }, + { + "cell_type": "markdown", + "id": "0be88c10", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Run feature importance tests\n", + "\n", + "We also want to verify the relative influence of different input features on our model's predictions. Use `list_tests()` to identify all the feature importance tests for classification and store them in `FI`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9c8c26e6", + "metadata": {}, + "outputs": [], + "source": [ + "# Store the feature importance tests\n", + "FI = vm.tests.list_tests(tags=[\"feature_importance\"], task=\"classification\",pretty=False)\n", + "FI" + ] + }, + { + "cell_type": "markdown", + "id": "31326bc8", + "metadata": {}, + "source": [ + "We'll only use our testing dataset (`vm_test_ds`) here, to provide a realistic, unseen sample that mimic future or production data, as the training dataset has already influenced our model during learning:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5a49f550", + "metadata": {}, + "outputs": [], + "source": [ + "# Run and log our feature importance tests with the testing dataset\n", + "for test in FI:\n", + " vm.tests.run_test(\n", + " \"\".join((test,':xgboost_champion')),\n", + " inputs={\n", + " \"dataset\": vm_test_ds, \"model\": vm_xgboost\n", + " },\n", + " ).log()" + ] + }, + { + "cell_type": "markdown", + "id": "19a0caa4", + "metadata": {}, + "source": [ + "\n", + "\n", + "## In summary\n", + "\n", + "In this notebook, you learned how to:\n", + "\n", + "- [x] Register a model within the ValidMind Platform\n", + "- [x] Install and initialize the ValidMind Library\n", + "- [x] Preview the validation report template for your model\n", + "- [x] Import a sample dataset and champion model\n", + "- [x] Initialize ValidMind datasets and model objects\n", + "- [x] Assign model predictions to your ValidMind model objects\n", + "- [x] Identify and run various validation tests\n", + "\n", + "In a usual model validation workflow, you would wrap up your validation testing by verifying that all the tests provided by the model development team were run and reported accurately, and perhaps even propose a challenger model, comparing the performance of the challenger 
with the existing champion.\n",
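+    "\n",
+    "To give a flavor of that comparison, here is a minimal, hypothetical sketch of registering a logistic regression challenger and running a side-by-side performance test (the challenger model and the `champion_vs_challenger` result ID are illustrative, not part of this quickstart):\n",
+    "\n",
+    "```python\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "\n",
+    "# Hypothetical challenger trained on the same features as the champion\n",
+    "lr_model = LogisticRegression(max_iter=1000).fit(x_train, y_train)\n",
+    "vm_lr = vm.init_model(lr_model, input_id=\"logreg_challenger\")\n",
+    "vm_test_ds.assign_predictions(model=vm_lr)\n",
+    "\n",
+    "# Compare champion and challenger on the same test set\n",
+    "vm.tests.run_test(\n",
+    "    \"validmind.model_validation.sklearn.ClassifierPerformance:champion_vs_challenger\",\n",
+    "    input_grid={\"dataset\": [vm_test_ds], \"model\": [vm_xgboost, vm_lr]},\n",
+    ").log()\n",
+    "```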
" + ] + }, + { + "cell_type": "markdown", + "id": "ed4bc468", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Next steps\n", + "\n", + "You can look at the output produced by the ValidMind Library right in the notebook where you ran the code, as you would expect. But there is a better way — use the ValidMind Platform to work with your validation report." + ] + }, + { + "cell_type": "markdown", + "id": "3836cbc2", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Work with your validation report\n", + "\n", + "1. From the **Inventory** in the ValidMind Platform, go to the model you registered earlier. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/working-with-model-inventory.html))\n", + "\n", + "2. In the left sidebar that appears for your model, click **Validation Report**.\n", + "\n", + " From here, you can link test results as evidence, add model findings, assess compliance, and submit your validation report for approval when it's ready. [Learn more ...](https://docs.validmind.ai/guide/model-validation/preparing-validation-reports.html)" + ] + }, + { + "cell_type": "markdown", + "id": "8bdf85fe", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Discover more learning resources\n", + "\n", + "For a more in-depth introduction to using the ValidMind Library for validation, check out our introductory validation series and the accompanying interactive training:\n", + "\n", + "- **[ValidMind for model validation](https://docs.validmind.ai/developer/validmind-library.html#for-model-validation)**\n", + "- **[Validator Fundamentals](https://docs.validmind.ai/training/validator-fundamentals/validator-fundamentals-register.html)**\n", + "\n", + "We offer many interactive notebooks to help you validate models:\n", + "\n", + "- [Run tests & test suites](https://docs.validmind.ai/guide/testing-overview.html)\n", + "- [Code samples](https://docs.validmind.ai/guide/samples-jupyter-notebooks.html)\n", + "\n", + "Or, visit our [documentation](https://docs.validmind.ai/) to learn more about ValidMind." + ] + }, + { + "cell_type": "markdown", + "id": "383cd893", + "metadata": {}, + "source": [ + "\n", + "\n", + "## Upgrade ValidMind\n", + "\n", + "
After installing ValidMind, you’ll want to periodically make sure you are on the latest version to access any new features and other enhancements.
\n", + "\n", + "Retrieve the information for the currently installed version of ValidMind:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "upgrade-show-c0a446ff-f26f-4ad0-839a-e92927711798", + "metadata": {}, + "outputs": [], + "source": [ + "%pip show validmind" + ] + }, + { + "cell_type": "markdown", + "id": "upgrade-version-098b75d8-4380-4cc0-ac06-da363a3cf5a0", + "metadata": {}, + "source": [ + "If the version returned is lower than the version indicated in our [production open-source code](https://github.com/validmind/validmind-library/blob/prod/validmind/__version__.py), restart your notebook and run:\n", + "\n", + "```bash\n", + "%pip install --upgrade validmind\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "upgrade-restart-aebde628-1c17-4f70-88c8-d4561e803abe", + "metadata": {}, + "source": [ + "You may need to restart your kernel after running the upgrade package for changes to be applied." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "ValidMind Library", + "language": "python", + "name": "validmind" + }, + "language_info": { + "name": "python", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/quickstart/xgboost_model_champion.pkl b/notebooks/quickstart/xgboost_model_champion.pkl new file mode 100644 index 000000000..e6fdef9b0 Binary files /dev/null and b/notebooks/quickstart/xgboost_model_champion.pkl differ diff --git a/notebooks/tutorials/model_development/4-finalize_testing_documentation.ipynb b/notebooks/tutorials/model_development/4-finalize_testing_documentation.ipynb index e38978aec..58e401692 100644 --- a/notebooks/tutorials/model_development/4-finalize_testing_documentation.ipynb +++ b/notebooks/tutorials/model_development/4-finalize_testing_documentation.ipynb @@ -896,7 +896,19 @@ "\n", "### Work with your model documentation\n", "\n", - "Now that you've logged all your test results and generated a draft for your model documentation, head to the ValidMind Platform to make qualitative edits, view guidelines, collaborate with validators, and submit your model documentation for approval when it's ready. **Learn more:** [Working with model documentation](https://docs.validmind.ai/guide/model-documentation/working-with-model-documentation.html)" + "Now that you've logged all your test results and generated a draft for your model documentation, head to the ValidMind Platform to wrap up your model documentation. Continue to work on your model documentation by:\n", + "\n", + "- **Run and log more tests:** Use the skills you learned in this series of notebooks to run and log more individual tests, including custom tests, then insert them into your documentation as supplementary evidence. (Learn more: [`validmind.tests`](https://docs.validmind.ai/validmind/validmind/tests.html))\n", + "\n", + "- **Inserting additional test results:** Add **Test-Driven Blocks** under any relevant section of your model documentation. (Learn more: [Work with test results](https://docs.validmind.ai/guide/model-documentation/work-with-test-results.html))\n", + "\n", + "- **Making qualitative edits to your test descriptions:** Click on the description of any inserted test results to review and edit the ValidMind-generated test descriptions for quality and accuracy. 
(Learn more: [Working with model documentation](https://docs.validmind.ai/guide/model-documentation/working-with-model-documentation.html#add-or-edit-documentation))\n",
+    "\n",
+    "- **Viewing guidelines:** In any section of your model documentation, click **ValidMind Insights** in the top right corner to reveal the Documentation Guidelines for each section to help guide the contents of your model documentation. (Learn more: [View documentation guidelines](https://docs.validmind.ai/guide/model-documentation/view-documentation-guidelines.html))\n",
+    "\n",
+    "- **Collaborating with other stakeholders:** Use the ValidMind Platform's real-time collaborative features to work seamlessly together with the rest of your organization, including model validators. Review suggested changes in your content blocks, work with versioned history, and use comments to discuss specific portions of your model documentation. (Learn more: [Collaborate with others](https://docs.validmind.ai/guide/model-documentation/collaborate-with-others.html))\n",
+    "\n",
+    "When your model documentation is complete and ready for review, submit it for approval from the same ValidMind Platform where you made your edits and collaborated with the rest of your organization, ensuring transparency and a thorough model development history. (Learn more: [Submit for approval](https://docs.validmind.ai/guide/model-documentation/submit-for-approval.html))"
   ]
  },
 {
diff --git a/notebooks/tutorials/model_validation/2-start_validation_process.ipynb b/notebooks/tutorials/model_validation/2-start_validation_process.ipynb
index 93812fc21..747a5dd1e 100644
--- a/notebooks/tutorials/model_validation/2-start_validation_process.ipynb
+++ b/notebooks/tutorials/model_validation/2-start_validation_process.ipynb
@@ -564,7 +564,7 @@
     "\n",
     "## Documenting test results\n",
     "\n",
-    "Now that we've done some analysis on two different datasets, we can use ValidMind to easily document why certain things were done to our raw data with testing to support it. As we learned above, every test result returned by the `run_test()` function has a `.log()` method that can be used to send the test results to the ValidMind Platform.\n",
+    "Now that we've done some analysis on two different datasets, we can use ValidMind to easily document why certain things were done to our raw data with testing to support it. Every test result returned by the `run_test()` function has a `.log()` method that can be used to send the test results to the ValidMind Platform.\n",
     "\n",
     "When logging validation test results to the platform, you'll need to manually add those results to the desired section of the validation report. To demonstrate how to add test results to your validation report, we'll log our data quality tests and insert the results via the ValidMind Platform."
   ]
  },
 {
diff --git a/notebooks/tutorials/model_validation/3-developing_challenger_model.ipynb b/notebooks/tutorials/model_validation/3-developing_challenger_model.ipynb
index ba6958f28..83061a827 100644
--- a/notebooks/tutorials/model_validation/3-developing_challenger_model.ipynb
+++ b/notebooks/tutorials/model_validation/3-developing_challenger_model.ipynb
@@ -520,7 +520,7 @@
     "\n",
     "## Running model evaluation tests\n",
     "\n",
-    "With everything ready for us, let's run the rest of our validation tests. We'll focus on comprehensive testing around model performance of both the champion and challenger models going forward as we've already verified the data quality of the datasets used to train the champion model." 
+    "With our setup complete, let's run the rest of our validation tests. Since we have already verified the data quality of the dataset used to train our champion model, we will now focus on comprehensive performance evaluations of both the champion and challenger models."
   ]
  },
 {
@@ -584,7 +584,10 @@
     "\n",
     "#### Evaluate performance of the champion model\n",
     "\n",
-    "Now, let's run and log our batch of model performance tests using our testing dataset (`vm_test_ds`) for our champion model:"
+    "Now, let's run and log our batch of model performance tests using our testing dataset (`vm_test_ds`) for our champion model:\n",
+    "\n",
+    "- The test set serves as a proxy for real-world data, providing an unbiased estimate of model performance since it was not used during training or tuning.\n",
+    "- The test set also acts as protection against selection bias and model tweaking, giving a final, more unbiased checkpoint."
   ]
  },
 {
@@ -725,9 +728,9 @@
     "\n",
     "### Run diagnostic tests\n",
     "\n",
-    "Next we want to inspect the robustness and stability testing comparison between our champion and challenger model.\n",
+    "Next, we want to inspect the robustness and stability testing comparison between our champion and challenger model.\n",
     "\n",
-    "Use `list_tests()` to identify all the model diagnosis tests for classification:"
+    "Use `list_tests()` to list all available diagnosis tests applicable to classification tasks:"
   ]
  },
 {
@@ -743,9 +746,12 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
-    "Let's see if models suffer from any *overfit* potentials and also where there are potential sub-segments of issues with the [`OverfitDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/OverfitDiagnosis.html). \n",
+    "Let’s now assess the models for potential signs of *overfitting* and identify any sub-segments where performance may be inconsistent, using the [`OverfitDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/OverfitDiagnosis.html).\n",
+    "\n",
+    "Overfitting occurs when a model learns the training data too well, capturing not only the true pattern but also noise and random fluctuations, resulting in excellent performance on the training dataset but poor generalization to new, unseen data:\n",
     "\n",
-    "Overfitting occurs when a model learns the training data too well, capturing not only the true pattern but noise and random fluctuations resulting in excellent performance on the training dataset but poor generalization to new, unseen data."
+    "- Since the training dataset (`vm_train_ds`) was used to fit the model, we use this set to establish a baseline performance for how well the model performs on data it has already seen.\n",
+    "- The testing dataset (`vm_test_ds`) was never seen during training, and here simulates real-world generalization, or how well the model performs on new, unseen data. "
   ]
  },
 {
@@ -767,9 +773,9 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
-    "Let's also conduct *robustness* and *stability* testing of the two models with the [`RobustnessDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/RobustnessDiagnosis.html).\n",
+    "Let's also conduct *robustness* and *stability* testing of the two models with the [`RobustnessDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/RobustnessDiagnosis.html). 
Robustness refers to a model's ability to maintain consistent performance, and stability refers to a model's ability to produce consistent outputs over time across different data subsets.\n",
     "\n",
-    "Robustness refers to a model's ability to maintain consistent performance, and stability refers to a model's ability to produce consistent outputs over time across different data subsets."
+    "Again, we'll use both the training and testing datasets to establish baseline performance and to simulate real-world generalization:"
   ]
  },
 {
@@ -811,6 +817,13 @@
     "FI"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We'll only use our testing dataset (`vm_test_ds`) here, to provide a realistic, unseen sample that mimics future or production data, as the training dataset has already influenced our model during learning:"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
diff --git a/notebooks/tutorials/model_validation/4-finalize_validation_reporting.ipynb b/notebooks/tutorials/model_validation/4-finalize_validation_reporting.ipynb
index 886f9e906..ffdc4522a 100644
--- a/notebooks/tutorials/model_validation/4-finalize_validation_reporting.ipynb
+++ b/notebooks/tutorials/model_validation/4-finalize_validation_reporting.ipynb
@@ -1144,13 +1144,17 @@
     "\n",
     "- **Inserting additional test results:** Click **Link Evidence to Report** under any section of 2. Validation in your validation report. (Learn more: [Link evidence to reports](https://docs.validmind.ai/guide/model-validation/assess-compliance.html#link-evidence-to-reports))\n",
     "\n",
-    "- **Making qualitative edits to your test descriptions:** Expand any linked evidence under Validator Evidence and click **See evidence details** to review and edit the ValidMind-generated test descriptions for quality and accuracy.\n",
+    "- **Making qualitative edits to your test descriptions:** Expand any linked evidence under Validator Evidence and click **See evidence details** to review and edit the ValidMind-generated test descriptions for quality and accuracy. (Learn more: [Preparing validation reports](https://docs.validmind.ai/guide/model-validation/preparing-validation-reports.html#get-started))\n",
     "\n",
     "- **Adding more findings:** Click **Link Finding to Report** in any validation report section, then click **+ Create New Finding**. (Learn more: [Add and manage model findings](https://docs.validmind.ai/guide/model-validation/add-manage-model-findings.html))\n",
     "\n",
-    "- **Adding risk assessment notes:** Click under **Risk Assessment Notes** in any validation report section to access the text editor and content editing toolbar, including an option to generate a draft with AI. Edit your ValidMind-generated test descriptions (Learn more: [Work with content blocks](https://docs.validmind.ai/guide/model-documentation/work-with-content-blocks.html#content-editing-toolbar))\n",
+    "- **Adding risk assessment notes:** Click under **Risk Assessment Notes** in any validation report section to access the text editor and content editing toolbar, including an option to generate a draft with AI. Once generated, edit your ValidMind-generated test descriptions to adhere to your organization's requirements. (Learn more: [Work with content blocks](https://docs.validmind.ai/guide/model-documentation/work-with-content-blocks.html#content-editing-toolbar))\n",
     "\n",
-    "- **Assessing compliance:** Under the Guideline for any validation report section, click **ASSESSMENT** and select the compliance status from the drop-down menu. 
(Learn more: [Provide compliance assessments](https://docs.validmind.ai/guide/model-validation/assess-compliance.html#provide-compliance-assessments))"
+    "- **Assessing compliance:** Under the Guideline for any validation report section, click **ASSESSMENT** and select the compliance status from the drop-down menu. (Learn more: [Provide compliance assessments](https://docs.validmind.ai/guide/model-validation/assess-compliance.html#provide-compliance-assessments))\n",
+    "\n",
+    "- **Collaborating with other stakeholders:** Use the ValidMind Platform's real-time collaborative features to work seamlessly together with the rest of your organization, including model developers. Propose suggested changes in the model documentation, work with versioned history, and use comments to discuss specific portions of the model documentation. (Learn more: [Collaborate with others](https://docs.validmind.ai/guide/model-documentation/collaborate-with-others.html))\n",
+    "\n",
+    "When your validation report is complete and ready for review, submit it for approval from the same ValidMind Platform where you made your edits and collaborated with the rest of your organization, ensuring transparency and a thorough model validation history. (Learn more: [Submit for approval](https://docs.validmind.ai/guide/model-documentation/submit-for-approval.html))"
   ]
  },
 {