diff --git a/site/developer/model-testing/testing-overview.qmd b/site/developer/model-testing/testing-overview.qmd
index 99666a993f..7ea340428e 100644
--- a/site/developer/model-testing/testing-overview.qmd
+++ b/site/developer/model-testing/testing-overview.qmd
@@ -6,6 +6,7 @@ aliases:
 listing:
   - id: tests-beginner
     type: grid
+    grid-columns: 2
     max-description-length: 250
     sort: false
    fields: [title, description]
@@ -28,7 +29,7 @@ listing:
     max-description-length: 250
     sort: false
     fields: [title, description]
-    contents: 
+    contents:
       - ../../notebooks/code_samples/custom_tests/integrate_external_test_providers.ipynb
       - ../../notebooks/how_to/configure_dataset_features.ipynb
       - ../../notebooks/how_to/run_documentation_sections.ipynb
@@ -43,6 +44,7 @@ listing:
     contents:
       - ../../notebooks/how_to/document_multiple_results_for_the_same_test.ipynb
       - ../../notebooks/how_to/load_datasets_predictions.ipynb
+      - ../../notebooks/how_to/understand_utilize_rawdata.ipynb
   - id: tests-custom
     type: grid
     max-description-length: 250
diff --git a/site/faq/faq-testing.qmd b/site/faq/faq-testing.qmd
index 5f1e0db840..3f4f195bc3 100644
--- a/site/faq/faq-testing.qmd
+++ b/site/faq/faq-testing.qmd
@@ -37,10 +37,10 @@ Yes, {{< var vm.product >}} allows tests to be manipulated at several levels:
 - You can configure which tests are required to run programmatically depending on the model use case.[^4]
 - You can change the thresholds and parameters for default tests already available in the {{< var vm.developer >}} — for instance, changing the threshold parameter for the class imbalance flag.[^5]
 - You can also connect your own custom tests with the {{< var validmind.developer >}}. These custom tests are configurable and are able to run programmatically, just like the rest of the {{< var vm.developer >}}.[^6]
+- You can personalize tests further for your use case by using {{< var vm.product >}}'s `RawData` feature[^7] to customize the output of tests (see the sketch below).
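+
+For example, overriding a default test's parameters is a matter of passing `params` to `run_test()`. Here is a minimal sketch, assuming a previously initialized dataset object named `vm_dataset` and the class imbalance test's `min_percent_threshold` parameter; check each test's reference page for the parameters it actually accepts:
+
+```python
+from validmind.tests import run_test
+
+# Flag any class below 20% of the dataset instead of the default threshold
+# (`vm_dataset` is assumed to be a dataset created with `vm.init_dataset()`)
+result = run_test(
+    "validmind.data_validation.ClassImbalance",
+    inputs={"dataset": vm_dataset},
+    params={"min_percent_threshold": 20},
+)
+```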
 ::: {.callout}
-In addition to custom tests, you can also add use case and test-specific context for any test to enhance the LLM-generated test descriptions using the {{< var validmind.developer >}}.[^7]
-
+In addition to custom tests, you can also add use case and test-specific context for any test to enhance the LLM-generated test descriptions using the {{< var validmind.developer >}}.[^8]
 :::
 
 {{< include _faq-explainability.qmd >}}
 
@@ -69,4 +69,6 @@ In addition to custom tests, you can also add use case and test-specific context
 
 [^6]: [Can I use my own tests?](/developer/model-testing/testing-overview.qmd#can-i-use-my-own-tests)
 
-[^7]: [Add context to LLM-generated test descriptions](/notebooks/how_to/add_context_to_llm_descriptions.ipynb)
\ No newline at end of file
+[^7]: [Understand and utilize `RawData` in {{< var vm.product >}} tests](/notebooks/how_to/understand_utilize_rawdata.ipynb)
+
+[^8]: [Add context to LLM-generated test descriptions](/notebooks/how_to/add_context_to_llm_descriptions.ipynb)
\ No newline at end of file
diff --git a/site/notebooks.zip b/site/notebooks.zip
index 115a60ad76..06efc737df 100644
Binary files a/site/notebooks.zip and b/site/notebooks.zip differ
diff --git a/site/notebooks/how_to/understand_utilize_rawdata.ipynb b/site/notebooks/how_to/understand_utilize_rawdata.ipynb
new file mode 100644
index 0000000000..7354eae858
--- /dev/null
+++ b/site/notebooks/how_to/understand_utilize_rawdata.ipynb
@@ -0,0 +1,571 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c18ba8a2",
+   "metadata": {},
+   "source": [
+    "# Understand and utilize `RawData` in ValidMind tests\n",
+    "\n",
+    "Test functions in ValidMind can return a special object called *`RawData`*, which holds intermediate or unprocessed data produced by the test's logic but not returned as part of the test's visible output, such as tables or figures.\n",
+    "\n",
+    "- The `RawData` feature allows you to customize the output of tests, making it a powerful tool for creating custom tests and post-processing functions.\n",
+    "- `RawData` is useful when running post-processing functions with tests to recompute tabular outputs, redraw figures, or even create new outputs entirely.\n",
+    "\n",
+    "In this notebook, you'll learn how to access, inspect, and utilize `RawData` from ValidMind tests."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5b5b248c",
+   "metadata": {},
+   "source": [
+    "::: {.content-hidden when-format=\"html\"}\n",
+    "## Contents \n",
+    "- [Setup](#toc1_) \n",
+    "  - [Installation and initialization](#toc1_1_) \n",
+    "  - [Load the sample dataset](#toc1_2_) \n",
+    "  - [Initialize the ValidMind objects](#toc1_3_) \n",
+    "- [`RawData` usage examples](#toc2_) \n",
+    "  - [Using `RawData` from the ROC Curve Test](#toc2_1_) \n",
+    "  - [Pearson Correlation Matrix](#toc2_2_) \n",
+    "  - [Precision-Recall Curve](#toc2_3_) \n",
+    "  - [Using `RawData` in custom tests](#toc2_4_) \n",
+    "\n",
+    ":::\n",
+    "\n",
+    ""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6dd79a98",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc1_\"></a>\n",
+    "\n",
+    "## Setup\n",
+    "\n",
+    "Before we can run our examples, we'll need to set the stage to enable running tests with the ValidMind Library. Since the focus of this notebook is on the `RawData` object, this section only summarizes the steps rather than covering them in detail.\n",
+    "\n",
+    "**To learn more about running tests with ValidMind:** [Run tests and test suites](https://docs.validmind.ai/developer/model-testing/testing-overview.html)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5b6d8d15",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc1_1_\"></a>\n",
+    "\n",
+    "### Installation and initialization\n",
+    "\n",
+    "First, let's make sure that the ValidMind Library is installed and ready to go, and that our Python environment is set up for data analysis:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "04eb084e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install the ValidMind Library\n",
+    "%pip install -q validmind\n",
+    "\n",
+    "# Initialize the ValidMind Library\n",
+    "import validmind as vm\n",
+    "\n",
+    "# Import the `xgboost` library with an alias\n",
+    "import xgboost as xgb\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5e6aa2cb",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc1_2_\"></a>\n",
+    "\n",
+    "### Load the sample dataset\n",
+    "\n",
+    "Then, we'll import a sample ValidMind dataset and preprocess it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "50d72eba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Import the `customer_churn` sample dataset\n",
+    "from validmind.datasets.classification import customer_churn\n",
+    "raw_df = customer_churn.load_data()\n",
+    "\n",
+    "# Preprocess the raw dataset\n",
+    "train_df, validation_df, test_df = customer_churn.preprocess(raw_df)\n",
+    "\n",
+    "# Separate features and targets\n",
+    "x_train = train_df.drop(customer_churn.target_column, axis=1)\n",
+    "y_train = train_df[customer_churn.target_column]\n",
+    "x_val = validation_df.drop(customer_churn.target_column, axis=1)\n",
+    "y_val = validation_df[customer_churn.target_column]\n",
+    "\n",
+    "# Create an `XGBClassifier` object\n",
+    "model = xgb.XGBClassifier(early_stopping_rounds=10)\n",
+    "model.set_params(\n",
+    "    eval_metric=[\"error\", \"logloss\", \"auc\"],\n",
+    ")\n",
+    "\n",
+    "# Train the model, using the validation set for early stopping\n",
+    "model.fit(\n",
+    "    x_train,\n",
+    "    y_train,\n",
+    "    eval_set=[(x_val, y_val)],\n",
+    "    verbose=False,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e3895d35",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc1_3_\"></a>\n",
+    "\n",
+    "### Initialize the ValidMind objects"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c0e441f4",
+   "metadata": {},
+   "source": [
+    "Before you can run tests, you'll need to initialize a ValidMind dataset object, as well as a ValidMind model object that can be passed to other functions for analysis and tests on the data:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b2310bc4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Initialize the dataset object\n",
+    "vm_raw_dataset = vm.init_dataset(\n",
+    "    dataset=raw_df,\n",
+    "    input_id=\"raw_dataset\",\n",
+    "    target_column=customer_churn.target_column,\n",
+    "    class_labels=customer_churn.class_labels,\n",
+    "    __log=False,\n",
+    ")\n",
+    "\n",
+    "# Initialize the training and test datasets into their own dataset objects\n",
+    "vm_train_ds = vm.init_dataset(\n",
+    "    dataset=train_df,\n",
+    "    input_id=\"train_dataset\",\n",
+    "    target_column=customer_churn.target_column,\n",
+    "    __log=False,\n",
+    ")\n",
+    "vm_test_ds = vm.init_dataset(\n",
+    "    dataset=test_df,\n",
+    "    input_id=\"test_dataset\",\n",
+    "    target_column=customer_churn.target_column,\n",
+    "    __log=False,\n",
+    ")\n",
+    "\n",
+    "# Initialize a model object\n",
+    "vm_model = vm.init_model(\n",
+    "    model,\n",
+    "    input_id=\"model\",\n",
+    "    __log=False,\n",
+    ")\n",
+    "\n",
+    "# Assign predictions to the datasets\n",
+    "vm_train_ds.assign_predictions(\n",
+    "    model=vm_model,\n",
+    ")\n",
+    "\n",
+    "vm_test_ds.assign_predictions(\n",
+    "    model=vm_model,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "25ec99fc",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc2_\"></a>\n",
+    "\n",
+    "## `RawData` usage examples\n",
+    "\n",
+    "Once you're set up to run tests, you can then try out the following examples:\n",
+    "\n",
+    "  - [Using `RawData` from the ROC Curve Test](#toc2_1_) \n",
+    "  - [Pearson Correlation Matrix](#toc2_2_) \n",
+    "  - [Precision-Recall Curve](#toc2_3_) \n",
+    "  - [Using `RawData` in custom tests](#toc2_4_) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "33d79841",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc2_1_\"></a>\n",
+    "\n",
+    "### Using `RawData` from the ROC Curve Test\n",
+    "\n",
+    "In this introductory example, we run the [ROC Curve](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) test, inspect its `RawData` output, and then create a custom ROC curve using the raw data values.\n",
+    "\n",
+    "First, let's run the default ROC Curve test for comparison with later iterations:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "58a3a779",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from validmind.tests import run_test\n",
+    "\n",
+    "# Run the ROC Curve test normally\n",
+    "result_roc = run_test(\n",
+    "    \"validmind.model_validation.sklearn.ROCCurve\",\n",
+    "    inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n",
+    "    generate_description=False,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "66c44fe0",
+   "metadata": {},
+   "source": [
+    "Now let's assume we want to create a custom version of the above figure. First, let's inspect the raw data that this test produces so we can see what we have to work with.\n",
+    "\n",
+    "`RawData` objects have an `inspect()` method that pretty-prints the object's attributes, letting you quickly see the data and its types:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "513ce01e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Inspect the RawData output from the ROC test\n",
+    "print(\"RawData from ROC Curve Test:\")\n",
+    "result_roc.raw_data.inspect()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "586f3a12",
+   "metadata": {},
+   "source": [
+    "As we can see, the ROC Curve test returns a `RawData` object with the following attributes:\n",
+    "\n",
+    "- **`fpr`:** A list of false positive rates\n",
+    "- **`tpr`:** A list of true positive rates\n",
+    "- **`auc`:** The area under the curve\n",
+    "\n",
+    "That's enough to build our own custom ROC curve in a post-processing function, without creating a whole new test from scratch and without recomputing any of the data:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "613778d2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "from validmind.vm_models.result import TestResult\n",
+    "\n",
+    "\n",
+    "def custom_roc_curve(result: TestResult):\n",
+    "    # Extract raw data from the test result\n",
+    "    fpr = result.raw_data.fpr\n",
+    "    tpr = result.raw_data.tpr\n",
+    "    auc = result.raw_data.auc\n",
+    "\n",
+    "    # Create a custom ROC curve plot\n",
+    "    fig = plt.figure()\n",
+    "    plt.plot(fpr, tpr, label=f\"Custom ROC (AUC = {auc:.2f})\", color=\"blue\")\n",
+    "    plt.plot([0, 1], [0, 1], linestyle=\"--\", color=\"gray\", label=\"Random Guess\")\n",
+    "    plt.xlabel(\"False Positive Rate\")\n",
+    "    plt.ylabel(\"True Positive Rate\")\n",
+    "    plt.title(\"Custom ROC Curve from RawData\")\n",
+    "    plt.legend()\n",
+    "\n",
+    "    # close the plot so it isn't automatically shown in the notebook\n",
+    "    plt.close()\n",
+    "\n",
+    "    # remove the test's original figure from the result\n",
+    "    result.remove_figure(0)\n",
+    "\n",
+    "    # add the new figure in its place\n",
+    "    result.add_figure(fig)\n",
+    "\n",
+    "    return result\n",
+    "\n",
+    "# test it on the existing result\n",
+    "modified_result = custom_roc_curve(result_roc)\n",
+    "\n",
+    "# show the modified result\n",
+    "modified_result.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "794d026c",
+   "metadata": {},
+   "source": [
+    "Now that we have created a post-processing function and verified that it works on our existing test result, we can pass it directly to `run_test()` from now on:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7c7566f3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "result = run_test(\n",
+    "    \"validmind.model_validation.sklearn.ROCCurve\",\n",
+    "    inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n",
+    "    post_process_fn=custom_roc_curve,\n",
+    "    generate_description=False,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1d0b94aa",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc2_2_\"></a>\n",
+    "\n",
+    "### Pearson Correlation Matrix\n",
+    "\n",
+    "In this next example, try commenting out the `post_process_fn` argument in the following cell and compare the output between runs:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c57fb01b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import plotly.graph_objects as go\n",
+    "\n",
+    "\n",
+    "def custom_heatmap(result: TestResult):\n",
+    "    # rebuild the heatmap from the raw correlation matrix\n",
+    "    corr_matrix = result.raw_data.correlation_matrix\n",
+    "\n",
+    "    heatmap = go.Heatmap(\n",
+    "        z=corr_matrix.values,\n",
+    "        x=list(corr_matrix.columns),\n",
+    "        y=list(corr_matrix.index),\n",
+    "        colorscale=\"Viridis\",\n",
+    "    )\n",
+    "    fig = go.Figure(data=[heatmap])\n",
+    "    fig.update_layout(title=\"Custom Heatmap from RawData\")\n",
+    "\n",
+    "    result.remove_figure(0)\n",
+    "    result.add_figure(fig)\n",
+    "\n",
+    "    return result\n",
+    "\n",
+    "\n",
+    "result_corr = run_test(\n",
+    "    \"validmind.data_validation.PearsonCorrelationMatrix\",\n",
+    "    inputs={\"dataset\": vm_test_ds},\n",
+    "    generate_description=False,\n",
+    "    # COMMENT OUT `post_process_fn`\n",
+    "    post_process_fn=custom_heatmap,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0a7cbbc6",
+   "metadata": {},
+   "source": [
+    "<a id=\"toc2_3_\"></a>\n",
+    "\n",
+    "### Precision-Recall Curve\n",
+    "\n",
+    "Then, let's try the same thing with the [Precision-Recall Curve](https://docs.validmind.ai/tests/model_validation/sklearn/PrecisionRecallCurve.html) test:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d16c5209",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def custom_pr_curve(result: TestResult):\n",
+    "    # redraw the curve from the raw precision/recall values\n",
+    "    precision = result.raw_data.precision\n",
+    "    recall = result.raw_data.recall\n",
+    "\n",
+    "    fig = plt.figure()\n",
+    "    plt.plot(recall, precision, label=\"Precision-Recall Curve\")\n",
+    "    plt.xlabel(\"Recall\")\n",
+    "    plt.ylabel(\"Precision\")\n",
+    "    plt.title(\"Custom Precision-Recall Curve from RawData\")\n",
+    "    plt.legend()\n",
+    "\n",
+    "    plt.close()\n",
+    "    result.remove_figure(0)\n",
+    "    result.add_figure(fig)\n",
+    "\n",
+    "    return result\n",
+    "\n",
+    "result_pr = run_test(\n",
inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + " generate_description=False,\n", + " # COMMENT OUT `post_process_fn`\n", + " post_process_fn=custom_pr_curve,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "e25391a4", + "metadata": {}, + "source": [ + "\n", + "\n", + "### Using `RawData` in custom tests\n", + "\n", + "These examples demonstrate some very simple ways to use the `RawData` feature of ValidMind tests. The majority of ValidMind-developed tests return some form of raw data that can be used to customize the output of the test, but you can also create your own tests that return `RawData` objects and use them in the same way.\n", + "\n", + "Let's take a look at how this can be done in custom tests. To start, define and run your custom test:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc6a389f", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "from validmind import test, RawData\n", + "from validmind.vm_models import VMDataset, VMModel\n", + "\n", + "\n", + "@test(\"custom.MyCustomTest\")\n", + "def MyCustomTest(dataset: VMDataset, model: VMModel) -> tuple[go.Figure, RawData]:\n", + " \"\"\"Custom test that produces a figure and a RawData object\"\"\"\n", + " # pretend we are using the dataset and model to compute some data\n", + " # ...\n", + "\n", + " # create some fake data that will be used to generate a figure\n", + " data = pd.DataFrame({\"x\": [10, 20, 30, 40, 50], \"y\": [10, 20, 30, 40, 50]})\n", + "\n", + " # create the figure (scatter plot)\n", + " fig = go.Figure(data=go.Scatter(x=data[\"x\"], y=data[\"y\"]))\n", + "\n", + " # now let's create a RawData object that holds the \"computed\" data\n", + " raw_data = RawData(scatter_data_df=data)\n", + "\n", + " # finally, return both the figure and the raw data\n", + " return fig, raw_data\n", + "\n", + "\n", + "my_result = run_test(\n", + " \"custom.MyCustomTest\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + " generate_description=False,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "854c219c", + "metadata": {}, + "source": [ + "We can see that the test result shows the figure. But since we returned a `RawData` object, we can also inspect the contents and see how we could use it to customize or regenerate the figure in the post-processing function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "1cb661d1", + "metadata": {}, + "outputs": [], + "source": [ + "my_result.raw_data.inspect()" + ] + }, + { + "cell_type": "markdown", + "id": "55ad4acd", + "metadata": {}, + "source": [ + "We can see that we get a nicely-formatted preview of the dataframe we stored in the raw data object. 
Let's go ahead and use it to re-plot our data:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1242083", + "metadata": {}, + "outputs": [], + "source": [ + "def custom_plot(result: TestResult):\n", + " data = result.raw_data.scatter_data_df\n", + "\n", + " # use something other than a scatter plot\n", + " fig = go.Figure(data=go.Bar(x=data[\"x\"], y=data[\"y\"]))\n", + " fig.update_layout(title=\"Custom Bar Chart from RawData\")\n", + " fig.update_xaxes(title=\"X Axis\")\n", + " fig.update_yaxes(title=\"Y Axis\")\n", + "\n", + " result.remove_figure(0)\n", + " result.add_figure(fig)\n", + "\n", + " return result\n", + "\n", + "result = run_test(\n", + " \"custom.MyCustomTest\",\n", + " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n", + " post_process_fn=custom_plot,\n", + " generate_description=False,\n", + ")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/site/python-docs.zip b/site/python-docs.zip index 127abb3922..67d109e059 100644 Binary files a/site/python-docs.zip and b/site/python-docs.zip differ