diff --git a/site/developer/model-testing/testing-overview.qmd b/site/developer/model-testing/testing-overview.qmd
index 99666a993f..7ea340428e 100644
--- a/site/developer/model-testing/testing-overview.qmd
+++ b/site/developer/model-testing/testing-overview.qmd
@@ -6,6 +6,7 @@ aliases:
listing:
- id: tests-beginner
type: grid
+ grid-columns: 2
max-description-length: 250
sort: false
fields: [title, description]
@@ -28,7 +29,7 @@ listing:
max-description-length: 250
sort: false
fields: [title, description]
- contents:
+ contents:
- ../../notebooks/code_samples/custom_tests/integrate_external_test_providers.ipynb
- ../../notebooks/how_to/configure_dataset_features.ipynb
- ../../notebooks/how_to/run_documentation_sections.ipynb
@@ -43,6 +44,7 @@ listing:
contents:
- ../../notebooks/how_to/document_multiple_results_for_the_same_test.ipynb
- ../../notebooks/how_to/load_datasets_predictions.ipynb
+ - ../../notebooks/how_to/understand_utilize_rawdata.ipynb
- id: tests-custom
type: grid
max-description-length: 250
diff --git a/site/faq/faq-testing.qmd b/site/faq/faq-testing.qmd
index 5f1e0db840..3f4f195bc3 100644
--- a/site/faq/faq-testing.qmd
+++ b/site/faq/faq-testing.qmd
@@ -37,10 +37,10 @@ Yes, {{< var vm.product >}} allows tests to be manipulated at several levels:
- You can configure which tests are required to run programmatically depending on the model use case.[^4]
- You can change the thresholds and parameters for default tests already available in the {{< var vm.developer >}} — for instance, changing the threshold parameter for the class imbalance flag.[^5]
- You can also connect your own custom tests with the {{< var validmind.developer >}}. These custom tests are configurable and are able to run programmatically, just like the rest of the {{< var vm.developer >}}.[^6]
+- You can personalize tests further for your use case by using {{< var vm.product >}}'s `RawData` feature to customize the output of tests.[^7]
::: {.callout}
-In addition to custom tests, you can also add use case and test-specific context for any test to enhance the LLM-generated test descriptions using the {{< var validmind.developer >}}.[^7]
-
+In addition to custom tests, you can also add use case and test-specific context for any test to enhance the LLM-generated test descriptions using the {{< var validmind.developer >}}.[^8]
:::
{{< include _faq-explainability.qmd >}}
@@ -69,4 +69,6 @@ In addition to custom tests, you can also add use case and test-specific context
[^6]: [Can I use my own tests?](/developer/model-testing/testing-overview.qmd#can-i-use-my-own-tests)
-[^7]: [Add context to LLM-generated test descriptions](/notebooks/how_to/add_context_to_llm_descriptions.ipynb)
\ No newline at end of file
+[^7]: [Understand and utilize `RawData` in {{< var vm.product >}} tests](/notebooks/how_to/understand_utilize_rawdata.ipynb)
+
+[^8]: [Add context to LLM-generated test descriptions](/notebooks/how_to/add_context_to_llm_descriptions.ipynb)
\ No newline at end of file
diff --git a/site/notebooks.zip b/site/notebooks.zip
index 115a60ad76..06efc737df 100644
Binary files a/site/notebooks.zip and b/site/notebooks.zip differ
diff --git a/site/notebooks/how_to/understand_utilize_rawdata.ipynb b/site/notebooks/how_to/understand_utilize_rawdata.ipynb
new file mode 100644
index 0000000000..7354eae858
--- /dev/null
+++ b/site/notebooks/how_to/understand_utilize_rawdata.ipynb
@@ -0,0 +1,571 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "c18ba8a2",
+ "metadata": {},
+ "source": [
+ "# Understand and utilize `RawData` in ValidMind tests\n",
+ "\n",
+ "Test functions in ValidMind can return a special object called *`RawData`*, which holds intermediate or unprocessed data produced somewhere in the test logic but not returned as part of the test's visible output, such as in tables or figures.\n",
+ "\n",
+ "- The `RawData` feature allows you to customize the output of tests, making it a powerful tool for creating custom tests and post-processing functions.\n",
+ "- `RawData` is useful when running post-processing functions with tests to recompute tabular outputs, redraw figures, or even create new outputs entirely.\n",
+ "\n",
+ "In this notebook, you'll learn how to access, inspect, and utilize `RawData` from ValidMind tests."
+ ]
+ },
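+ {
+ "cell_type": "markdown",
+ "id": "a1b2c3d4",
+ "metadata": {},
+ "source": [
+ "To preview the pattern this notebook builds up to: a post-processing function takes a test's `TestResult`, reads intermediate values from `result.raw_data`, and returns the modified result. Here is a minimal sketch of that shape (the attribute name `some_value` is illustrative, not the output of any real test):\n",
+ "\n",
+ "```python\n",
+ "def my_post_process_fn(result):\n",
+ " # read the intermediate data the test stored in its RawData object\n",
+ " value = result.raw_data.some_value # illustrative attribute name\n",
+ " # ...rebuild a table or figure from `value` and attach it to `result`...\n",
+ " return result\n",
+ "```\n",
+ "\n",
+ "Working versions of this pattern, passed to `run_test()` via its `post_process_fn` argument, appear in the examples below."
+ ]
+ },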
+ {
+ "cell_type": "markdown",
+ "id": "5b5b248c",
+ "metadata": {},
+ "source": [
+ "::: {.content-hidden when-format=\"html\"}\n",
+ "## Contents \n",
+ "- [Setup](#toc1_) \n",
+ " - [Installation and intialization](#toc1_1_) \n",
+ " - [Load the sample dataset](#toc1_2_) \n",
+ " - [Initialize the ValidMind objects](#toc1_3_) \n",
+ "- [`RawData` usage examples](#toc2_) \n",
+ " - [Using `RawData` from the ROC Curve Test](#toc2_1_) \n",
+ " - [Pearson Correlation Matrix](#toc2_2_) \n",
+ " - [Precision-Recall Curve](#toc2_3_) \n",
+ " - [Using `RawData` in custom tests](#toc2_4_) \n",
+ "\n",
+ ":::\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6dd79a98",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## Setup\n",
+ "\n",
+ "Before we can run our examples, we'll need to set the stage to enable running tests with the ValidMind Library. Since the focus of this notebook is on the `RawData` object, this section will merely summarize the steps instead of going into greater detail. \n",
+ "\n",
+ "\n",
+ "**To learn more about running tests with ValidMind:** [Run tests and test suites](https://docs.validmind.ai/developer/model-testing/testing-overview.html)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5b6d8d15",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Installation and intialization\n",
+ "\n",
+ "First, let's make sure that the ValidMind Library is installed and ready to go, and our Python environment set up for data analysis:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "04eb084e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Install the ValidMind Library\n",
+ "%pip install -q validmind\n",
+ "\n",
+ "# Initialize the ValidMind Library\n",
+ "import validmind as vm\n",
+ "\n",
+ "# Import the `xgboost` library with an alias\n",
+ "import xgboost as xgb\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5e6aa2cb",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Load the sample dataset\n",
+ "\n",
+ "Then, we'll import a sample ValidMind dataset and preprocess it:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "50d72eba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the `customer_churn` sample dataset\n",
+ "from validmind.datasets.classification import customer_churn\n",
+ "raw_df = customer_churn.load_data()\n",
+ "\n",
+ "# Preprocess the raw dataset\n",
+ "train_df, validation_df, test_df = customer_churn.preprocess(raw_df)\n",
+ "\n",
+ "# Separate features and targets\n",
+ "x_train = train_df.drop(customer_churn.target_column, axis=1)\n",
+ "y_train = train_df[customer_churn.target_column]\n",
+ "x_val = validation_df.drop(customer_churn.target_column, axis=1)\n",
+ "y_val = validation_df[customer_churn.target_column]\n",
+ "\n",
+ "# Create an `XGBClassifier` object\n",
+ "model = xgb.XGBClassifier(early_stopping_rounds=10)\n",
+ "model.set_params(\n",
+ " eval_metric=[\"error\", \"logloss\", \"auc\"],\n",
+ ")\n",
+ "\n",
+ "# Train the model using the validation set\n",
+ "model.fit(\n",
+ " x_train,\n",
+ " y_train,\n",
+ " eval_set=[(x_val, y_val)],\n",
+ " verbose=False,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e3895d35",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Initialize the ValidMind objects"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0e441f4",
+ "metadata": {},
+ "source": [
+ "Before you can run tests, you'll need to initialize a ValidMind dataset object, as well as a ValidMind model object that can be passed to other functions for analysis and tests on the data:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b2310bc4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Initialize the dataset object\n",
+ "vm_raw_dataset = vm.init_dataset(\n",
+ " dataset=raw_df,\n",
+ " input_id=\"raw_dataset\",\n",
+ " target_column=customer_churn.target_column,\n",
+ " class_labels=customer_churn.class_labels,\n",
+ " __log=False,\n",
+ ")\n",
+ "\n",
+ "# Initialize the datasets into their own dataset objects\n",
+ "vm_train_ds = vm.init_dataset(\n",
+ " dataset=train_df,\n",
+ " input_id=\"train_dataset\",\n",
+ " target_column=customer_churn.target_column,\n",
+ " __log=False,\n",
+ ")\n",
+ "vm_test_ds = vm.init_dataset(\n",
+ " dataset=test_df,\n",
+ " input_id=\"test_dataset\",\n",
+ " target_column=customer_churn.target_column,\n",
+ " __log=False,\n",
+ ")\n",
+ "\n",
+ "# Initialize a model object\n",
+ "vm_model = vm.init_model(\n",
+ " model,\n",
+ " input_id=\"model\",\n",
+ " __log=False,\n",
+ ")\n",
+ "\n",
+ "# Assign predictions to the datasets\n",
+ "vm_train_ds.assign_predictions(\n",
+ " model=vm_model,\n",
+ ")\n",
+ "\n",
+ "vm_test_ds.assign_predictions(\n",
+ " model=vm_model,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "25ec99fc",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "## `RawData` usage examples\n",
+ "\n",
+ "Once you're set up to run tests, you can then try out the following examples:\n",
+ "\n",
+ " - [Using `RawData` from the ROC Curve Test](#toc2_1_) \n",
+ " - [Pearson Correlation Matrix](#toc2_2_) \n",
+ " - [Precision-Recall Curve](#toc2_3_) \n",
+ " - [Using `RawData` in custom tests](#toc2_4_) "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "33d79841",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Using `RawData` from the ROC Curve Test\n",
+ "\n",
+ "In this introductory example, we run the [ROC Curve](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html) test, inspect its `RawData` output, and then create a custom ROC curve using the raw data values.\n",
+ "\n",
+ "First, let's run the default ROC Curve test for comparsion with later iterations:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "58a3a779",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from validmind.tests import run_test\n",
+ "\n",
+ "# Run the ROC Curve test normally\n",
+ "result_roc = run_test(\n",
+ " \"validmind.model_validation.sklearn.ROCCurve\",\n",
+ " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n",
+ " generate_description=False,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "66c44fe0",
+ "metadata": {},
+ "source": [
+ "Now let's assume we want to create a custom version of the above figure. First, let's inspect the raw data that this test produces so we can see what we have to work with.\n",
+ "\n",
+ "`RawData` objects have a `inspect()` method that will pretty print the attributes of the object to be able to quickly see the data and its types:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "513ce01e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Inspect the RawData output from the ROC test\n",
+ "print(\"RawData from ROC Curve Test:\")\n",
+ "result_roc.raw_data.inspect()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "586f3a12",
+ "metadata": {},
+ "source": [
+ "As we can see, the ROC Curve returns a `RawData` object with the following attributes:\n",
+ "- **`fpr`:** A list of false positive rates\n",
+ "- **`tpr`:** A list of true positive rates\n",
+ "- **`auc`:** The area under the curve\n",
+ "\n",
+ "This should be enough to create our own custom ROC curve via a post-processing function without having to create a whole new test from scratch and without having to recompute any of the data:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "613778d2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "from validmind.vm_models.result import TestResult\n",
+ "\n",
+ "\n",
+ "def custom_roc_curve(result: TestResult):\n",
+ " # Extract raw data from the test result\n",
+ " fpr = result.raw_data.fpr\n",
+ " tpr = result.raw_data.tpr\n",
+ " auc = result.raw_data.auc\n",
+ "\n",
+ " # Create a custom ROC curve plot\n",
+ " fig = plt.figure()\n",
+ " plt.plot(fpr, tpr, label=f\"Custom ROC (AUC = {auc:.2f})\", color=\"blue\")\n",
+ " plt.plot([0, 1], [0, 1], linestyle=\"--\", color=\"gray\", label=\"Random Guess\")\n",
+ " plt.xlabel(\"False Positive Rate\")\n",
+ " plt.ylabel(\"True Positive Rate\")\n",
+ " plt.title(\"Custom ROC Curve from RawData\")\n",
+ " plt.legend()\n",
+ "\n",
+ " # close the plot to avoid it automatically being shown in the notebook\n",
+ " plt.close()\n",
+ "\n",
+ " # remove existing figure\n",
+ " result.remove_figure(0)\n",
+ "\n",
+ " # add new figure\n",
+ " result.add_figure(fig)\n",
+ "\n",
+ " return result\n",
+ "\n",
+ "# test it on the existing result\n",
+ "modified_result = custom_roc_curve(result_roc)\n",
+ "\n",
+ "# show the modified result\n",
+ "modified_result.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "794d026c",
+ "metadata": {},
+ "source": [
+ "Now that we have created a post-processing function and verified that it works on our existing test result, we can use it directly in `run_test()` from now on:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7c7566f3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "result = run_test(\n",
+ " \"validmind.model_validation.sklearn.ROCCurve\",\n",
+ " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n",
+ " post_process_fn=custom_roc_curve,\n",
+ " generate_description=False,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1d0b94aa",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Pearson Correlation Matrix\n",
+ "\n",
+ "In this next example, try commenting out the `post_process_fn` argument in the following cell and see what happens between different runs:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c57fb01b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import plotly.graph_objects as go\n",
+ "\n",
+ "\n",
+ "def custom_heatmap(result: TestResult):\n",
+ " corr_matrix = result.raw_data.correlation_matrix\n",
+ "\n",
+ " heatmap = go.Heatmap(\n",
+ " z=corr_matrix.values,\n",
+ " x=list(corr_matrix.columns),\n",
+ " y=list(corr_matrix.index),\n",
+ " colorscale=\"Viridis\",\n",
+ " )\n",
+ " fig = go.Figure(data=[heatmap])\n",
+ " fig.update_layout(title=\"Custom Heatmap from RawData\")\n",
+ "\n",
+ " plt.close()\n",
+ "\n",
+ " result.remove_figure(0)\n",
+ " result.add_figure(fig)\n",
+ "\n",
+ " return result\n",
+ "\n",
+ "\n",
+ "result_corr = run_test(\n",
+ " \"validmind.data_validation.PearsonCorrelationMatrix\",\n",
+ " inputs={\"dataset\": vm_test_ds},\n",
+ " generate_description=False,\n",
+ " # COMMENT OUT `post_process_fn`\n",
+ " post_process_fn=custom_heatmap,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0a7cbbc6",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Precision-Recall Curve\n",
+ "\n",
+ "Then, let's try the same thing with the [Precision-Recall Curve](https://docs.validmind.ai/tests/model_validation/sklearn/PrecisionRecallCurve.html) test:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d16c5209",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def custom_pr_curve(result: TestResult):\n",
+ " precision = result.raw_data.precision\n",
+ " recall = result.raw_data.recall\n",
+ "\n",
+ " fig = plt.figure()\n",
+ " plt.plot(recall, precision, label=\"Precision-Recall Curve\")\n",
+ " plt.xlabel(\"Recall\")\n",
+ " plt.ylabel(\"Precision\")\n",
+ " plt.title(\"Custom Precision-Recall Curve from RawData\")\n",
+ " plt.legend()\n",
+ "\n",
+ " plt.close()\n",
+ " result.remove_figure(0)\n",
+ " result.add_figure(fig)\n",
+ "\n",
+ " return result\n",
+ "\n",
+ "result_pr = run_test(\n",
+ " \"validmind.model_validation.sklearn.PrecisionRecallCurve\",\n",
+ " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n",
+ " generate_description=False,\n",
+ " # COMMENT OUT `post_process_fn`\n",
+ " post_process_fn=custom_pr_curve,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e25391a4",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "### Using `RawData` in custom tests\n",
+ "\n",
+ "These examples demonstrate some very simple ways to use the `RawData` feature of ValidMind tests. The majority of ValidMind-developed tests return some form of raw data that can be used to customize the output of the test, but you can also create your own tests that return `RawData` objects and use them in the same way.\n",
+ "\n",
+ "Let's take a look at how this can be done in custom tests. To start, define and run your custom test:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dc6a389f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "from validmind import test, RawData\n",
+ "from validmind.vm_models import VMDataset, VMModel\n",
+ "\n",
+ "\n",
+ "@test(\"custom.MyCustomTest\")\n",
+ "def MyCustomTest(dataset: VMDataset, model: VMModel) -> tuple[go.Figure, RawData]:\n",
+ " \"\"\"Custom test that produces a figure and a RawData object\"\"\"\n",
+ " # pretend we are using the dataset and model to compute some data\n",
+ " # ...\n",
+ "\n",
+ " # create some fake data that will be used to generate a figure\n",
+ " data = pd.DataFrame({\"x\": [10, 20, 30, 40, 50], \"y\": [10, 20, 30, 40, 50]})\n",
+ "\n",
+ " # create the figure (scatter plot)\n",
+ " fig = go.Figure(data=go.Scatter(x=data[\"x\"], y=data[\"y\"]))\n",
+ "\n",
+ " # now let's create a RawData object that holds the \"computed\" data\n",
+ " raw_data = RawData(scatter_data_df=data)\n",
+ "\n",
+ " # finally, return both the figure and the raw data\n",
+ " return fig, raw_data\n",
+ "\n",
+ "\n",
+ "my_result = run_test(\n",
+ " \"custom.MyCustomTest\",\n",
+ " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n",
+ " generate_description=False,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "854c219c",
+ "metadata": {},
+ "source": [
+ "We can see that the test result shows the figure. But since we returned a `RawData` object, we can also inspect the contents and see how we could use it to customize or regenerate the figure in the post-processing function:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1cb661d1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "my_result.raw_data.inspect()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "55ad4acd",
+ "metadata": {},
+ "source": [
+ "We can see that we get a nicely-formatted preview of the dataframe we stored in the raw data object. Let's go ahead and use it to re-plot our data:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c1242083",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def custom_plot(result: TestResult):\n",
+ " data = result.raw_data.scatter_data_df\n",
+ "\n",
+ " # use something other than a scatter plot\n",
+ " fig = go.Figure(data=go.Bar(x=data[\"x\"], y=data[\"y\"]))\n",
+ " fig.update_layout(title=\"Custom Bar Chart from RawData\")\n",
+ " fig.update_xaxes(title=\"X Axis\")\n",
+ " fig.update_yaxes(title=\"Y Axis\")\n",
+ "\n",
+ " result.remove_figure(0)\n",
+ " result.add_figure(fig)\n",
+ "\n",
+ " return result\n",
+ "\n",
+ "result = run_test(\n",
+ " \"custom.MyCustomTest\",\n",
+ " inputs={\"dataset\": vm_test_ds, \"model\": vm_model},\n",
+ " post_process_fn=custom_plot,\n",
+ " generate_description=False,\n",
+ ")"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.10.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/site/python-docs.zip b/site/python-docs.zip
index 127abb3922..67d109e059 100644
Binary files a/site/python-docs.zip and b/site/python-docs.zip differ