Merged
164 changes: 159 additions & 5 deletions notebooks/how_to/understand_utilize_rawdata.ipynb
@@ -31,7 +31,7 @@
" - [Pearson Correlation Matrix](#toc2_2_) \n",
" - [Precision-Recall Curve](#toc2_3_) \n",
" - [Using `RawData` in custom tests](#toc2_4_) \n",
"\n",
" - [Using `RawData` in comparison tests](#toc2_5_) \n",
":::\n",
"<!-- jn-toc-notebook-config\n",
"\tnumbering=false\n",
@@ -213,7 +213,8 @@
" - [Using `RawData` from the ROC Curve Test](#toc2_1_) \n",
" - [Pearson Correlation Matrix](#toc2_2_) \n",
" - [Precision-Recall Curve](#toc2_3_) \n",
" - [Using `RawData` in custom tests](#toc2_4_) "
" - [Using `RawData` in custom tests](#toc2_4_) \n",
" - [Using `RawData` in comparison tests](#toc2_5_) "
]
},
{
@@ -553,17 +554,170 @@
" generate_description=False,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "53084493",
"metadata": {},
"source": [
"<a id='toc2_5_'></a>\n",
"\n",
"### Using `RawData` in comparison tests\n",
"\n",
"When running comparison tests, the `RawData` object will contain the raw data for each individual test result as well as the comparison results between the test results. To support this, the RawData object contains the model and dataset input_ids for each of the datasets and models in the test, so that the post-processing function can use them to customize the output. The example below shows how to use the `RawData` object to customize the output of a comparison test and add a table to the test result that shows the confusion matrix for each individual test result as well as the comparison results between the test results.\n",
"\n",
"When designing post-processing functions that need to handle both individual and comparison test results, you can check the structure of the raw data to determine which case you're dealing with. In the example below, we check if `confusion_matrix` is a list (comparison test with multiple matrices) or a single matrix (individual test). For comparison tests, the function creates two tables: one showing the confusion matrices for each test case, and another showing the percentage drift between them. For individual tests, it creates a single table with the confusion matrix values. This pattern of checking the raw data structure can be applied to other tests to create versatile post-processing functions that work in both scenarios.\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "bcbbe9f4",
"metadata": {},
"outputs": [],
"source": [
"def cm_table(result: TestResult):\n",
" # For individual results\n",
" if not isinstance(result.raw_data.confusion_matrix, list):\n",
" # Extract values from single confusion matrix\n",
" cm = result.raw_data.confusion_matrix\n",
" tn, fp = cm[0, 0], cm[0, 1]\n",
" fn, tp = cm[1, 0], cm[1, 1]\n",
" \n",
" # Create DataFrame for individual matrix\n",
" cm_df = pd.DataFrame({\n",
" 'TN': [tn],\n",
" 'FP': [fp],\n",
" 'FN': [fn],\n",
" 'TP': [tp]\n",
" })\n",
" \n",
" # Add individual table\n",
" result.add_table(cm_df, title=\"Confusion Matrix\")\n",
" \n",
" # For comparison results\n",
" else:\n",
" cms = result.raw_data.confusion_matrix\n",
" cm1, cm2 = cms[0], cms[1]\n",
" \n",
" # Create individual results table\n",
" rows = []\n",
" for i, cm in enumerate(cms):\n",
" rows.append({\n",
" 'dataset': result.raw_data.dataset[i],\n",
" 'model': result.raw_data.model[i],\n",
" 'TN': cm[0, 0],\n",
" 'FP': cm[0, 1],\n",
" 'FN': cm[1, 0],\n",
" 'TP': cm[1, 1]\n",
" })\n",
" individual_df = pd.DataFrame(rows)\n",
" \n",
" # Calculate percentage differences\n",
" diff_df = pd.DataFrame({\n",
" 'TN_drift (%)': [(cm2[0, 0] - cm1[0, 0]) / cm1[0, 0] * 100],\n",
" 'FP_drift (%)': [(cm2[0, 1] - cm1[0, 1]) / cm1[0, 1] * 100],\n",
" 'FN_drift (%)': [(cm2[1, 0] - cm1[1, 0]) / cm1[1, 0] * 100],\n",
" 'TP_drift (%)': [(cm2[1, 1] - cm1[1, 1]) / cm1[1, 1] * 100]\n",
" }).round(2)\n",
" \n",
" # Add both tables\n",
" result.add_table(individual_df, title=\"Individual Confusion Matrices\")\n",
" result.add_table(diff_df, title=\"Confusion Matrix Drift\")\n",
" \n",
" return result"
]
},
{
"cell_type": "markdown",
"id": "41edd959",
"metadata": {},
"source": [
"Let's first run the confusion matrix test on a single dataset-model pair to see how our post-processing function handles individual results:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cf3c47fe",
"metadata": {},
"outputs": [],
"source": [
"from validmind.tests import run_test\n",
"\n",
"result_cm = run_test(\n",
" \"validmind.model_validation.sklearn.ConfusionMatrix\",\n",
" inputs={\n",
" \"dataset\": vm_test_ds,\n",
" \"model\": vm_model,\n",
" },\n",
" post_process_fn=cm_table,\n",
" generate_description=False,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a2482c54",
"metadata": {},
"source": [
"Now let's run a comparison test between test and train datasets to see how the function handles multiple results:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a1b4388",
"metadata": {},
"outputs": [],
"source": [
"result_cm = run_test(\n",
" \"validmind.model_validation.sklearn.ConfusionMatrix\",\n",
" input_grid={\n",
" \"dataset\": [vm_test_ds, vm_train_ds],\n",
" \"model\": [vm_model]\n",
" },\n",
" post_process_fn=cm_table,\n",
" generate_description=False,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "9f7d361a",
"metadata": {},
"source": [
"Let's inspect the raw data to see how comparison tests structure their data - notice how the `RawData` object contains not just the confusion matrices for both datasets, but also tracks which dataset and model each result came from:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "012ec495",
"metadata": {},
"outputs": [],
"source": [
"result_cm.raw_data.inspect()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "ValidMind Library",
"language": "python",
"name": "python3"
"name": "validmind"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"version": "3.10.13"
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
}
},
"nbformat": 4,
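The `cm_table` function added above is specific to the confusion matrix, but the same structure check generalizes to other tests. A minimal sketch of the generic pattern, assuming a hypothetical test that stores a single `scores` entry plus the `dataset` and `model` input IDs in its `RawData` (the name `scores` is illustrative, not part of the library):

```python
import pandas as pd


def scores_table(result):
    """Post-processing sketch that handles individual and comparison runs.

    Assumes the test stored `scores` (a hypothetical raw-data entry) along
    with the `dataset` and `model` input IDs in its RawData.
    """
    scores = result.raw_data.scores

    if isinstance(scores, list):
        # Comparison run: one entry per dataset/model pair
        rows = [
            {
                "dataset": result.raw_data.dataset[i],
                "model": result.raw_data.model[i],
                "score": score,
            }
            for i, score in enumerate(scores)
        ]
        result.add_table(pd.DataFrame(rows), title="Per-input Scores")
    else:
        # Individual run: a single entry
        result.add_table(pd.DataFrame({"score": [scores]}), title="Score")

    return result
```

As with `cm_table` in the notebook, it would be passed to `run_test` via `post_process_fn`.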
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -10,7 +10,7 @@ description = "ValidMind Library"
license = "Commercial License"
name = "validmind"
readme = "README.pypi.md"
version = "2.8.11"
version = "2.8.12"

[tool.poetry.dependencies]
aiohttp = {extras = ["speedups"], version = "*"}
9 changes: 7 additions & 2 deletions scripts/bulk_ai_test_updates.py
@@ -118,7 +118,7 @@ def list_to_str(lst):

**Purpose**:

The Feature Drift test aims to evaluate how much the distribution of features has shifted over time between two datasets, typically training and monitoring datasets. It uses the Population Stability Index (PSI) to quantify this change, providing insights into the models robustness and the necessity for retraining or feature engineering.
The Feature Drift test aims to evaluate how much the distribution of features has shifted over time between two datasets, typically training and monitoring datasets. It uses the Population Stability Index (PSI) to quantify this change, providing insights into the model's robustness and the necessity for retraining or feature engineering.

**Test Mechanism**:

@@ -181,6 +181,11 @@ def list_to_str(lst):
It's a class that can be initialized with any number of objects of any type, using a key-value-like interface where the key in the constructor is the name of the object and the value is the object itself.
It should only be used to store data that is not already returned as part of the test result (i.e. in a table) but could be useful to re-generate any of the test result objects (tables, figures).

When adding raw data, you should always include:
- If the test has access to a model parameter (VMModel), include its input_id as model=model.input_id
- If the test has access to a dataset parameter (VMDataset), include its input_id as dataset=dataset.input_id
Only include these if they are available in the test function parameters - don't force both if only one is accessible.

You will be provided with the source code for a "test" that is run against an ML model or dataset.
You will analyze the code to determine the details and implementation of the test.
Then you will use the below example to implement changes to the test to make it use the new raw data mechanism offered by the ValidMind SDK.
@@ -228,7 +233,7 @@ def ExampleConfusionMatrix(model: VMModel, dataset: VMDataset):
fig = ff.create_annotated_heatmap()
..

return fig, RawData(confusion_matrix=cm)
return fig, RawData(confusion_matrix=cm, model=model.input_id, dataset=dataset.input_id)
```

Notice that the test now returns a tuple of the figure and the raw data.
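The `ExampleConfusionMatrix` above shows the case where both a model and a dataset are available. A minimal sketch of the dataset-only case described by the new guidance (only the dataset's `input_id` is attached). The test name, the `describe()` summary, the `dataset.df` accessor, and the import paths are illustrative assumptions rather than part of the prompt:

```python
from validmind import RawData
from validmind.vm_models import VMDataset


def ExampleSummaryStats(dataset: VMDataset):
    # Dataset-only test: there is no model parameter, so only the dataset
    # input_id is stored alongside the raw values.
    summary = dataset.df.describe()
    return summary, RawData(summary=summary, dataset=dataset.input_id)
```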
7 changes: 4 additions & 3 deletions tests/unit_tests/data_validation/test_IQROutliersTable.py
@@ -38,10 +38,11 @@ def setUp(self):
)

def test_outliers_structure(self):
result = IQROutliersTable(self.vm_dataset)
result, raw_data = IQROutliersTable(self.vm_dataset)

# Check basic structure
self.assertIsInstance(result, dict)
self.assertIsInstance(raw_data, vm.RawData)
self.assertIn("Summary of Outliers Detected by IQR Method", result)

# Check result structure
@@ -59,7 +60,7 @@
self.assertIn("Maximum Outlier Value", summary)

def test_outliers_detection(self):
result = IQROutliersTable(self.vm_dataset)
result, raw_data = IQROutliersTable(self.vm_dataset)
outliers_summary = result["Summary of Outliers Detected by IQR Method"]

# Check that outliers are detected in the 'with_outliers' column
@@ -76,7 +77,7 @@
self.assertIsNone(normal_summary)

def test_binary_exclusion(self):
result = IQROutliersTable(self.vm_dataset)
result, raw_data = IQROutliersTable(self.vm_dataset)
outliers_summary = result["Summary of Outliers Detected by IQR Method"]

# Verify binary column is not in results
30 changes: 12 additions & 18 deletions tests/unit_tests/data_validation/test_IsolationForestOutliers.py
@@ -28,25 +28,19 @@ def setUp(self):
)

def test_outliers_detection(self):
result = IsolationForestOutliers(self.vm_dataset, contamination=0.1)
figure, raw_data = IsolationForestOutliers(self.vm_dataset, contamination=0.1)

# Check return type
self.assertIsInstance(result, tuple)
# Check return types
self.assertIsInstance(figure, plt.Figure)
self.assertIsInstance(raw_data, vm.RawData)

# Separate figures and raw data
figures = result

# Check that at least one figure is returned
self.assertGreater(len(figures), 0)

# Check each figure
for fig in figures:
self.assertIsInstance(fig, plt.Figure)
# Check that the figure has at least one axes
self.assertGreater(len(figure.axes), 0)

def test_feature_columns_validation(self):
# Test with valid feature columns
try:
IsolationForestOutliers(
figure, raw_data = IsolationForestOutliers(
self.vm_dataset, feature_columns=["feature1", "feature2"]
)
except ValueError:
@@ -60,13 +54,13 @@

def test_contamination_parameter(self):
# Test with different contamination levels
figures_low_contamination = IsolationForestOutliers(
figure_low, raw_data_low = IsolationForestOutliers(
self.vm_dataset, contamination=0.05
)
figures_high_contamination = IsolationForestOutliers(
figure_high, raw_data_high = IsolationForestOutliers(
self.vm_dataset, contamination=0.2
)

# Check that figures are returned for both contamination levels
self.assertGreater(len(figures_low_contamination), 0)
self.assertGreater(len(figures_high_contamination), 0)
# Check that figures have at least one axes
self.assertGreater(len(figure_low.axes), 0)
self.assertGreater(len(figure_high.axes), 0)
5 changes: 4 additions & 1 deletion tests/unit_tests/data_validation/test_JarqueBera.py
@@ -29,11 +29,14 @@ def test_returns_dataframe_and_rawdata(self):
)

# Run the function
result = JarqueBera(vm_dataset)
result, raw_data = JarqueBera(vm_dataset)

# Check if result is a DataFrame
self.assertIsInstance(result, pd.DataFrame)

# Check if raw_data is a RawData object
self.assertIsInstance(raw_data, vm.RawData)

# Check if the DataFrame has the expected columns
expected_columns = ["column", "stat", "pvalue", "skew", "kurtosis"]
self.assertListEqual(list(result.columns), expected_columns)
5 changes: 4 additions & 1 deletion tests/unit_tests/data_validation/test_LJungBox.py
@@ -22,11 +22,14 @@ def test_returns_dataframe_with_expected_shape(self):
)

# Run the function
result = LJungBox(vm_dataset)
result, raw_data = LJungBox(vm_dataset)

# Check if result is a DataFrame
self.assertIsInstance(result, pd.DataFrame)

# Check if raw_data is a RawData object
self.assertIsInstance(raw_data, vm.RawData)

# Check if the DataFrame has the expected columns
expected_columns = ["column", "stat", "pvalue"]
self.assertListEqual(list(result.columns), expected_columns)
8 changes: 5 additions & 3 deletions tests/unit_tests/data_validation/test_MissingValues.py
@@ -28,11 +28,13 @@ def setUp(self):
)

def test_missing_values_structure(self):
summary, passed = MissingValues(self.vm_dataset)
# Run the function
summary, passed, raw_data = MissingValues(self.vm_dataset)

# Check return types
self.assertIsInstance(summary, list)
self.assertIsInstance(passed, bool)
self.assertIsInstance(raw_data, vm.RawData)

# Check summary structure
for column_summary in summary:
Expand All @@ -42,7 +44,7 @@ def test_missing_values_structure(self):
self.assertIn("Pass/Fail", column_summary)

def test_missing_values_counts(self):
summary, passed = MissingValues(self.vm_dataset)
summary, passed, raw_data = MissingValues(self.vm_dataset)

# Get results for each column
no_missing = next(s for s in summary if s["Column"] == "no_missing")
@@ -69,7 +71,7 @@

def test_threshold_parameter(self):
# Test with higher threshold that allows some missing values
summary, passed = MissingValues(self.vm_dataset, min_threshold=25)
summary, passed, raw_data = MissingValues(self.vm_dataset, min_threshold=25)

# Get results
some_missing = next(s for s in summary if s["Column"] == "some_missing")