Merged
12 changes: 3 additions & 9 deletions .github/workflows/integration.yaml
@@ -50,12 +50,6 @@ jobs:
- name: Remove Build Environment
run: rm -rf .venv

- - name: Setup Virtual Environment
- run: python -m venv sdist-venv
-
- - name: Install Built Package
- run: sdist-venv/bin/pip install --no-cache-dir "$(ls dist/validmind*.whl | head -n 1)[llm,huggingface]"
-
- name: 'Setup Virtual Environment for [all]'
run: python -m venv all-venv

@@ -64,13 +58,13 @@ jobs:
run: all-venv/bin/pip install --no-cache-dir "$(ls dist/validmind*.whl | head -n 1)[all]"

- name: Install Additional Dependencies
- run: sdist-venv/bin/pip install nbformat papermill jupyter
+ run: all-venv/bin/pip install nbformat papermill jupyter

- name: Create Jupyter Kernel
- run: sdist-venv/bin/python -m ipykernel install --user --name sdist-venv
+ run: all-venv/bin/python -m ipykernel install --user --name all-venv

- name: Integration Tests
- run: sdist-venv/bin/python scripts/run_e2e_notebooks.py --kernel sdist-venv
+ run: all-venv/bin/python scripts/run_e2e_notebooks.py --kernel all-venv
env:
NOTEBOOK_RUNNER_DEFAULT_MODEL: ${{ secrets.NOTEBOOK_RUNNER_DEFAULT_PROJECT_ID }}
NOTEBOOK_RUNNER_API_KEY: ${{ secrets.NOTEBOOK_RUNNER_API_KEY }}
26 changes: 26 additions & 0 deletions README.md
@@ -63,6 +63,32 @@ pip install validmind
pip install rpy2
```

## PII Detection

The ValidMind Library includes optional PII detection, built on Microsoft Presidio, that flags sensitive data in test results and test descriptions and blocks it from being logged accidentally.

**Installation:**

```bash
pip install "validmind[pii-detection]"
```

**Configure PII detection:**

```bash
# Enable PII detection for test results only
export VALIDMIND_PII_DETECTION=test_results

# Enable PII detection for test descriptions only
export VALIDMIND_PII_DETECTION=test_descriptions

# Enable PII detection for both test results and descriptions
export VALIDMIND_PII_DETECTION=all

# Disable PII detection (default)
export VALIDMIND_PII_DETECTION=disabled
```
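
The same setting can be applied from Python before running tests. The snippet below is a minimal sketch, not part of the ValidMind API: `set_pii_detection_mode` and `VALID_MODES` are hypothetical names introduced for illustration, and it assumes the library reads `VALIDMIND_PII_DETECTION` when a test runs.

```python
import os

# Mode names as documented above. The helper is hypothetical and
# not part of the ValidMind API.
VALID_MODES = {"disabled", "test_results", "test_descriptions", "all"}


def set_pii_detection_mode(mode: str) -> str:
    """Set VALIDMIND_PII_DETECTION after checking the value is a documented mode."""
    normalized = mode.strip().lower()
    if normalized not in VALID_MODES:
        raise ValueError(
            f"Unknown PII detection mode {mode!r}; expected one of {sorted(VALID_MODES)}"
        )
    os.environ["VALIDMIND_PII_DETECTION"] = normalized
    return normalized


set_pii_detection_mode("test_results")
```

Validating the value up front turns a typo in the mode name into an immediate error instead of a setting that may be silently ignored.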

## How to contribute

### Install dependencies
189 changes: 189 additions & 0 deletions notebooks/how_to/configure_pii_detection.ipynb
@@ -0,0 +1,189 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# PII Detection Modes with a Custom Test\n",
"\n",
"This notebook shows how to initialize ValidMind, implement a custom test that emits PII, and observe behavior differences under each `VALIDMIND_PII_DETECTION` mode when running the test with `validmind.tests.run_test`.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"- `validmind` installed with the PII extras:\n",
"\n",
"```bash\n",
"pip install -q \"validmind[pii-detection]\"\n",
"```\n",
"\n",
"- A registered ValidMind model. We'll initialize the library using your model's code snippet.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -q \"validmind[pii-detection]\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize ValidMind\n",
"\n",
"Initialize using your model code snippet or a `.env` file, as shown in other quickstarts.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load your model identifier credentials from an `.env` file\n",
"%load_ext dotenv\n",
"%dotenv .env\n",
"\n",
"# Or initialize with your code snippet\n",
"import validmind as vm\n",
"\n",
"vm.init(\n",
"    # api_host=\"...\",\n",
"    # api_key=\"...\",\n",
"    # api_secret=\"...\",\n",
"    # model=\"...\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a custom test that emits PII\n",
"\n",
"We'll create a custom test that returns a small table with PII in its columns (name, email, phone).\n",
"\n",
"This mirrors the structure used in other custom test notebooks and will exercise both table and description PII detection paths.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"from validmind import test\n",
"\n",
"@test(\"my_pii_demo.PIIEmittingTest\")\n",
"def pii_emitting_test():\n",
"    \"\"\"A demo test that returns PII\"\"\"\n",
"    return pd.DataFrame(\n",
"        {\n",
"            \"name\": [\"Jane Smith\", \"John Doe\", \"Alice Johnson\"],\n",
"            \"email\": [\n",
"                \"jane.smith@bank.example\",\n",
"                \"john.doe@company.example\",\n",
"                \"alice.johnson@service.example\",\n",
"            ],\n",
"            \"phone\": [\"(212) 555-9876\", \"(415) 555-1234\", \"(646) 555-5678\"],\n",
"        }\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the test under different PII detection modes\n",
"\n",
"We'll switch `VALIDMIND_PII_DETECTION` across modes and run the same test with `validmind.tests.run_test`. We catch exceptions to observe blocking behavior.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from validmind.tests import run_test\n",
"\n",
"MODES = [\"disabled\", \"test_results\", \"test_descriptions\", \"all\"]\n",
"\n",
"for mode in MODES:\n",
"    print(\"\\n=== Mode:\", mode, \"===\")\n",
"    os.environ[\"VALIDMIND_PII_DETECTION\"] = mode\n",
"    try:\n",
"        result = run_test(\"my_pii_demo.PIIEmittingTest\")\n",
"\n",
"        # Check whether the description was generated\n",
"        if not result._was_description_generated:\n",
"            print(\"Blocked: Test Description Generation was not run due to PII\")\n",
"        else:\n",
"            print(\"Description was generated by LLM\")\n",
"\n",
"        # Try logging (this triggers PII checks before upload)\n",
"        result.log()\n",
"        print(\"Logging to API succeeded\")\n",
"    except Exception as e:\n",
"        # Any exception lands here, so print it to confirm PII was the cause\n",
"        print(\"Blocked: Test Result was not logged:\", e)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Expected behavior by mode\n",
"\n",
"- `disabled`: No PII checks run, so nothing is blocked.\n",
"- `test_results`: The description is generated, but logging the result is blocked.\n",
"- `test_descriptions`: Description generation is blocked, but the result is logged.\n",
"- `all`: Description generation and logging are both blocked.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notes\n",
"\n",
"- If you see warnings that Presidio is unavailable, make sure you installed the extras: `validmind[pii-detection]`.\n",
"- You can override blocking by calling `result.log(unsafe=True)`, but this is not recommended outside controlled workflows.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
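
The expected behavior listed in the notebook can also be written down as a small lookup table, which makes it easier to check an observed run against the documented outcome. This is only a sketch: `ExpectedOutcome`, `EXPECTED`, and `check_outcome` are hypothetical names, the values are transcribed from the "Expected behavior by mode" cell for a test that emits PII, and the `disabled` row assumes LLM description generation is otherwise enabled.

```python
from typing import NamedTuple


class ExpectedOutcome(NamedTuple):
    description_generated: bool
    result_logged: bool


# Transcribed from the notebook's "Expected behavior by mode" cell.
# Illustrative only; applies to a test whose output contains PII.
EXPECTED = {
    # Assumption: with no PII checks, nothing is blocked.
    "disabled": ExpectedOutcome(description_generated=True, result_logged=True),
    "test_results": ExpectedOutcome(description_generated=True, result_logged=False),
    "test_descriptions": ExpectedOutcome(description_generated=False, result_logged=True),
    "all": ExpectedOutcome(description_generated=False, result_logged=False),
}


def check_outcome(mode: str, description_generated: bool, result_logged: bool) -> bool:
    """Return True when an observed run matches the documented behavior for `mode`."""
    return EXPECTED[mode] == ExpectedOutcome(description_generated, result_logged)
```

In the notebook's loop, the two flags would come from `result._was_description_generated` and from whether `result.log()` completed without raising.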