validmind · johnwalz97 · Aug 21, 2025 · Aug 6, 2025 · Aug 6, 2025 · Aug 7, 2025
diff --git a/.github/workflows/integration.yaml b/.github/workflows/integration.yaml
@@ -50,12 +50,6 @@ jobs:
       - name: Remove Build Environment
         run: rm -rf .venv
 
-      - name: Setup Virtual Environment
-        run: python -m venv sdist-venv
-
-      - name: Install Built Package
-        run: sdist-venv/bin/pip install --no-cache-dir "$(ls dist/validmind*.whl | head -n 1)[llm,huggingface]"
-
       - name: 'Setup Virtual Environment for [all]'
         run: python -m venv all-venv
 
@@ -64,13 +58,13 @@ jobs:
         run: all-venv/bin/pip install --no-cache-dir "$(ls dist/validmind*.whl | head -n 1)[all]"
 
       - name: Install Additional Dependencies
-        run: sdist-venv/bin/pip install nbformat papermill jupyter
+        run: all-venv/bin/pip install nbformat papermill jupyter
 
       - name: Create Jupyter Kernel
-        run: sdist-venv/bin/python -m ipykernel install --user --name sdist-venv
+        run: all-venv/bin/python -m ipykernel install --user --name all-venv
 
       - name: Integration Tests
-        run: sdist-venv/bin/python scripts/run_e2e_notebooks.py --kernel sdist-venv
+        run: all-venv/bin/python scripts/run_e2e_notebooks.py --kernel all-venv
         env:
           NOTEBOOK_RUNNER_DEFAULT_MODEL: ${{ secrets.NOTEBOOK_RUNNER_DEFAULT_PROJECT_ID }}
           NOTEBOOK_RUNNER_API_KEY: ${{ secrets.NOTEBOOK_RUNNER_API_KEY }}

diff --git a/README.md b/README.md
@@ -63,6 +63,32 @@ pip install validmind
     pip install rpy2
     ```
 
+## PII Detection
+
+The ValidMind Library includes optional PII detection, built on Microsoft Presidio, that flags sensitive data in test results and test descriptions so it is not logged by accident.
+
+**Installation:**
+
+```bash
+pip install "validmind[pii-detection]"
+```
+
+**Configure PII detection:**
+
+```bash
+# Enable PII detection for test results only
+export VALIDMIND_PII_DETECTION=test_results
+
+# Enable PII detection for test descriptions only
+export VALIDMIND_PII_DETECTION=test_descriptions
+
+# Enable PII detection for both test results and descriptions
+export VALIDMIND_PII_DETECTION=all
+
+# Disable PII detection (default)
+export VALIDMIND_PII_DETECTION=disabled
+```
+
 ## How to contribute
 
 ### Install dependencies

diff --git a/notebooks/how_to/configure_pii_detection.ipynb b/notebooks/how_to/configure_pii_detection.ipynb
@@ -0,0 +1,189 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# PII Detection Modes with a Custom Test\n",
+        "\n",
+        "This notebook shows how to initialize ValidMind, implement a custom test that emits PII, and observe behavior differences under each `VALIDMIND_PII_DETECTION` mode when running the test with `validmind.tests.run_test`.\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Prerequisites\n",
+        "\n",
+        "- `validmind` installed with PII extras:\n",
+        "\n",
+        "```bash\n",
+        "pip install -q \"validmind[pii-detection]\"\n",
+        "```\n",
+        "\n",
+        "- A ValidMind model registered. We'll initialize the library using your model snippet.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "%pip install -q \"validmind[pii-detection]\""
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Initialize ValidMind\n",
+        "\n",
+        "Initialize using your model code snippet or a `.env` file, as shown in other quickstarts.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "# Load your model identifier credentials from a `.env` file\n",
+        "%load_ext dotenv\n",
+        "%dotenv .env\n",
+        "\n",
+        "# Or initialize with your code snippet\n",
+        "import validmind as vm\n",
+        "\n",
+        "vm.init(\n",
+        "    # api_host=\"...\",\n",
+        "    # api_key=\"...\",\n",
+        "    # api_secret=\"...\",\n",
+        "    # model=\"...\",\n",
+        ")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Create a custom test that emits PII\n",
+        "\n",
+        "We'll create a custom test that returns a small table containing PII in three columns:\n",
+        "- `name` and `email`: full names and email addresses\n",
+        "- `phone`: US-style phone numbers\n",
+        "\n",
+        "This mirrors the structure used in other custom test notebooks and will exercise both table and description PII detection paths.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import pandas as pd\n",
+        "\n",
+        "from validmind import test\n",
+        "\n",
+        "@test(\"my_pii_demo.PIIEmittingTest\")\n",
+        "def pii_emitting_test():\n",
+        "    \"\"\"A demo test that returns PII\"\"\"\n",
+        "    return pd.DataFrame(\n",
+        "        {\n",
+        "            \"name\": [\"Jane Smith\", \"John Doe\", \"Alice Johnson\"],\n",
+        "            \"email\": [\n",
+        "                \"jane.smith@bank.example\",\n",
+        "                \"john.doe@company.example\",\n",
+        "                \"alice.johnson@service.example\",\n",
+        "            ],\n",
+        "            \"phone\": [\"(212) 555-9876\", \"(415) 555-1234\", \"(646) 555-5678\"],\n",
+        "        }\n",
+        "    )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Run the test under different PII detection modes\n",
+        "\n",
+        "We'll switch `VALIDMIND_PII_DETECTION` across modes and run the same test with `validmind.tests.run_test`. We catch exceptions to observe blocking behavior.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "import os\n",
+        "from validmind.tests import run_test\n",
+        "\n",
+        "MODES = [\"disabled\", \"test_results\", \"test_descriptions\", \"all\"]\n",
+        "\n",
+        "for mode in MODES:\n",
+        "    print(\"\\n=== Mode:\", mode, \"===\")\n",
+        "    os.environ[\"VALIDMIND_PII_DETECTION\"] = mode\n",
+        "    try:\n",
+        "        result = run_test(\"my_pii_demo.PIIEmittingTest\")\n",
+        "\n",
+        "        # check if the description was generated\n",
+        "        if not result._was_description_generated:\n",
+        "            print(\"Blocked: Test Description Generation was not run due to PII\")\n",
+        "        else:\n",
+        "            print(\"Description was generated by LLM\")\n",
+        "\n",
+        "        # Try logging (this triggers PII checks before upload)\n",
+        "        result.log()\n",
+        "        print(\"Logging to API succeeded\")\n",
+        "    except Exception as e:\n",
+        "        print(\"Blocked: Test Result was not logged due to PII\")\n",
+        "        print(f\"  Error: {type(e).__name__}: {e}\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "### Expected behavior by mode\n",
+        "\n",
+        "- `disabled`: No PII checks.\n",
+        "- `test_results`: Description is generated but result is not logged.\n",
+        "- `test_descriptions`: Description generation is blocked but result is logged.\n",
+        "- `all`: Description generation and logging are both blocked.\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Notes\n",
+        "\n",
+        "- If you see warnings that Presidio is unavailable, make sure you installed the extras: `validmind[pii-detection]`.\n",
+        "- You can override blocking by calling `result.log(unsafe=True)`, but this is not recommended outside controlled workflows.\n"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": ".venv",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.12.11"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 2
+}
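
The "Expected behavior by mode" list in the notebook can be read as a small lookup from each `VALIDMIND_PII_DETECTION` value to the checks it turns on. The sketch below only illustrates that reading and is not the library's implementation: `CHECKS_BY_MODE`, `active_pii_checks`, and `expected_outcome` are hypothetical names that do not exist in `validmind`, and the default of `disabled` is taken from the README text in this diff.

```python
import os

# Hypothetical table, not part of the validmind API: maps each documented
# VALIDMIND_PII_DETECTION value to the checks the notebook says it enables.
CHECKS_BY_MODE = {
    "disabled": frozenset(),
    "test_results": frozenset({"results"}),
    "test_descriptions": frozenset({"descriptions"}),
    "all": frozenset({"results", "descriptions"}),
}


def active_pii_checks(env=None):
    """Return the set of PII checks implied by VALIDMIND_PII_DETECTION."""
    env = os.environ if env is None else env
    # The README documents "disabled" as the default mode.
    mode = env.get("VALIDMIND_PII_DETECTION", "disabled")
    if mode not in CHECKS_BY_MODE:
        raise ValueError(f"Unknown VALIDMIND_PII_DETECTION mode: {mode!r}")
    return CHECKS_BY_MODE[mode]


def expected_outcome(mode):
    """Summarize what the notebook expects for a test whose output has PII."""
    checks = active_pii_checks({"VALIDMIND_PII_DETECTION": mode})
    return {
        "description_generated": "descriptions" not in checks,
        "result_logged": "results" not in checks,
    }


for mode in CHECKS_BY_MODE:
    print(mode, expected_outcome(mode))
```

Comparing this table with the output of the notebook's loop is a quick way to confirm that each mode blocks what the documentation says it blocks.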