SC-2886: PII Detection for Uploaded Test Results #410
- Bump Poetry version to 2.1.3 in `poetry.lock`.
- Introduce optional PII detection capabilities using Microsoft Presidio in `README.md`.
- Add `pii_filter.py` for detecting and masking PII in test results.
- Modify `check_for_sensitive_data` to utilize the new PII detection functionality.
- Update `pyproject.toml` to include `presidio-analyzer` as an optional dependency.
- Adjust `TestResult` class to ensure PII checks are performed correctly.
Looks good to me. Consider allowing the user to PII scan data before it goes to the LLM. Possible to set the env var to a tuple:
…developer-framework
- Update README.md to reflect new PII filtering options, replacing "Enable PII detection" with "Configure PII filtering" and detailing available modes.
- Modify run_e2e_notebooks.py to accept a new command-line option for PII filtering mode, allowing users to specify their desired filtering behavior during notebook execution.
- Implement PII filtering in test descriptions by adding a new function to filter PII from summaries before sending to the LLM.
- Introduce an Enum for PII filtering modes in pii_filter.py, improving clarity and maintainability of PII filtering logic.
- Update existing functions to utilize the new PII filtering capabilities, ensuring that PII is appropriately handled in test results and descriptions.
…nctionality
- Update README.md to change "PII filtering" to "PII detection" for clarity and consistency.
- Modify run_e2e_notebooks.py to reflect the new environment variable for PII detection.
- Refactor test descriptions to check for PII content instead of filtering it, raising exceptions when PII is detected.
- Rename PII filtering-related functions and enums in pii_filter.py to align with the new terminology.
- Ensure all references to PII handling are updated to use the new detection logic.
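The mode Enum and environment-variable switch mentioned in these commits could look roughly like the sketch below. The mode names, the `EXAMPLE_PII_DETECTION` variable name, and the `get_mode` helper are all hypothetical, since the thread does not list the real ones. The sketch only shows the general shape of mapping a string setting onto an Enum and failing loudly on bad values.

```python
import os
from enum import Enum


class PIIDetectionMode(Enum):
    # Hypothetical mode names; the real modes are not listed in this thread.
    DISABLED = "disabled"
    TEST_RESULTS = "test_results"
    TEST_DESCRIPTIONS = "test_descriptions"
    ALL = "all"


ENV_VAR = "EXAMPLE_PII_DETECTION"  # hypothetical variable name


def get_mode(environ=os.environ) -> PIIDetectionMode:
    """Read the mode from the environment; unset or empty means disabled."""
    raw = environ.get(ENV_VAR, "").strip().lower()
    if not raw:
        return PIIDetectionMode.DISABLED
    try:
        return PIIDetectionMode(raw)
    except ValueError:
        valid = ", ".join(mode.value for mode in PIIDetectionMode)
        raise ValueError(
            f"Invalid {ENV_VAR}={raw!r}; expected one of: {valid}"
        ) from None
```

Parsing the setting once into an Enum keeps string comparisons out of the call sites, so a typo in the variable surfaces as an error instead of silently disabling the check.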
Just watching your demo during sprint kickoff. I don't see the demo code you're showing off in this PR; perhaps it needs to be added as a mini notebook to
…port
- Introduce Presidio Structured for improved PII detection in structured data.
- Update `check_table_for_pii` and related functions to utilize structured analysis when available.
- Implement lazy loading for Presidio Structured components to ensure compatibility.
- Modify `generate_description` and `TestResult` classes to include PII checks for tables and descriptions.
- Update dependencies in `pyproject.toml` and `poetry.lock` to include `presidio-structured`.
- Enhance error handling and logging for PII detection failures.
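Lazy loading of an optional dependency, as described in this commit, is commonly done along the lines of the sketch below. It is not the code from this PR: `load_optional`, `structured_analysis_available`, and `choose_table_strategy` are hypothetical helpers, and `presidio_structured` is assumed to be the import name of the `presidio-structured` package.

```python
import importlib
from functools import lru_cache
from typing import Any, Optional


@lru_cache(maxsize=None)
def load_optional(module_name: str) -> Optional[Any]:
    """Import a module on first use; return None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def structured_analysis_available() -> bool:
    # Assumed import name for the optional presidio-structured package.
    return load_optional("presidio_structured") is not None


def choose_table_strategy() -> str:
    """Hypothetical dispatcher: prefer structured analysis, else plain text."""
    if structured_analysis_available():
        return "structured"
    return "text-fallback"
```

Deferring the import to first use means the library still imports cleanly when the optional extra is not installed, and the cached result avoids retrying the import on every call.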
…developer-framework
Sorry, just saw this @validbeck ... I created a notebook and it's available in this PR: notebooks/how_to/configure_pii_detection.ipynb
PR Summary
This PR introduces significant enhancements centered around two main areas:
Overall, this PR not only streamlines the integration workflow but also enhances the software's capability to attempt automatic PII filtering, thereby helping to secure potentially sensitive output data.
Test Suggestions
Pull Request Description
What and why?
This PR uses Microsoft Presidio to add PII detection to the library. Detection is off by default and can be turned on with an environment variable. When enabled, every test result is checked for PII before it's uploaded, and an error is raised if any is detected. The `unsafe=True` flag can be used to override the check for a specific test result.
How to test
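One way to reason about the behaviour described under "What and why?" is the stand-alone sketch below. It is not the library's implementation. A simple e-mail regex stands in for Presidio, and `EXAMPLE_PII_DETECTION`, `PIIDetectedError`, and `upload_result` are hypothetical names chosen for illustration. It exercises the three cases worth checking: detection off, detection on with PII present, and the `unsafe=True` override.

```python
import os
import re

# Stand-in detector: a simple e-mail regex, NOT Microsoft Presidio.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

ENV_VAR = "EXAMPLE_PII_DETECTION"  # hypothetical variable name


class PIIDetectedError(ValueError):
    """Hypothetical error raised when a payload appears to contain PII."""


def detection_enabled(environ=os.environ) -> bool:
    """Off by default: unset, empty, or 'disabled' all mean no check."""
    value = environ.get(ENV_VAR, "").strip().lower()
    return value not in {"", "disabled"}


def upload_result(payload: str, unsafe: bool = False, environ=os.environ) -> str:
    """Check the payload before a pretend upload; unsafe=True skips the check."""
    if detection_enabled(environ) and not unsafe:
        if EMAIL_RE.search(payload):
            raise PIIDetectedError(
                "Possible PII found; pass unsafe=True to upload anyway."
            )
    return "uploaded"
```

With the variable unset, any payload uploads. With it set, a payload containing an e-mail address raises `PIIDetectedError` unless the caller passes `unsafe=True`.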
What needs special review?
Dependencies, breaking changes, and deployment notes
Release notes
Checklist