feat: implement PHIX validation for schools and daycares#152
eswarchandravidyasagar wants to merge 4 commits into main
Conversation
eswarchandravidyasagar
commented
Jan 14, 2026
- Added PHIX validation module to validate school/daycare names against the official PHIX reference list.
- Integrated validation into the preprocessing step in orchestrator.py.
- Configurable options added to parameters.yaml for enabling validation and handling unmatched facilities.
- Created unit tests for the validation module covering various scenarios.
- Added documentation for the validation plan and updated the plans directory.
We don't have redistribution permission for the PHIX reference list file, so it will need to be removed and the commits squashed. It would also blow up the size of this repository and its history. Users will have to bring their own PHIX reference list.
config/parameters.yaml
Outdated
# Path to PHIX reference Excel file (relative to project root)
reference_file: PHIX Reference Lists v5.2 - 2025Jun30.xlsx
# Minimum fuzzy match score (0-100) to consider a match
match_threshold: 85
Is this required? Shouldn't the match be exact? A fuzzy threshold could let through the very issues we'd like to protect against, such as similarly named schools being accidentally selected when a Panorama user creates a forecast query.
We likely need a mapping file that converts the PHU name from the PHIX reference document to standardized PHU acronyms (which should be enforced for template folders, etc.). We may also need to allow this map to be many-to-one, to handle PHUs that have merged since the reference was last updated.
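A minimal sketch of what such a many-to-one map could look like, assuming it is loaded into a plain dict. The PHU names, the acronyms, and the `resolve_phu_acronym` helper are hypothetical placeholders for illustration, not values from the PHIX reference document or code from this PR.

```python
# Hypothetical many-to-one map: several PHIX PHU names -> one standardized acronym.
# All names and acronyms below are made up for illustration.
PHU_ALIASES = {
    "Example North Health Unit": "ENH",
    "Example South Health Unit": "ENH",  # merged PHU shares the same acronym
    "Sample District Health Unit": "SDH",
}


def _normalize(name: str) -> str:
    """Collapse whitespace and case so formatting differences in PHU names don't matter."""
    return " ".join(name.split()).casefold()


# Build the lookup once, keyed by normalized PHU name.
_LOOKUP = {_normalize(name): acronym for name, acronym in PHU_ALIASES.items()}


def resolve_phu_acronym(phix_phu_name: str) -> str:
    """Return the standardized acronym for a PHIX PHU name, or raise KeyError if unmapped."""
    try:
        return _LOOKUP[_normalize(phix_phu_name)]
    except KeyError:
        raise KeyError(f"No PHU acronym mapped for {phix_phu_name!r}") from None
```

Raising on an unmapped name, rather than falling back to a default, keeps a missing or stale mapping entry visible instead of silently validating against the wrong PHU.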
I know it's important for this to run early in the pipeline, before other processing, but I wonder whether we could also emit something in the per-PDF validation log confirming that a valid facility is being used for the target PHU.
- Updated `validate_phix.py` to remove fuzzy matching and implement strict exact matching for facility names against the PHIX reference list.
- Introduced PHU alias mapping to restrict validation to specific Public Health Units (PHUs) using a YAML configuration file.
- Enhanced the `validate_facilities` function to support PHU scoping and improved error handling for unmatched facilities.
- Updated tests to reflect changes in matching strategy and added new tests for PHU alias mapping and validation behavior.
- Modified documentation to clarify the new validation process and configuration options.
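For readers following the thread, here is a rough sketch of strict exact matching with optional PHU scoping. It is not the code in `validate_phix.py`: the function name `validate_facilities_sketch`, the `ValidationResult` type, and the shape of the reference data (a dict from PHU acronym to a set of official facility names) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Facility names split by whether they matched the reference list exactly."""
    matched: list[str] = field(default_factory=list)
    unmatched: list[str] = field(default_factory=list)


def validate_facilities_sketch(facility_names, reference, phu=None):
    """Exact-match facility names against a reference list (hypothetical helper).

    reference: dict mapping PHU acronym -> set of official facility names.
    phu: if given, only that PHU's facilities count as valid.
    """
    if phu is None:
        # No scoping: a name is valid if any PHU lists it.
        allowed = set().union(*reference.values())
    else:
        if phu not in reference:
            raise ValueError(f"Unknown PHU acronym: {phu!r}")
        allowed = reference[phu]

    result = ValidationResult()
    for name in facility_names:
        # Strict equality only: no fuzzy scoring, no case folding.
        if name in allowed:
            result.matched.append(name)
        else:
            result.unmatched.append(name)
    return result
```

With PHU scoping enabled, a facility that exists only under a different PHU lands in `unmatched`, which is the case the exact-match requirement is meant to catch.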
… column prefix, support multiple facility columns