Pipeline for panelizing Federal Student Aid Title IV volume reports with exact opeid8 + award_year keys.
- Grants: AY 1999-2000 through AY 2024-2025 Q4
- Campus-based: AY 2001-2002 through AY 2023-2024
- Direct loans: AY 1999-2000 through AY 2024-2025 Q4
- FFEL: AY 1999-2000 through AY 2009-2010 Q4
Quarterly families are downloaded in full for provenance, but panel construction only uses the cumulative Q4 workbook for complete award years. AY 2025-2026 is intentionally excluded from panel outputs until Q4 exists.
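
The Q4-only rule can be sketched as a small filter over a release inventory. This is a minimal illustration, not the pipeline's actual selection code: the `award_year`, `quarter`, and `filename` fields and the file names are hypothetical.

```python
from typing import Dict, List


def select_q4_releases(releases: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each award year to its cumulative Q4 file; skip years without Q4."""
    selected = {}
    for rel in releases:
        # Earlier quarters are kept on disk for provenance but never panelized.
        if rel["quarter"] == "Q4":
            selected[rel["award_year"]] = rel["filename"]
    return dict(sorted(selected.items()))


# Hypothetical inventory rows (file names are made up for illustration).
releases = [
    {"award_year": "2024-2025", "quarter": "Q3", "filename": "grants_2024_2025_q3.xls"},
    {"award_year": "2024-2025", "quarter": "Q4", "filename": "grants_2024_2025_q4.xls"},
    {"award_year": "2025-2026", "quarter": "Q1", "filename": "grants_2025_2026_q1.xls"},
]

print(select_q4_releases(releases))
# {'2024-2025': 'grants_2024_2025_q4.xls'}
```

AY 2025-2026 drops out because no Q4 release exists for it yet, which mirrors the exclusion described above.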
Large raw files and panel artifacts live outside the repo:
export FSA_ROOT=/Users/markjaysonfarol13/Projects/FSAVolumeReports_Paneling
The pipeline creates:
- Raw_Title_IV_Reports/
- Cross_sections/
- Dictionary/
- Panels/
- Checks/
- build/
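
For orientation, the directory layout could be created with a helper like the one below. This is a sketch only: `ensure_layout` is a hypothetical name, the pipeline scripts may create these directories differently, and the demo runs against a temporary directory rather than the real `$FSA_ROOT`.

```python
import tempfile
from pathlib import Path
from typing import List

# Top-level directories listed in this README.
PIPELINE_DIRS = [
    "Raw_Title_IV_Reports",
    "Cross_sections",
    "Dictionary",
    "Panels",
    "Checks",
    "build",
]


def ensure_layout(root: Path) -> List[Path]:
    """Create each pipeline directory under root if it does not exist yet."""
    created = []
    for name in PIPELINE_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


# Demo against a scratch directory so nothing is written to a real data root.
with tempfile.TemporaryDirectory() as tmp:
    for path in ensure_layout(Path(tmp)):
        print(path.name)
```

In a real run the root would come from the environment, for example `Path(os.environ["FSA_ROOT"])`.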
Run the full pipeline:
python3 Scripts/00_run_all.py --root "$FSA_ROOT" --run-qaqc
Or use the repo shell wrapper for a safer live run with venv bootstrap, dependency install, unittest smoke test, strict source preflight, and then the full pipeline:
bash Scripts/run_live_pipeline.sh
Run only the source preflight and inspect the exact files that would be selected:
python3 Scripts/00_run_all.py --root "$FSA_ROOT" --preflight-only
Wrapper version:
bash Scripts/run_live_pipeline.sh --preflight-only
Or call stage 01 directly:
python3 Scripts/01_download_title_iv_reports.py --root "$FSA_ROOT" --verify-only
The preflight writes:
- Checks/download_qc/preflight_release_inventory.csv
- Checks/download_qc/preflight_selected_panel_files.csv
- Checks/download_qc/preflight_inventory_summary.csv
- Checks/download_qc/preflight_validation.csv
By default, the pipeline is strict: it will fail before downloading if the live source inventory does not match the expected panel scope.
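
The strict check can be pictured as comparing the selected award years against the expected scope and refusing to continue on any mismatch. The sketch below is an assumption about the shape of that logic, not the code in stage 01: `EXPECTED_SCOPE`, `validate_scope`, and `preflight` are hypothetical names, and award years are assumed to be `YYYY-YYYY` strings.

```python
from typing import Dict, List

# First and last starting years per family, taken from the scope in this README.
EXPECTED_SCOPE = {
    "grants": (1999, 2024),        # AY 1999-2000 through AY 2024-2025
    "campus_based": (2001, 2023),  # AY 2001-2002 through AY 2023-2024
    "direct_loans": (1999, 2024),  # AY 1999-2000 through AY 2024-2025
    "ffel": (1999, 2009),          # AY 1999-2000 through AY 2009-2010
}


def expected_award_years(first_start: int, last_start: int) -> List[str]:
    """List award years such as '1999-2000' for an inclusive start-year range."""
    return [f"{y}-{y + 1}" for y in range(first_start, last_start + 1)]


def validate_scope(selected: Dict[str, List[str]]) -> List[str]:
    """Return human-readable problems; an empty list means the scope matches."""
    problems = []
    for family, (first, last) in EXPECTED_SCOPE.items():
        expected = set(expected_award_years(first, last))
        found = set(selected.get(family, []))
        problems += [f"{family}: missing {ay}" for ay in sorted(expected - found)]
        problems += [f"{family}: unexpected {ay}" for ay in sorted(found - expected)]
    return problems


def preflight(selected: Dict[str, List[str]], strict: bool = True) -> List[str]:
    """Raise before any download when strict and the inventory is off-scope."""
    problems = validate_scope(selected)
    if problems and strict:
        raise RuntimeError("Preflight failed:\n" + "\n".join(problems))
    return problems


complete = {fam: expected_award_years(a, b) for fam, (a, b) in EXPECTED_SCOPE.items()}
incomplete = dict(complete, ffel=complete["ffel"][:-1])  # drop AY 2009-2010

print(preflight(complete))                  # []
print(preflight(incomplete, strict=False))  # ['ffel: missing 2009-2010']
```

With `strict=False` the problems are reported but not fatal, which is the behavior a reduced fixture test would want.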
Run individual stages:
python3 Scripts/01_download_title_iv_reports.py --root "$FSA_ROOT"
python3 Scripts/02_profile_workbooks.py --root "$FSA_ROOT"
python3 Scripts/03_build_dictionary.py --root "$FSA_ROOT"
python3 Scripts/04_panelize_grants.py --root "$FSA_ROOT"
python3 Scripts/05_panelize_campus_based.py --root "$FSA_ROOT"
python3 Scripts/06_panelize_loans.py --root "$FSA_ROOT"
python3 Scripts/07_merge_fsa_panels.py --root "$FSA_ROOT"
python3 Scripts/08_build_panel_dictionary.py --root "$FSA_ROOT"
python3 Scripts/09_build_manual_review_workbook.py --root "$FSA_ROOT"
python3 Scripts/QA_QC/00_source_qaqc.py --root "$FSA_ROOT"
python3 Scripts/QA_QC/01_panel_qaqc.py --root "$FSA_ROOT"
python3 Scripts/QA_QC/02_acceptance_audit.py --root "$FSA_ROOT"
For offline testing, stage 01 also accepts --page-html /path/to/title_iv_page.html.
For reduced fixture tests, you can bypass strict live-scope validation with --no-strict-source-checks.
Key outputs:
- Panels/grants/panel_grant_volume_1999_2025.parquet
- Panels/campus_based/panel_campus_based_volume_2001_2024.parquet
- Panels/loans/panel_direct_loan_volume_1999_2025.parquet
- Panels/loans/panel_ffel_loan_volume_1999_2010.parquet
- Panels/loans/panel_loan_volume_1999_2025.parquet
- Panels/final/fsa_volume_reports_raw_1999_2025.parquet
- Panels/final/fsa_volume_reports_clean_1999_2025.parquet
- Dictionary/fsa_volume_dictionary.parquet
- Dictionary/fsa_volume_dictionary.csv
- Checks/panel_qc/manual_review_package/final_descriptor_manual_review_workbook.xlsx
- Checks/panel_qc/manual_review_package/*.csv
The pipeline writes auditable checks for:
- release inventory and selected panel files
- workbook profiling and header signatures
- unmapped actionable headers
- duplicate opeid8 + award_year keys
- year coverage gaps inside the selected official inventory
- descriptor conflicts in the merged final panel
- top-level acceptance checks
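
Two of the checks above, duplicate keys and year coverage gaps, can be illustrated with plain Python. This is a simplified sketch rather than the QA_QC scripts themselves: the helper names are hypothetical, the rows are made-up examples, and `award_year` is assumed to be a `YYYY-YYYY` string.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def duplicate_keys(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return every (opeid8, award_year) pair that appears more than once."""
    counts = Counter((r["opeid8"], r["award_year"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def coverage_gaps(award_years: Iterable[str]) -> List[str]:
    """Return award years missing between the earliest and latest ones seen."""
    starts = {int(ay.split("-")[0]) for ay in award_years}
    if not starts:
        return []
    return [
        f"{y}-{y + 1}"
        for y in range(min(starts), max(starts) + 1)
        if y not in starts
    ]


# Hypothetical panel rows; the OPEID value is invented for illustration.
rows = [
    {"opeid8": "00100200", "award_year": "2019-2020"},
    {"opeid8": "00100200", "award_year": "2019-2020"},  # duplicate key
    {"opeid8": "00100200", "award_year": "2021-2022"},  # 2020-2021 is absent
]

print(duplicate_keys(rows))                          # [('00100200', '2019-2020')]
print(coverage_gaps(r["award_year"] for r in rows))  # ['2020-2021']
```

Keeping `opeid8` as a string preserves leading zeros, which matters for an exact key match.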
The repo currently uses unittest:
python3 -m unittest discover -s tests -v