A tkinter GUI tool that scans course PDFs for accessibility compliance and automatically fixes common issues that violate the University of Washington's accessibility requirements.
| Issue | Detection | Fix |
|---|---|---|
| Untagged PDF | Missing `/MarkInfo` dictionary | Adds `/MarkInfo << /Marked true >>` |
| No document structure | Missing `/StructTreeRoot` | Adds a structure tree for screen reader navigation |
| Image-only pages (scanned docs) | No text operators in content streams | Full OCR via Tesseract |
| Missing or bad title | Empty, missing, or gibberish `/Title` metadata | Sets a title derived from the filename |
PDFs are converted to PDF/A-2 via ocrmypdf (running OCR on image-only pages), then patched with pikepdf to ensure `/MarkInfo` and `/StructTreeRoot` are present.
The GUI shows a color-coded table of all PDFs in the folder:
- Green — already compliant (including known-good files)
- Yellow — needs fixing
- Blue — currently processing
- Red — error during processing
Click Fix Selected to process the selected pending file. Fixed PDFs are saved to the updated/ subfolder.
Requires Python 3.10+ with a micromamba environment.
```bash
# Install system dependencies
micromamba install -n myenv -c conda-forge tesseract ghostscript pikepdf

# Install ocrmypdf via pip, since the conda-forge package has a missing dependency on Windows
micromamba run -n myenv pip install ocrmypdf
```

- `tesseract`: OCR engine (installed via conda-forge)
- `ghostscript`: PDF/A conversion backend
- `ocrmypdf`: orchestrates OCR and PDF/A conversion
- `pikepdf`: PDF inspection and metadata patching
- `tkinter`: GUI (bundled with Python)
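Before the first run, it can help to confirm that everything above is reachable from the environment. A sketch using only the standard library; `check_dependencies` is a hypothetical helper, not part of the tool, and the Ghostscript executable names (`gs` on Linux/macOS, `gswin64c`/`gswin32c` on Windows) are the usual ones but an assumption about your install.

```python
import importlib.util
import shutil
import sys


def check_dependencies() -> dict:
    """Map each requirement to True (found) or False (missing)."""
    report = {
        "python>=3.10": sys.version_info >= (3, 10),
        "tesseract": shutil.which("tesseract") is not None,
        # Ghostscript's command-line name differs by platform.
        "ghostscript": any(shutil.which(n) for n in ("gs", "gswin64c", "gswin32c")),
    }
    for module in ("ocrmypdf", "pikepdf", "tkinter"):
        # find_spec only locates the module; it does not import it.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:14} {'ok' if ok else 'MISSING'}")
```

Run it inside the environment, e.g. `micromamba run -n myenv python check_deps.py` (the file name is hypothetical).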
```bash
micromamba run -n myenv python fix_pdf_accessibility.py
```

- The GUI opens and automatically scans the current folder for PDFs
- Each PDF is inspected and shown in the table with its accessibility status
- Select a row and click Fix Selected to process that file
- Fixed PDFs appear in the `updated/` subfolder
- Click Open Log to view the detailed processing log
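The folder scan and output location can be sketched as below, assuming the tool only looks at PDFs directly in the working folder. `find_pdfs` and `output_path_for` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path


def find_pdfs(folder: str | Path) -> list[Path]:
    """PDFs directly in `folder`, matching .pdf case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def output_path_for(pdf: Path) -> Path:
    """Where the fixed copy goes: an updated/ folder next to the original."""
    out_dir = pdf.parent / "updated"
    out_dir.mkdir(exist_ok=True)  # created on first use
    return out_dir / pdf.name
```

Because `iterdir()` does not recurse, files already written to `updated/` are not picked up again on the next scan.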
Files listed in the `KNOWN_GOOD` set in `accessibility_workflow.py` are always shown as compliant and skipped during processing. Edit this set to add files you've already verified externally:

```python
KNOWN_GOOD = {"Wk1_Janeway_Ch1_Sec1-5.pdf"}
```

Every run appends to `accessibility_log.txt` with timestamped entries covering:
- Full inspection details for every PDF (pages, text content, tags, title)
- Known-good file acknowledgments
- Fix mode used (OCR vs skip-text), duration, and output size
- Verification pass/fail with specific errors
- Full tracebacks for any failures
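An append-only, timestamped log with full tracebacks maps naturally onto Python's `logging` module. A minimal sketch, assuming a `FileHandler` in append mode; `make_logger`, `process`, and the `fix` callable are hypothetical, and the tool's real log format may differ.

```python
import logging
import time

KNOWN_GOOD = {"Wk1_Janeway_Ch1_Sec1-5.pdf"}


def make_logger(log_path: str = "accessibility_log.txt") -> logging.Logger:
    """Logger that appends timestamped lines to log_path."""
    logger = logging.getLogger("accessibility")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, mode="a", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
        logger.addHandler(handler)
    return logger


def process(name, fix, logger) -> str:
    """Run the hypothetical `fix` callable on one file and log the outcome."""
    if name in KNOWN_GOOD:
        logger.info("%s: in KNOWN_GOOD, skipped", name)
        return "skipped"
    start = time.perf_counter()
    try:
        fix(name)
    except Exception:
        # logger.exception logs at ERROR level and appends the full traceback.
        logger.exception("%s: fix failed", name)
        return "error"
    logger.info("%s: fixed in %.1f s", name, time.perf_counter() - start)
    return "fixed"
```

Catching the exception per file keeps one bad PDF from stopping the rest of the run, while the traceback still lands in the log.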
Three example PDFs are included in the repo:
| File | Status | Notes |
|---|---|---|
| `Wk1_Janeway_Ch1_Sec1-5.pdf` | Compliant | Already tagged with a proper title; used as reference |
| `Wk1_Janeway1989.pdf` | Needs fix | Has text but is missing tags and has a gibberish title |
| `Wk1_HerronFreeman_Chapters.pdf` | Needs fix | Scanned images only; requires full OCR |
New PDFs added to the folder are git-ignored by default. Only the three example files are tracked.
