A local-only, automated workflow for "surgical" image redaction. This tool uses OCR to identify sensitive text and masks only the specific words you care about, preserving the rest of the document's context.
- OCR Text Extraction: Uses Tesseract to extract text and precise bounding box coordinates from images.
- Surgical Matching: Matches extracted text against your custom_rules.txt (names, addresses, account numbers, etc.).
- Local Redaction: Uses the seemenot tool to draw high-quality redaction masks based on a generated manifest.
No cloud APIs are used—everything runs locally on your machine.
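As a rough illustration of the first two steps, the sketch below parses word-level boxes from Tesseract's TSV output (as produced by `tesseract image.png stdout tsv`) and keeps only the words that match a rule. The function names `parse_tesseract_tsv` and `match_words` are hypothetical, and the matching shown here (single words, case-insensitive, trailing punctuation ignored) is an assumption; the repository's script may match differently, and multi-word rules such as addresses would need phrase matching across consecutive words.

```python
import csv
import io

# Hand-written sample in Tesseract's TSV column layout (not real OCR output).
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t640\t480\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t10\t20\t50\t12\t96\tDear\n"
    "5\t1\t1\t1\t1\t2\t70\t20\t60\t12\t95\tAlice,\n"
)


def parse_tesseract_tsv(tsv_text):
    """Return word-level bounding boxes from Tesseract TSV text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; lower levels describe
        # pages, blocks, paragraphs, and lines.
        if row.get("level") != "5" or not text:
            continue
        words.append({
            "text": text,
            "left": int(row["left"]),
            "top": int(row["top"]),
            "width": int(row["width"]),
            "height": int(row["height"]),
        })
    return words


def match_words(words, rules):
    """Keep words whose text equals a rule (assumed: case-insensitive)."""
    wanted = {rule.casefold() for rule in rules}
    return [
        w for w in words
        if w["text"].strip(".,;:").casefold() in wanted
    ]


if __name__ == "__main__":
    for box in match_words(parse_tesseract_tsv(SAMPLE_TSV), ["alice"]):
        print(box)
```

Only the matched boxes would then be passed on for masking, which is what keeps the rest of the image readable.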
- Tesseract OCR: Required for text extraction.
brew install tesseract
- seemenot: The core redaction engine.
# Follow installation at https://github.com/waldekmastykarz/seemenot
- Clone this repository.
- Create a Python virtual environment and install dependencies:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Configure your rules:
  - Copy custom_rules.example.txt to custom_rules.txt.
  - Add the names, addresses, or IDs you want to redact.
cp custom_rules.example.txt custom_rules.txt
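The exact syntax of custom_rules.txt is defined by custom_rules.example.txt in the repository. As a minimal sketch, assuming one rule per line with blank lines and `#` comment lines ignored (an assumption, not confirmed by this README), a rules file could be read like this. `parse_rules` and `load_rules` are hypothetical helper names.

```python
from pathlib import Path


def parse_rules(text):
    """Split rules text into a list, one rule per line.

    Assumed format: blank lines and lines starting with '#' are skipped.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


def load_rules(path="custom_rules.txt"):
    """Read and parse a rules file from disk."""
    return parse_rules(Path(path).read_text(encoding="utf-8"))
```

Keeping the parsing separate from the file read makes it easy to check a rule set in isolation before running it against real images.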
- Place images (JPG, PNG) in the input/ directory.
- Run the automation script:
./scripts/process_images.sh
- Check the results:
  - redacted/: Contains the masked images.
  - processed/: Original images are moved here after successful redaction.
  - logs/: Detailed processing logs.
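A quick sanity check after a run is to confirm that every image moved to processed/ has a counterpart in redacted/. The sketch below assumes the redacted file keeps the original base name (an assumption about the output naming, so adjust it if your run produces different names). `missing_redactions` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path


def missing_redactions(processed_dir="processed", redacted_dir="redacted"):
    """List processed images with no same-named file in the redacted folder.

    Files are compared by base name without extension (assumed naming).
    """
    redacted_stems = {
        p.stem for p in Path(redacted_dir).iterdir() if p.is_file()
    }
    return sorted(
        p.name
        for p in Path(processed_dir).iterdir()
        if p.is_file() and p.stem not in redacted_stems
    )


if __name__ == "__main__":
    for name in missing_redactions():
        print(f"No redacted output found for {name}")
```

Any file it reports is worth looking up in logs/ before sharing the redacted set.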
- Redaction engine: seemenot by Waldek Mastykarz.
- OCR: Tesseract OCR.