#315 - remove metadata from output pdf files #316

Open
Cubix33 wants to merge 1 commit into fireform-core:main from Cubix33:remove-metadata-from-output-pdf-file

Cubix33 commented Mar 21, 2026

Closes #315

This PR adds a privacy-scrubbing step to the PDF generation pipeline. It removes metadata inherited from the original template (such as the author's name or the software used to create the form) before the final report is saved.

Why is this needed?

Emergency incident reports are legal documents. Leaving a developer's name or "Google Docs Renderer" in the internal metadata is unprofessional and can be a privacy risk. This change makes every output PDF anonymous and standardized.

Key Changes

  • Metadata Reset: Replaces the original pdf.Info dictionary with a fresh PdfDict().
  • Standardized Identity: Uses PdfName to set a professional, neutral title and author for the document.
  • Library Accuracy: Implements the fix using the correct pdfrw object types to prevent internal writer errors.
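
The reset described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `scrub_metadata`, the `NEUTRAL_METADATA` values, and the choice to encode values with `PdfString` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical neutral values; the PR does not state the actual strings.
NEUTRAL_METADATA: Dict[str, str] = {
    "Title": "Incident Report",
    "Author": "FireForm",
}


def scrub_metadata(src_path: str, dst_path: str,
                   fields: Dict[str, str] = NEUTRAL_METADATA) -> None:
    """Copy src_path to dst_path, replacing the Info dictionary with `fields`."""
    # Imported lazily so that loading this module does not require pdfrw.
    from pdfrw import PdfDict, PdfName, PdfReader, PdfString, PdfWriter

    pdf = PdfReader(src_path)

    # Start from an empty dictionary so nothing from the template survives.
    info = PdfDict()
    for key, value in fields.items():
        # PdfName builds the PDF key (e.g. /Title); PdfString.encode wraps
        # the Python string as a PDF string literal.
        info[PdfName(key)] = PdfString.encode(value)

    # The reader object doubles as the trailer, so this swaps out /Info.
    pdf.Info = info
    PdfWriter(dst_path, trailer=pdf).write()
```

A caller would run something like `scrub_metadata("filled.pdf", "filled_clean.pdf")` after the form has been filled; both paths here are placeholders.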

How to Test

  1. Run a form-filling process (e.g., make exec).
  2. Create a script in the repository root to check the metadata:
from pypdf import PdfReader

# Point this to your input template or your filled output PDF
reader = PdfReader("./src/inputs/file_20260321_210542_filled.pdf")

print("--- PDF Metadata ---")
# reader.metadata is None when the PDF has no Info dictionary
metadata = reader.metadata or {}
for key, value in metadata.items():
    print(f"{key}: {value}")
  3. Verify that the metadata contains only the standardized values set in filler.py, with nothing carried over from the template.
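
The manual check in the last step could also be automated. The sketch below is one possible way to do it, assuming the only fields that should remain are the title and author mentioned under Key Changes; `ALLOWED_KEYS`, `unexpected_metadata`, and `check_pdf` are hypothetical names, not part of this PR.

```python
from typing import Dict, Iterable, Mapping, Optional

# Assumed allow-list: pypdf reports Info keys with a leading slash.
ALLOWED_KEYS = frozenset({"/Title", "/Author"})


def unexpected_metadata(metadata: Optional[Mapping[str, object]],
                        allowed: Iterable[str] = ALLOWED_KEYS) -> Dict[str, object]:
    """Return the metadata entries whose keys are not in the allow-list."""
    if not metadata:
        # No Info dictionary at all means nothing leaked.
        return {}
    allowed_set = set(allowed)
    return {str(k): v for k, v in metadata.items() if str(k) not in allowed_set}


def check_pdf(path: str) -> Dict[str, object]:
    """Read a PDF with pypdf and report any leftover metadata fields."""
    # Imported lazily so the pure helper above works without pypdf installed.
    from pypdf import PdfReader

    return unexpected_metadata(PdfReader(path).metadata)
```

An empty result from `check_pdf` on a filled output file would indicate that no fields outside the allow-list remain; any returned entries show what was carried over.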


Development

Successfully merging this pull request may close these issues.

[PRIVACY]: Strip Sensitive Metadata and Creator Signatures from Output PDFs
