aadl/evg-db-anon

Evergreen ILS — Anonymized Dump Generator

Produces a sanitized PostgreSQL plain-SQL dump suitable for dev and CI environments. Patron PII is replaced with realistic fake data; large audit/history tables are emptied.

What it does

| Category | Action |
| --- | --- |
| Patron names, email, phones, DOB, ID values | Replaced with fake data |
| Library card barcodes | Hashed (deterministic) |
| Patron usernames | Hashed (deterministic) |
| Patron passwords | Replaced with bcrypt hash of demo |
| Patron messages | Replaced with [redacted] |
| Credit card numbers | Replaced with test card 4111111111111111 |
| EDI credentials | Replaced with random strings |
| Vendor contact info (acq.provider) | Replaced with fake data |
| auditor.* (10 history tables) | Truncated |
| action_trigger.event / event_output | Truncated |
| action.usr_circ_history | Truncated |
| action.aged_circulation / aged_hold_request | Truncated |
| money.aged_billing / aged_payment | Truncated |
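
To spot-check the result, you can search the anonymized dump for a value you know appears in the original, such as a real patron email address. The following is a minimal sketch, not part of this repository; check_absent is a hypothetical helper name and the search string is whatever you choose to look for.

```shell
# check_absent STRING FILE
# Hypothetical helper: fails if STRING still appears verbatim in FILE.
check_absent() {
  # -F treats the string literally, -q suppresses matching lines.
  if grep -F -q -- "$1" "$2"; then
    echo "LEAK: '$1' still present in $2" >&2
    return 1
  fi
  return 0
}

# Example (placeholder value):
#   check_absent 'jane.doe@library.example' anonymized_dump.sql
```

A clean result only shows that the specific string is gone; it does not prove that every column is covered by a masking rule.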

Prerequisites

Usage

Using anonymize.sh

./anonymize.sh -i original_dump.sql -o anonymized_dump.sql

Using the Docker image directly

podman run --rm -i \
  -v "$(pwd):/rules:ro" \
  -e ANON_MASKING_POLICIES=/rules/masking_rules.sql \
  ghcr.io/aadl/evg-db-anon:3.0.13 \
  < original_dump.sql \
  > anonymized_dump.sql

Output a gzipped file

Pipe the output through gzip to compress on the fly:

podman run --rm -i \
  -v "$(pwd):/rules:ro" \
  -e ANON_MASKING_POLICIES=/rules/masking_rules.sql \
  ghcr.io/aadl/evg-db-anon:3.0.13 \
  < original_dump.sql \
  | gzip > anonymized_dump.sql.gz

Input from a gzipped file

Decompress on the fly with gunzip -c:

gunzip -c original_dump.sql.gz \
  | podman run --rm -i \
    -v "$(pwd):/rules:ro" \
    -e ANON_MASKING_POLICIES=/rules/masking_rules.sql \
    ghcr.io/aadl/evg-db-anon:3.0.13 \
  > anonymized_dump.sql

Both can be combined to go from a gzipped dump directly to a gzipped anonymized dump:

gunzip -c original_dump.sql.gz \
  | podman run --rm -i \
    -v "$(pwd):/rules:ro" \
    -e ANON_MASKING_POLICIES=/rules/masking_rules.sql \
    ghcr.io/aadl/evg-db-anon:3.0.13 \
  | gzip > anonymized_dump.sql.gz
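
The four variants above differ only in whether the input and output are gzipped, so they can be folded into one small wrapper that decides by file extension. This is a sketch under that assumption; read_dump, write_dump, run_anonymizer and anonymize_dump are hypothetical names, and the podman invocation is copied from the examples above.

```shell
# Stream a dump to stdout, decompressing when the name ends in .gz.
read_dump() {
  case "$1" in
    *.gz) gunzip -c -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

# Write stdin to a file, compressing when the name ends in .gz.
write_dump() {
  case "$1" in
    *.gz) gzip > "$1" ;;
    *)    cat > "$1" ;;
  esac
}

# Same container invocation as in the examples above:
# plain SQL in on stdin, anonymized SQL out on stdout.
run_anonymizer() {
  podman run --rm -i \
    -v "$(pwd):/rules:ro" \
    -e ANON_MASKING_POLICIES=/rules/masking_rules.sql \
    ghcr.io/aadl/evg-db-anon:3.0.13
}

# anonymize_dump INPUT OUTPUT
anonymize_dump() {
  read_dump "$1" | run_anonymizer | write_dump "$2"
}

# Example:
#   anonymize_dump original_dump.sql.gz anonymized_dump.sql.gz
```

Note that a pipeline's exit status is that of its last command, so a failing container can still leave a (truncated) output file behind. In bash, enabling `set -o pipefail` before calling the wrapper makes such failures visible.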

Generating a plain-SQL dump

pg_dump -Fp -h "$PGHOST" -U "$PGUSER" -d "$PGDATABASE" -f original_dump.sql

Converting an existing dump to plain SQL

If you already have a dump in custom (-Fc), directory (-Fd), or tar (-Ft) format, convert it to plain SQL with pg_restore. The image includes pg_restore because it is based on postgres:17:

podman run --rm \
  -v "$(pwd):/work" \
  ghcr.io/aadl/evg-db-anon:3.0.13 \
  pg_restore --no-owner --no-acl -f /work/original_dump.sql /work/original_dump.dump

Or if you have pg_restore installed locally:

pg_restore --no-owner --no-acl -f original_dump.sql original_dump.dump

| Source format | pg_dump flag | pg_restore command |
| --- | --- | --- |
| Custom | -Fc | pg_restore --no-owner --no-acl -f out.sql dump.dump |
| Directory | -Fd | pg_restore --no-owner --no-acl -f out.sql dump_dir/ |
| Tar | -Ft | pg_restore --no-owner --no-acl -f out.sql dump.tar |

Plain SQL (-Fp) dumps cannot be processed by pg_restore. Pass them directly to the anonymizer as-is.
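
If you are not sure which format an existing dump is in, a rough heuristic can tell you whether conversion is needed. The sketch below assumes that custom-format archives begin with the bytes PGDMP and that tar archives carry the ustar magic at byte offset 257; dump_format is a hypothetical helper name, and anything it does not recognize (including gzipped files) is reported as plain.

```shell
# dump_format PATH
# Hypothetical helper: prints directory, custom, tar, or plain.
dump_format() {
  if [ -d "$1" ]; then
    # Directory-format dumps are directories, not single files.
    echo directory
  elif [ "$(head -c 5 "$1" 2>/dev/null)" = "PGDMP" ]; then
    # Assumed magic header of pg_dump custom-format archives.
    echo custom
  elif [ "$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null)" = "ustar" ]; then
    # Assumed tar magic at offset 257.
    echo tar
  else
    # Fallback: treat everything else as plain SQL.
    echo plain
  fi
}

# Example:
#   dump_format original_dump.dump
```

Only the plain result can be fed to the anonymizer directly; the other three need the pg_restore conversion from the table above.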

Extending the rules

To mask additional columns, add SECURITY LABEL FOR anon ON COLUMN schema.table.column IS '...' statements to masking_rules.sql. See the PostgreSQL Anonymizer masking functions reference for the available functions (anon.fake_email(), anon.random_string(), anon.hash(), etc.).
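
As an illustration, the helper below prints one rule in the 'MASKED WITH FUNCTION ...' form used by PostgreSQL Anonymizer. It is a sketch only: masking_rule is a hypothetical helper name, and the column names in the example are placeholders rather than real Evergreen columns.

```shell
# masking_rule COLUMN FUNCTION_CALL
# Hypothetical helper: prints one SECURITY LABEL statement for masking_rules.sql.
# FUNCTION_CALL must not contain single quotes, since it is embedded in a
# single-quoted SQL string without escaping.
masking_rule() {
  printf "SECURITY LABEL FOR anon ON COLUMN %s IS 'MASKED WITH FUNCTION %s';\n" "$1" "$2"
}

# Example (placeholder columns), appending to the rules file:
#   masking_rule my_schema.my_table.contact_email 'anon.fake_email()' >> masking_rules.sql
#   masking_rule my_schema.my_table.api_token 'anon.random_string(16)' >> masking_rules.sql
```

Writing the statements by hand works just as well; the helper only keeps the quoting consistent when adding several rules at once.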
