Releases: valehdba/pgclone
v4.0.0 — Schema Namespace
pgclone schema.
What Changed
The pgclone schema is created automatically when you run CREATE EXTENSION pgclone. All functions have been moved into this schema and the pgclone_ prefix has been removed:
| Before (v3.x) | After (v4.0.0) |
|---|---|
| pgclone_table(...) | pgclone.table(...) |
| pgclone_schema(...) | pgclone.schema(...) |
| pgclone_database(...) | pgclone.database(...) |
| pgclone_database_create(...) | pgclone.database_create(...) |
| pgclone_table_async(...) | pgclone.table_async(...) |
| pgclone_schema_async(...) | pgclone.schema_async(...) |
| pgclone_progress(...) | pgclone.progress(...) |
| pgclone_cancel(...) | pgclone.cancel(...) |
| pgclone_resume(...) | pgclone.resume(...) |
| pgclone_jobs() | pgclone.jobs() |
| pgclone_clear_jobs() | pgclone.clear_jobs() |
| pgclone_progress_detail() | pgclone.progress_detail() |
| pgclone_jobs_view | pgclone.jobs_view |
| pgclone_discover_sensitive(...) | pgclone.discover_sensitive(...) |
| pgclone_mask_in_place(...) | pgclone.mask_in_place(...) |
| pgclone_create_masking_policy(...) | pgclone.create_masking_policy(...) |
| pgclone_drop_masking_policy(...) | pgclone.drop_masking_policy(...) |
| pgclone_clone_roles(...) | pgclone.clone_roles(...) |
| pgclone_verify(...) | pgclone.verify(...) |
| pgclone_masking_report(...) | pgclone.masking_report(...) |
| pgclone_table_ex(...) | pgclone.table_ex(...) |
| pgclone_schema_ex(...) | pgclone.schema_ex(...) |
| pgclone_functions(...) | pgclone.functions(...) |
| pgclone_version() | pgclone.version() |
Upgrade Path
This is a breaking change. To upgrade from v3.x:
DROP EXTENSION pgclone;
CREATE EXTENSION pgclone;
SELECT pgclone.version();
All application queries must be updated to use the pgclone. prefix.
Quick Start
CREATE EXTENSION pgclone;SELECT pgclone.table(
'host=source-server dbname=mydb user=postgres password=secret',
'public', 'customers', true
);
SELECT pgclone.schema(
'host=source-server dbname=mydb user=postgres password=secret',
'sales', true
);
Compatibility
- PostgreSQL 14, 15, 16, 17, 18
- No C API changes — only SQL function names affected
- All 115 tests pass across all supported versions
v3.6.0 - GDPR/Compliance Masking
Generate audit reports proving that sensitive data is properly masked in your cloned databases. Essential for GDPR, HIPAA, and SOX compliance.
Usage
SELECT * FROM pgclone_masking_report('public');
schema_name | table_name | column_name | sensitivity | mask_status | recommendation
-------------+------------+-------------+--------------+---------------+--------------------------------------
public | employees | full_name | PII - Name | UNMASKED | Apply mask strategy: name
public | employees | email | Email | UNMASKED | Apply mask strategy: email
public | employees | phone | Phone | UNMASKED | Apply mask strategy: phone
public | employees | salary | Financial | UNMASKED | Apply mask strategy: random_int
public | employees | ssn | National ID | UNMASKED | Apply mask strategy: null
public | users | email | Email | MASKED (view) | OK - masked via users_masked view
public | users | password | Credential | MASKED (view) | OK - masked via users_masked view
Sensitivity Categories
Email, PII - Name, Phone, National ID, Financial, Credential, Address, Date of Birth, Credit Card, IP Address — ~40 patterns across 10 categories.
Mask Status
| Status | Meaning |
|---|---|
| MASKED (view) | A _masked view exists — data is protected via dynamic masking |
| UNMASKED | Sensitive column has no masking — action needed |
Compliance Workflow
-- 1. Clone production data SELECT pgclone_database('host=prod dbname=myapp user=postgres', true);-- 2. Find unmasked PII
SELECT * FROM pgclone_masking_report('public') WHERE mask_status = 'UNMASKED';-- 3. Apply masking
SELECT pgclone_create_masking_policy('public', 'employees',
'{"email": "email", "full_name": "name", "ssn": "null"}', 'dba_team');
-- 4. Confirm — all sensitive columns now masked
SELECT * FROM pgclone_masking_report('public');
Also in This Release
- Refactored sensitivity rules into shared
SensitivityRulestruct andpgclone_match_sensitivity()helper — used by bothpgclone_discover_sensitiveandpgclone_masking_report
Install
git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version(); -- pgclone 3.6.0
v3.5.0 — Clone Verification
Compare row counts between source and target databases after cloning to verify completeness.
Usage
-- Verify a specific schema
SELECT * FROM pgclone_verify(
'host=source-server dbname=prod user=postgres',
'app_schema'
);
schema_name | table_name | source_rows | target_rows | match
-------------+--------------+-------------+-------------+--------------
app_schema | customers | 15230 | 15230 | ✓
app_schema | orders | 148920 | 148920 | ✓
app_schema | payments | 98100 | 97855 | ✗
app_schema | audit_log | 1204500 | 0 | ✗ (missing)
-- Verify all schemas
SELECT * FROM pgclone_verify(
'host=source-server dbname=prod user=postgres'
);
Match Indicators
| Indicator | Meaning |
|---|---|
| ✓ | Row counts are equal |
| ✗ | Row counts differ |
| ✗ (missing) | Table exists on source but not on target |
How It Works
- Uses
pg_class.reltuplesfor fast approximate counts — no full table scans needed - Connects to both source (via libpq) and local (via loopback) in a single call
- Returns a standard PostgreSQL table result — can be filtered, sorted, joined
- Works with regular and partitioned tables
Typical Workflow
-- 1. Clone a schema SELECT pgclone_schema('host=prod dbname=myapp user=postgres', 'sales', true);-- 2. Verify the clone
SELECT * FROM pgclone_verify('host=prod dbname=myapp user=postgres', 'sales');
-- 3. Check for mismatches
SELECT * FROM pgclone_verify('host=prod dbname=myapp user=postgres', 'sales')
WHERE match != '✓';
What's New
pgclone_verify(conninfo)— all user schemaspgclone_verify(conninfo, schema)— single schema- SET-RETURNING FUNCTION using SRF pattern with
VerifyStateinmulti_call_memory_ctx - 5 new pgTAP tests (84 total, 23 test groups, 22 C functions)
Install
git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version(); -- pgclone 3.5.0
v3.4.0 — Clone Roles with Permissions and Passwords
Clone database roles from a source PostgreSQL instance with their encrypted passwords, attributes, memberships, and all privilege grants.
Usage
-- Clone all non-system roles SELECT pgclone_clone_roles( 'host=prod dbname=myapp user=postgres password=secret' ); -- OK: 8 roles created, 2 roles updated, 45 grants applied-- Clone specific roles (comma-separated)
SELECT pgclone_clone_roles(
'host=prod dbname=myapp user=postgres password=secret',
'app_user, reporting_user, api_service'
);
-- OK: 3 roles created, 0 roles updated, 18 grants applied
-- Clone a single role
SELECT pgclone_clone_roles(
'host=prod dbname=myapp user=postgres password=secret',
'app_user'
);
-- OK: 1 roles created, 0 roles updated, 7 grants applied
What Gets Cloned
| Category | Details |
|---|---|
| Role attributes | LOGIN, SUPERUSER, CREATEDB, CREATEROLE, REPLICATION, INHERIT, CONNECTION LIMIT, VALID UNTIL |
| Passwords | Encrypted password (SCRAM-SHA-256 / md5 hash) copied as-is from pg_authid |
| Role memberships | GRANT role TO role relationships |
| Schema privileges | USAGE, CREATE |
| Table privileges | SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER |
| Sequence privileges | USAGE, SELECT, UPDATE |
| Function privileges | EXECUTE on functions and procedures |
Existing Role Behavior
If a role already exists on the target — no DROP, no recreate:
- Password is updated to match source (hash preserved)
- Role attributes (LOGIN, CREATEDB, etc.) synced via ALTER ROLE
- Permissions applied additively (existing grants not revoked)
Requirements
- Superuser on both source and target (
pg_authidaccess) - System roles (
pg_*) andpostgresare excluded automatically
Bug Fixes in This Release
PASSWORDinstead ofENCRYPTED PASSWORDto correctly preserve pre-hashed SCRAM/md5 passwordsaclexplode()schema privilege query was calling the function twice (SELECT + LATERAL) causing duplicate grants- Function EXECUTE grant counter was incrementing even on failure
Install
git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version(); -- pgclone 3.4.0
v3.3.0: Dynamic data masking via views
Role-based masking policies that preserve original data while presenting masked views to unprivileged users. Unlike static masking (v3.2.0), dynamic masking keeps the base table intact — different roles see different data.
New Functions
Create a masking policy
SELECT pgclone_create_masking_policy(
'public', 'employees',
'{"email": "email", "full_name": "name", "ssn": "null",
"salary": {"type": "random_int", "min": 40000, "max": 200000}}',
'dba_team'
);
This does four things:
- Creates view
public.employees_maskedwith mask expressions - Revokes
SELECTonpublic.employeesfromPUBLIC - Grants
SELECTonpublic.employees_maskedtoPUBLIC - Grants
SELECTonpublic.employeestodba_team
Drop a masking policy
SELECT pgclone_drop_masking_policy('public', 'employees');
Drops the masked view and restores base table access to PUBLIC.
How It Works
┌─────────────────────────────────────────────────────┐
│ Base Table │
│ employees: Alice, alice@example.com, 95000, 123-.. │
├─────────────────────────────────────────────────────┤
│ ▲ SELECT granted │
│ │ to dba_team only │
│ │
│ ┌─────────────────────────────────────────┐ │
│ │ Masked View │ │
│ │ employees_masked: │ │
│ │ XXXX, a***@example.com, 87432, NULL │ │
│ ├─────────────────────────────────────────┤ │
│ │ SELECT granted to PUBLIC │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
Regular users query employees_masked → see anonymized data
dba_team role queries employees directly → see raw data
Complete Masking Toolkit (v3.0–v3.3)
pgclone now offers three complementary masking modes:
| Mode | Function | When to Use |
|---|---|---|
| Clone-time (v3.0.0) | "mask" in options JSON | Data never reaches local disk unmasked |
| Static (v3.2.0) | pgclone_mask_in_place() | Permanently modify existing tables |
| Dynamic (v3.3.0) | pgclone_create_masking_policy() | Preserve originals, role-based access |
Plus pgclone_discover_sensitive() (v3.1.0) to automatically detect which columns need masking.
What's New
pgclone_create_masking_policy(schema, table, mask_json, privileged_role)pgclone_drop_masking_policy(schema, table)- Uses
CREATE OR REPLACE VIEWfor idempotent policy updates - All 8 mask strategies work in dynamic masking
- 7 new pgTAP tests (73 total, 21 test groups)
- 20 C functions, ~4135 lines in pgclone.c
Install
git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
DROP EXTENSION IF EXISTS pgclone; CREATE EXTENSION pgclone;
SELECT pgclone_version(); -- pgclone 3.3.0
v3.2.0: Static data masking on cloned data
Apply masking to already-cloned local tables via UPDATE — no source connection needed. Useful for sanitizing existing dev/staging databases or applying different mask strategies after an initial clone.
New Function
SELECT pgclone_mask_in_place(
'public', 'employees',
'{"email": "email", "full_name": "name", "ssn": "null"}'
);
-- Returns: OK: masked 1000 rows in public.employees (3 columns)
Same mask JSON format as clone-time masking. All 8 strategies work.
Use Cases
- Mask tables that were cloned without masking
- Apply different mask strategies to an already-cloned table
- Sanitize existing development/staging databases
- Re-mask after schema changes
Bug Fixes
This release also includes 4 bug fixes to the masking engine:
| Bug | Impact | Fix |
|---|---|---|
| random_int range inversion | min > max produced out-of-range values | Swap when inverted |
| random_int zero defaults | Unset min/max always returned 0 | Default to [0, 99999] |
| constant without value | Simple string "constant" produced empty string | Default to "REDACTED" |
| mask_in_place buffer overflow | Large mask JSON silently truncated on stack | Replaced with StringInfo |
| Missing simple-string types | "random_int" and "constant" not recognized as simple strings | Added to parser |
What's New
pgclone_mask_in_place(schema, table, mask_json)SQL function- Reuses
pgclone_build_mask_expr()for consistent behavior across all masking paths - 7 new pgTAP tests (66 total, 20 test groups)
v3.1.0: Auto-discovery of sensitive data
pgclone v3.1.0 — Auto-Discovery of Sensitive Data
Automatically scan a source database schema for columns that look like sensitive data and get suggested masking strategies — ready to paste into your clone command.
New Function
SELECT pgclone_discover_sensitive(
'host=prod dbname=myapp user=readonly',
'public'
);
Returns JSON grouped by table:
{
"employees": {"email": "email", "full_name": "name", "phone": "phone", "salary": "random_int", "ssn": "null"},
"users": {"password": "hash", "api_key": "hash", "ip_address": "hash"}
}
Detection Patterns (~40 rules across 10 categories)
| Category | Detected Patterns |
|---|---|
| email, e_mail | |
| Name | first_name, last_name, full_name, firstname, lastname, *_name |
| Phone | phone, mobile, telephone, fax |
| SSN / National ID | ssn, social_security, national_id, tax_id |
| Financial | salary, income, wage, compensation |
| Credentials | password, passwd, secret, token, api_key, apikey |
| Address | address, street, zip, zipcode, postal |
| Date of Birth | birth, dob, date_of_birth |
| Credit Card | card_number, credit_card, ccn, cvv |
| IP Address | ip_address, ipaddress |
Typical Workflow
-- 1. Discover what needs masking SELECT pgclone_discover_sensitive(conn, 'public');
-- 2. Use the output directly in a masked clone
SELECT pgclone_table(conn, 'public', 'employees', true, 'employees_dev',
'{"mask": {"email": "email", "full_name": "name", "salary": "random_int", "ssn": "null"}}');
What's New
pgclone_discover_sensitive(conninfo, schema)SQL function- Case-insensitive pattern matching against column names
- Scans regular tables and partitioned tables
- 6 new pgTAP tests (59 total, 19 test groups)
v3.0.0: Data masking / anonymization engine
pgclone v3.0.0 — Data Masking / Anonymization Engine
Clone PostgreSQL tables with column-level data anonymization applied during transfer. Masking happens server-side as SQL expressions inside COPY (SELECT ...) TO STDOUT — zero row-by-row processing overhead.
8 Masking Strategies
| Strategy | Example Output | NULL Handling |
|---|---|---|
| a***@example.com | Preserves NULL | |
| name | XXXX | Preserves NULL |
| phone | ***-4567 | Preserves NULL |
| partial | Jo***son (configurable prefix/suffix) | Preserves NULL |
| hash | 5d41402abc4b2a76b9719d911017c592 (MD5, deterministic) | Preserves NULL |
| null | NULL | Always NULL |
| random_int | Random in [min, max] | Always produces value |
| constant | Fixed replacement value | Always produces value |
Usage
-- Clone with email and name masking SELECT pgclone_table( 'host=prod dbname=myapp user=readonly', 'public', 'users', true, 'users', '{"mask": {"email": "email", "full_name": "name", "ssn": "null"}}' );-- Parameterized masks
SELECT pgclone_table(conn, 'hr', 'employees', true, 'employees_dev',
'{"mask": {"salary": {"type": "random_int", "min": 40000, "max": 200000},
"last_name": {"type": "partial", "prefix": 1, "suffix": 2}}}'
);
-- Combine with WHERE filter
SELECT pgclone_table(conn, 'public', 'users', true, 'users_dev',
'{"mask": {"email": "email"}, "where": "status = ''active''"}'
);
What's New
MaskRulestruct andpgclone_build_mask_expr()for extensible mask strategy architecture- Masking applied server-side in
COPY (SELECT ...) TO STDOUT— no external dependencies - Queries
pg_catalog.pg_attributefor column names when masks active without explicit column list - Fully composable with existing
columns,where,indexes,constraints,triggersoptions test_schema.employeestest fixture with realistic sensitive data- 15 new pgTAP tests (53 total, 18 test groups)
Install
git clone https://github.com/valehdba/pgclone.git
cd pgclone && make && sudo make install
CREATE EXTENSION pgclone;
SELECT pgclone_version(); -- pgclone 3.0.0
v2.2.0 - Worker pool for parallel cloning
What's New
Parallel cloning ("parallel": N) now uses a fixed-size worker pool instead of spawning one background worker per table. This is a major architectural improvement that dramatically reduces resource consumption on large schemas.
How It Works
When you run SELECT pgclone_schema_async(conn, 'sales', true, '{"parallel": 4}'), pgclone now:
- Queries the source for the full table list and populates a shared-memory task queue
- Launches exactly 4 background workers (not one per table)
- Each worker pulls the next unclaimed table from the queue, clones it, then grabs the next
- Faster workers automatically handle more tables (dynamic load balancing)
- Parent job tracks aggregate progress across all workers in real time
Before vs After
| v2.0.0 (per-table workers) | v2.2.0 (worker pool) | |
|---|---|---|
100 tables, parallel=4 |
100 bgworkers launched | 4 bgworkers |
| DB connections | 200 (100 × 2) | 8 (4 × 2) |
| Job slots used | 101 (1 parent + 100 child) | 5 (1 parent + 4 pool) |
max_worker_processes needed |
≥ 100 | ≥ 4 |
| Load balancing | Static (pre-assigned chunks) | Dynamic (work-stealing) |
No API Changes
The SQL interface is identical — same "parallel" JSON option, same pgclone_schema_async() function signature. Existing scripts work without modification.
Changes
- Fixed-size worker pool replaces per-table bgworker spawning
- Exactly N workers launched, each pulling from shared task queue
- Dynamic load balancing: faster workers handle more tables
- Resource usage: O(N) instead of O(tables) for bgworkers/connections
- Explicit worker_job_ids tracking for safe pool finalization
- Max 512 tables per pool operation
- No API changes
Upgrade Notes
- PostgreSQL restart required — shared memory layout has changed
- Requires
shared_preload_libraries = 'pgclone'(same as before for async operations) - Maximum 512 tables per pool operation (
PGCLONE_MAX_POOL_TASKS) - Only one pool-based parallel operation can run at a time per cluster
Bug Fix Included
- Pool worker finalization now uses explicit
worker_job_ids[]tracking instead of sentinel-based slot scanning, preventing potential false matches when other async jobs run concurrently
Installation
sudo make install
# Restart PostgreSQL, then:
ALTER EXTENSION pgclone UPDATE TO '2.2.0';##Supported PostgreSQL Versions
PostgreSQL 14, 15, 16, 17, 18
Full Changelog: https://github.com/valehdba/pgclone/blob/main/CHANGELOG.md
v2.1.3: async bugfixes, clear_jobs, async tests, docs restructure
Bugfixes
- bgw_copy_data: COPY pipeline error handling — failures now logged, connection leak fixed
- WaitForBackgroundWorkerStartup added to all async functions — jobs no longer stuck in pending
- pgclone_schema_async parallel: fixed hardcoded jobs[0] shared memory corruption
- pgclone_schema_async parallel: parent job status transition fix
New
pgclone_clear_jobs()— free completed/cancelled job slots- Async test suite (
test/test_async.sh) CONTRIBUTING.md,SECURITY.md
Documentation Restructured
docs/USAGE.md,docs/ASYNC.md,docs/TESTING.md,docs/ARCHITECTURE.mdCHANGELOG.md— full version history
Full Changelog: v2.1.2...v2.1.3