Steps to reproduce
- Deploy charmed-postgresql with 3 units (1 leader, 2 replicas)
- Allow a replica to fall behind such that a reinit is needed (e.g. due to a large timeline gap)
- Run
patronictl -c /var/snap/charmed-postgresql/current/etc/patroni/patroni.yaml reinit postgresql <member>
- Monitor
/var/snap/charmed-postgresql/common/var/log/patroni/patroni.log
Expected behavior
If reinit fails, the log should clearly report the failure reason so the operator can act on it.
Actual behavior
Patroni enters a silent retry loop. The only output visible in the current log file is:
WARNING: Retry got exception: connection problems
WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
INFO: restarting after failure in progress
Possible root cause
A possible root cause (OSError: [Errno 16] Device or resource busy on the pg_wal bind mount) appears only in rotated patroni.log.N files. The current log has no clear failure message and no actionable guidance, and the misleading "connection problems" warning sends operators to investigate networking rather than the reinit failure itself.
The OSError occurs because /var/snap/charmed-postgresql/common/data/logs is a bind mount used as the pg_wal directory. During reinit, Patroni's data directory cleanup around shutil.rmtree() attempts to rename this path to *.failed, and the kernel rejects the rename with EBUSY because the path is an active mount point. This is snap-specific, but the core problem is that the error is never surfaced in a visible log line.
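To illustrate the kind of message that would make this failure actionable, here is a minimal Python sketch. It is not Patroni's code: explain_rename_failure and move_aside are hypothetical helpers, and the message wording is only a suggestion. It shows a rename failure being reported at ERROR level with the path and the likely cause, so the failure shows up in the current log rather than only as a traceback in rotated files.

```python
import errno
import logging
import os

logger = logging.getLogger("reinit_sketch")


def explain_rename_failure(exc: OSError) -> str:
    """Turn an OSError from a failed rename into an operator-facing message."""
    src = exc.filename or "<unknown path>"
    if exc.errno == errno.EBUSY:
        # EBUSY on rename typically means the path is in use by the system,
        # for example as a mount point.
        return (
            f"reinit failed: cannot rename {src}: device or resource busy. "
            "The path may be a mount point (for example the pg_wal bind mount); "
            "reinit will keep failing until this is resolved."
        )
    return f"reinit failed: cannot rename {src}: {exc.strerror}"


def move_aside(path: str) -> bool:
    """Hypothetical helper: rename path to path.failed, logging failures at ERROR."""
    try:
        os.rename(path, path + ".failed")
        return True
    except OSError as exc:
        logger.error(explain_rename_failure(exc))
        return False
```

With something along these lines, the EBUSY case would produce a single ERROR line naming /var/snap/charmed-postgresql/common/data/logs instead of the generic "connection problems" warning.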
Versions
Operating system: Ubuntu 22.04.5 LTS
Juju CLI: 3.6.14
Juju agent: 3.6.14
Charm revision: postgresql 16/stable rev 952
LXD: N/A
Log output
# What the operator sees in the current patroni.log:
WARNING: Retry got exception: connection problems
INFO: restarting after failure in progress <-- loops indefinitely
# Actual cause, buried in rotated patroni.log.N files:
OSError: [Errno 16] Device or resource busy: '/var/snap/charmed-postgresql/common/data/logs' -> '/var/snap/charmed-postgresql/common/data/logs.failed'
ERROR: Error when fetching backup: pg_basebackup exited with code=1
ERROR: failed to bootstrap from leader 'postgresql-4'
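Until the error is surfaced properly, an operator can search the rotated files for the lines above. The following Python sketch is one way to do that. The pattern is derived only from the log lines quoted in this report, and find_root_cause_lines and scan_logs are hypothetical names, so treat it as a starting point rather than a complete diagnostic.

```python
import re
from pathlib import Path

# Matches the root-cause lines quoted in this report (OSError tracebacks
# and ERROR-level messages), not WARNING or INFO lines.
ROOT_CAUSE_PATTERN = re.compile(r"OSError|ERROR:")


def find_root_cause_lines(lines):
    """Return the lines that look like an underlying failure."""
    return [line.rstrip("\n") for line in lines if ROOT_CAUSE_PATTERN.search(line)]


def scan_logs(log_dir):
    """Scan patroni.log and its rotated patroni.log.N siblings in log_dir."""
    hits = {}
    for path in sorted(Path(log_dir).glob("patroni.log*")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            matched = find_root_cause_lines(handle)
        if matched:
            hits[str(path)] = matched
    return hits
```

For this deployment the call would be scan_logs("/var/snap/charmed-postgresql/common/var/log/patroni"), which returns a mapping from each log file to its matching lines.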
Additional context
Workaround:
- Manually stop Patroni
- Clear the data directory
- Run pg_basebackup directly
- Recreate the pg_wal symlink
ln -s /var/snap/charmed-postgresql/common/data/logs /var/snap/charmed-postgresql/common/var/lib/postgresql/pg_wal
- Fix ownership of the tablespace target directory
chown -R _daemon_:_daemon_ /var/snap/charmed-postgresql/common/data/temp
- Restart Patroni