[bitnami/postgres-ha] failover and sync replication not working #33489

Open
RMrenex opened this issue May 7, 2025 · 0 comments

RMrenex commented May 7, 2025

Name and Version

bitnami/postgres-ha

What architecture are you using?

None

What steps will reproduce the bug?

Hello,

I'm trying to configure my pods so that I have one primary and two replicas, with one replica in synchronous mode and the other in asynchronous mode, but apparently it's not working and I don't really understand why.

When I start my pods, both replicas are in sync_state quorum, but after the failover test (which does not work) both replica pods switch to sync_state async.
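
For reference, the standbys' sync state can be checked directly on the current primary with a standard query (a minimal sketch; the connection method is just an example, e.g. kubectl exec into the primary pod and psql):

```sql
-- Run with psql on the current primary.
-- Lists each standby with its replication state and sync_state
-- ("quorum", "sync", "potential" or "async").
SELECT application_name, state, sync_state, sync_priority
FROM pg_stat_replication;
```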

Finally, the last problem is that the failover itself does not work: my pgpool pod is unable to elect a new leader, and I don't really understand why.
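
For context, pgpool's own view of the backends can be inspected with its SHOW pool_nodes command, issued through psql connected to the pgpool service (a sketch; the service and user names depend on the deployment):

```sql
-- Run with psql connected to pgpool (not directly to a PostgreSQL node).
-- Lists each backend with its status, role (primary/standby) and
-- replication state, i.e. which node pgpool currently considers the leader.
SHOW pool_nodes;
```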

Example logs during the failover test:

2025-05-07 07:39:32.216: main pid 1: LOG:  node status[0]: 1
2025-05-07 07:39:32.216: main pid 1: LOG:  node status[1]: 2
2025-05-07 07:39:32.216: main pid 1: LOG:  node status[2]: 2
2025-05-07 07:39:56.250: health_check0 pid 189: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:39:56.250: health_check0 pid 189: DETAIL:  Operation now in progress
2025-05-07 07:39:56.250: health_check0 pid 189: LOG:  health check retrying on DB node: 0 (round:1)
2025-05-07 07:39:58.163: child pid 158: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:39:58.163: child pid 158: DETAIL:  Operation now in progress
2025-05-07 07:39:58.163: child pid 158: LOG:  received degenerate backend request for node_id: 0 from pid [158]
2025-05-07 07:39:58.163: child pid 158: LOG:  signal_user1_to_parent_with_reason(0)
2025-05-07 07:39:58.163: child pid 158: FATAL:  failed to create a backend connection
2025-05-07 07:39:58.163: child pid 158: DETAIL:  executing failover on backend
2025-05-07 07:39:58.163: main pid 1: LOG:  Pgpool-II parent process received SIGUSR1
2025-05-07 07:39:58.164: main pid 1: LOG:  Pgpool-II parent process has received failover request
2025-05-07 07:39:58.164: main pid 1: LOG:  === Starting degeneration. shutdown host postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless(5432) ===
2025-05-07 07:39:58.165: main pid 1: LOG:  Restart all children
2025-05-07 07:39:58.171: main pid 1: LOG:  execute command: echo ">>> Failover - that will initialize new primary node search!"
2025-05-07 07:39:58.200: main pid 1: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
>>> Failover - that will initialize new primary node search!
2025-05-07 07:39:58.209: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:39:58.209: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:39:59.218: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:39:59.218: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:00.228: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:00.228: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:01.237: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:01.237: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:01.251: health_check0 pid 189: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:40:01.251: health_check0 pid 189: DETAIL:  Operation now in progress
2025-05-07 07:40:01.251: health_check0 pid 189: LOG:  health check retrying on DB node: 0 (round:2)
2025-05-07 07:40:02.248: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:02.248: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:03.257: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:03.257: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:04.259: main pid 1: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-1.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:40:04.259: main pid 1: DETAIL:  Operation now in progress
2025-05-07 07:40:04.259: main pid 1: LOG:  find_primary_node: make_persistent_db_connection_noerror failed on node 1
2025-05-07 07:40:04.263: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:05.265: main pid 1: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-1.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:40:05.265: main pid 1: DETAIL:  Operation now in progress

values.yaml file:

global:
  postgresql:
    existingSecret: test-postgresql-auth

postgresql:
  replicaCount: 3
  #pgHbaTrustAll: true
  syncReplication: true
  syncReplicationMode: "ANY"
  numSynchronousReplicas: 1
  synchronousCommit: "on"

  initdbScripts:
    create_test_db.sql: |
      CREATE DATABASE test_db;
    create_pgpool_users.sql: |
      CREATE USER "sr-repmgr" WITH PASSWORD '<myPass>';
      CREATE USER "health-repmgr" WITH PASSWORD '<myPass>';

  livenessProbe:
    enabled: true
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 10

pgpool:
  configurationCM: test-pgpool-conf
  usePasswordFiles: true
  existingSecret: test-pgpool-auth

primary:
  service:
    type: ClusterIP
    port: 5432

standby:
  service:
    type: ClusterIP
    port: 5432
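
If I understand these values correctly, syncReplication: true with syncReplicationMode "ANY" and numSynchronousReplicas 1 should make the chart render a quorum-style synchronous_standby_names (something like ANY 1 (...)), which would explain why both standbys start in sync_state quorum rather than one sync and one async. The rendered value can be checked on the primary (a minimal check, assuming the values above are applied as-is):

```sql
-- Run on the current primary.
-- With the values above I would expect an ANY 1 (...) quorum list here,
-- which makes every listed standby report sync_state "quorum".
SHOW synchronous_standby_names;
```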

Are you using any custom parameters or values?

No response

What is the expected behavior?

The expected behavior is to have one replica in synchronous mode and the other in asynchronous mode, and for the synchronous replica to become the new leader during a failover.
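
In other words, after a successful failover one of the standbys should be promoted; which node became the new primary can be confirmed with a standard check on each PostgreSQL node (independent of the chart):

```sql
-- Run on each PostgreSQL node after the failover test:
-- returns false on the (new) primary and true on the remaining standbys.
SELECT pg_is_in_recovery();
```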

What do you see instead?

Already explained above.

Additional information

CHART NAME: postgresql-ha
CHART VERSION: 15.3.15
APP VERSION: 17.4.0

RMrenex added the tech-issues label on May 7, 2025
github-actions bot added the triage label on May 7, 2025