[bitnami/postgres-ha] failover and sync replication not working #33489

Open
RMrenex opened this issue May 7, 2025 · 0 comments

RMrenex commented May 7, 2025

Name and Version

bitnami/postgres-ha

What architecture are you using?

None

What steps will reproduce the bug?

Hello,

I'm trying to configure my pods so that I have one primary and two replicas, with one replica in synchronous mode and the other in asynchronous mode, but apparently it's not working and I don't really understand why.

When I start my pods, both replicas are in sync_state quorum, but after the failover test (which does not work) both replica pods switch to sync_state async.
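
For reference, the standbys' sync state can be checked directly on the current primary with a standard query (a minimal sketch; the connection method is just an example, e.g. kubectl exec into the primary pod and psql):

```sql
-- Run with psql on the current primary.
-- Lists each standby with its replication state and sync_state
-- ("quorum", "sync", "potential" or "async").
SELECT application_name, state, sync_state, sync_priority
FROM pg_stat_replication;
```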

Finally, the last problem is that the failover itself does not work: my pgpool pod is unable to elect a new leader, and I don't really understand why.
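
For context, pgpool's own view of the backends can be inspected with its SHOW pool_nodes command, issued through psql connected to the pgpool service (a sketch; the service and user names depend on the deployment):

```sql
-- Run with psql connected to pgpool (not directly to a PostgreSQL node).
-- Lists each backend with its status, role (primary/standby) and
-- replication state, i.e. which node pgpool currently considers the leader.
SHOW pool_nodes;
```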

Example logs during the failover test:

2025-05-07 07:39:32.216: main pid 1: LOG:  node status[0]: 1
2025-05-07 07:39:32.216: main pid 1: LOG:  node status[1]: 2
2025-05-07 07:39:32.216: main pid 1: LOG:  node status[2]: 2
2025-05-07 07:39:56.250: health_check0 pid 189: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:39:56.250: health_check0 pid 189: DETAIL:  Operation now in progress
2025-05-07 07:39:56.250: health_check0 pid 189: LOG:  health check retrying on DB node: 0 (round:1)
2025-05-07 07:39:58.163: child pid 158: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:39:58.163: child pid 158: DETAIL:  Operation now in progress
2025-05-07 07:39:58.163: child pid 158: LOG:  received degenerate backend request for node_id: 0 from pid [158]
2025-05-07 07:39:58.163: child pid 158: LOG:  signal_user1_to_parent_with_reason(0)
2025-05-07 07:39:58.163: child pid 158: FATAL:  failed to create a backend connection
2025-05-07 07:39:58.163: child pid 158: DETAIL:  executing failover on backend
2025-05-07 07:39:58.163: main pid 1: LOG:  Pgpool-II parent process received SIGUSR1
2025-05-07 07:39:58.164: main pid 1: LOG:  Pgpool-II parent process has received failover request
2025-05-07 07:39:58.164: main pid 1: LOG:  === Starting degeneration. shutdown host postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless(5432) ===
2025-05-07 07:39:58.165: main pid 1: LOG:  Restart all children
2025-05-07 07:39:58.171: main pid 1: LOG:  execute command: echo ">>> Failover - that will initialize new primary node search!"
2025-05-07 07:39:58.200: main pid 1: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
>>> Failover - that will initialize new primary node search!
2025-05-07 07:39:58.209: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:39:58.209: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:39:59.218: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:39:59.218: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:00.228: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:00.228: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:01.237: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:01.237: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:01.251: health_check0 pid 189: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-0.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:40:01.251: health_check0 pid 189: DETAIL:  Operation now in progress
2025-05-07 07:40:01.251: health_check0 pid 189: LOG:  health check retrying on DB node: 0 (round:2)
2025-05-07 07:40:02.248: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:02.248: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:03.257: main pid 1: LOG:  find_primary_node: standby node is 1
2025-05-07 07:40:03.257: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:04.259: main pid 1: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-1.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:40:04.259: main pid 1: DETAIL:  Operation now in progress
2025-05-07 07:40:04.259: main pid 1: LOG:  find_primary_node: make_persistent_db_connection_noerror failed on node 1
2025-05-07 07:40:04.263: main pid 1: LOG:  find_primary_node: standby node is 2
2025-05-07 07:40:05.265: main pid 1: LOG:  failed to connect to PostgreSQL server on "postgres-ha-postgresql-ha-postgresql-1.postgres-ha-postgresql-ha-postgresql-headless:5432", getsockopt() failed
2025-05-07 07:40:05.265: main pid 1: DETAIL:  Operation now in progress

values.yaml file:

global:
  postgresql:
    existingSecret: test-postgresql-auth

postgresql:
  replicaCount: 3
  #pgHbaTrustAll: true
  syncReplication: true
  syncReplicationMode: "ANY"
  numSynchronousReplicas: 1
  synchronousCommit: "on"

  initdbScripts:
    create_test_db.sql: |
      CREATE DATABASE test_db;
    create_pgpool_users.sql: |
      CREATE USER "sr-repmgr" WITH PASSWORD '<myPass>';
      CREATE USER "health-repmgr" WITH PASSWORD '<myPass>';

  livenessProbe:
    enabled: true
    initialDelaySeconds: 30
    periodSeconds: 10
  readinessProbe:
    enabled: true
    initialDelaySeconds: 5
    periodSeconds: 10

pgpool:
  configurationCM: test-pgpool-conf
  usePasswordFiles: true
  existingSecret: test-pgpool-auth

primary:
  service:
    type: ClusterIP
    port: 5432

standby:
  service:
    type: ClusterIP
    port: 5432
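
If I understand these values correctly, syncReplication: true with syncReplicationMode "ANY" and numSynchronousReplicas 1 should make the chart render a quorum-style synchronous_standby_names (something like ANY 1 (...)), which would explain why both standbys start in sync_state quorum rather than one sync and one async. The rendered value can be checked on the primary (a minimal check, assuming the values above are applied as-is):

```sql
-- Run on the current primary.
-- With the values above I would expect an ANY 1 (...) quorum list here,
-- which makes every listed standby report sync_state "quorum".
SHOW synchronous_standby_names;
```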

Are you using any custom parameters or values?

No response

What is the expected behavior?

The expected behavior is to have one replica in synchronous mode and the other in asynchronous mode, and for the synchronous replica to become the new leader during a failover.
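
In other words, after a successful failover one of the standbys should be promoted; which node became the new primary can be confirmed with a standard check on each PostgreSQL node (independent of the chart):

```sql
-- Run on each PostgreSQL node after the failover test:
-- returns false on the (new) primary and true on the remaining standbys.
SELECT pg_is_in_recovery();
```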

What do you see instead?

Already explained above.

Additional information

CHART NAME: postgresql-ha
CHART VERSION: 15.3.15
APP VERSION: 17.4.0

RMrenex added the tech-issues label on May 7, 2025
github-actions bot added the triage label on May 7, 2025