Skip to content

mysql_aws_aurora_hostgroups: writer_is_also_reader=1 does not place writer in reader hostgroup during steady-state #5592

@templarfelix

Description

@templarfelix

Summary

When writer_is_also_reader=1 is configured in mysql_aws_aurora_hostgroups, the writer instance is not added to the reader_hostgroup during normal steady-state operation. The writer only appears in the reader_hostgroup after a failover event triggers update_aws_aurora_set_writer().

This means that if the only dedicated reader goes down, reader_hostgroup becomes empty and read queries fail — defeating the purpose of writer_is_also_reader=1 as a read fallback.

Environment

  • ProxySQL version: 3.0.7
  • Aurora MySQL: Serverless v2 (8.0.44 compatible)
  • Deployment: Kubernetes StatefulSet (3 replicas) with mysql_aws_aurora_hostgroups autodiscovery
  • Aurora topology: 1 writer + 1 reader (tested), also reproducible with 1 writer + 2 readers

Configuration

-- mysql_aws_aurora_hostgroups
INSERT INTO mysql_aws_aurora_hostgroups (
    writer_hostgroup, reader_hostgroup, active, aurora_port,
    domain_name, max_lag_ms, check_interval_ms, check_timeout_ms,
    writer_is_also_reader, new_reader_weight
) VALUES (
    10, 20, 1, 3306,
    '.abc123def456.us-east-1.rds.amazonaws.com',
    30000, 1000, 3000,
    1, 1
);

-- mysql_servers (seeds in writer hostgroup only)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
    (10, 'myapp-writer.abc123def456.us-east-1.rds.amazonaws.com', 3306),
    (10, 'myapp-reader.abc123def456.us-east-1.rds.amazonaws.com', 3306);

LOAD MYSQL SERVERS TO RUNTIME;

Expected Behavior

After autodiscovery runs and identifies the writer and reader:

runtime_mysql_servers:
  HG10 (writer):  myapp-writer          ONLINE
  HG20 (reader):  myapp-reader          ONLINE
  HG20 (reader):  myapp-writer          ONLINE   ← writer as fallback reader

The writer should appear in both writer_hostgroup (10) and reader_hostgroup (20), as documented:

"If mysql_aws_aurora_hostgroups.writer_is_also_reader is enabled, then the writer is also configured in the reader hostgroup."

Actual Behavior

runtime_mysql_servers:
  HG10 (writer):  myapp-writer          ONLINE
  HG20 (reader):  myapp-reader          ONLINE

The writer is only in HG10. It does not appear in HG20 despite writer_is_also_reader=1.

Root Cause Analysis

We traced the issue through the source code (MySQL_HostGroups_Manager.cpp):

  1. aws_aurora_replication_lag_action() (line ~6591): During steady-state monitoring, this function only shuns/enables servers based on lag. It does not add or remove servers from hostgroups. It returns a boolean indicating whether the server is in the correct hostgroup.

  2. update_aws_aurora_set_writer() (line ~6775): This function contains the writer_is_also_reader logic that adds the writer to the reader hostgroup. However, it has a short-circuit condition (lines ~6836-6845):

    if (found_writer) {
        if (
            (writer_is_also_reader==0 && found_reader==false)
            ||
            (writer_is_also_reader > 0 && found_reader==true)
        ) {
            // already correct, no action needed
            delete resultset;
            resultset=NULL;
        }
    }

    During initial boot, the writer is found_writer=true (it's a seed in HG10) but found_reader=false (no seed in HG20). With writer_is_also_reader=1, the condition (writer_is_also_reader > 0 && found_reader==true) is false, so the function should proceed to the rebuild phase. However, this function is only called when aws_aurora_replication_lag_action() returns false — i.e., when it detects the server is in the wrong hostgroup. Since the writer IS in the writer hostgroup (HG10), aws_aurora_replication_lag_action() returns true, and update_aws_aurora_set_writer() is never called.

  3. create_new_server_in_hg() (line ~6946): This path correctly handles writer_is_also_reader:

    int wr_res = create_new_server_in_hg(_writer_hostgroup, srv_info, wr_srv_opts);
    if (writer_is_also_reader && read_HG >= 0) {
        rd_res = create_new_server_in_hg(read_HG, srv_info, rd_srv_opts);
    }

    But this path is only taken when the server is completely new (not found in any hostgroup). Since the writer is a seed in HG10, it's already known, so this path is never reached.

The Gap

The writer_is_also_reader check exists in two code paths:

  • update_aws_aurora_set_writer(): Has the logic but is only called during failover (topology change)
  • create_new_server_in_hg(): Has the logic but is only called for new servers (not seeds)

Neither path is reached during steady-state operation when the writer is already a seed in the writer hostgroup. The monitoring loop (aws_aurora_replication_lag_action()) sees the writer in HG10, considers it correct, and never triggers either function.

Workaround

Pre-seed the writer in both hostgroups:

INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
    (10, 'myapp-writer.abc123def456.us-east-1.rds.amazonaws.com', 3306),
    (20, 'myapp-writer.abc123def456.us-east-1.rds.amazonaws.com', 3306);

This is fragile because the writer identity can change after a failover, requiring the seed configuration to be updated.

Suggested Fix

After the initial autodiscovery cycle identifies the writer, check if writer_is_also_reader > 0 and the writer is missing from the reader_hostgroup. If so, call the insertion logic from update_aws_aurora_set_writer() or create_new_server_in_hg() to add the writer to the reader hostgroup.

This could be done in evaluate_aws_aurora_results() after the topology is determined, or as an additional check in aws_aurora_replication_lag_action().

Impact

  • Read queries fail when the only dedicated reader goes down, even though writer_is_also_reader=1 was specifically configured to prevent this
  • The documented behavior does not match the actual behavior
  • Users must use fragile workarounds (pre-seeding writer in reader hostgroup) that break after failover

Steps to Reproduce

  1. Configure mysql_aws_aurora_hostgroups with writer_is_also_reader=1
  2. Seed only the writer hostgroup in mysql_servers
  3. LOAD MYSQL SERVERS TO RUNTIME
  4. Wait for autodiscovery to complete (10+ seconds)
  5. Check runtime_mysql_servers — writer will only be in the writer hostgroup

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions