test(read_only_offline_hard): set default_hostgroup to WHG, not hardcoded 0#5613
Conversation
…oded 0 Part of #5610 — third of four tests from the flake tracking issue. Like pgsql-ssl_keylog-t, this turned out to be a deterministic failure miscategorized as a flake. ## Background: the incomplete hostgroup migration Two commits on v3.0 main set up the stage: 1. 26c5a25 ("Fix read_only test assuming default_hostgroup=0"): fixed the ORIGINAL bug where the test's writer hostgroup was 0 but the CI infra set user.default_hostgroup to 1300. The fix set default_hostgroup=0 at the start of each scenario so the user routing matched the test's servers. 2. 4f9ab49 ("Fix hardcoded hostgroups in TAP tests for mysql84"): introduced TAP_MYSQL8_BACKEND_HG env var so the test's writer/ reader hostgroups could be overridden for mysql84 (WHG=2900, RHG= 2901 in the mysql84-g4 group). init_hostgroups() at file scope reads this env var into the static `WHG` and `RHG` symbols. The second commit updated the `INSERT INTO mysql_servers` calls to use WHG/RHG, but missed the `UPDATE mysql_users SET default_hostgroup=0` added by the first commit. So on mysql84-g4: - insert_mysql_servers_records() puts servers in hostgroup 2900 (WHG) - UPDATE mysql_users SET default_hostgroup=0 points the user at hostgroup 0, which has no servers - the test opens a proxy connection as that user, runs BEGIN, and ProxySQL routes BEGIN to hostgroup 0 — which is empty - 10 s later: "Max connect timeout reached while reaching hostgroup 0" - test aborts at subtest 4 of 18, not a flake Verified by querying the running proxysql during a failing run: SELECT username, default_hostgroup FROM mysql_users; -> root | 0 user | 2900 testuser | 2900 ... `root` is the user the test connects as (cl.root_username). Still at 0 after the setup step tried to make it match. In mysql84-g4 that's wrong. ## Fix Change the two `string_format("UPDATE mysql_users SET default_hostgroup=0 ...")` calls in test_scenario_1 and test_scenario_2 to use the `WHG` static symbol (already populated from TAP_MYSQL8_BACKEND_HG by init_hostgroups() earlier in the same file), so the user's default_hostgroup always matches whatever writer hostgroup the test configured its servers into. Zero behavior change on the legacy infra (where WHG defaults to 0 so the literal value is unchanged); correct behavior on the mysql84 infra (where WHG=2900). Same two-line pattern in both scenario functions. ## Local verification (dogfood of test/README.md §"Debugging a flaky test") 3 consecutive local iterations in mysql84-g4 + TEST_PY_TAP_INCL=test_read_only_actions_offline_hard_servers-t: attempt 1: PASS attempt 2: PASS attempt 3: PASS === read_only_actions: 3/3 === Pre-fix: 0/3 (all hit the "Max connect timeout reached while reaching hostgroup 0 after 10000ms" error and aborted at subtest 4 of 18). Post-fix: 18/18 subtests run and pass on every iteration. ## Not touched here - `pgsql-servers_ssl_params-t` — still stuck on the pgsql monitor not making SSL connections under `use_ssl=1`. Same open question as subtest 7 of pgsql-ssl_keylog-t (skipped in PR #5612). Needs pgsql monitor source analysis, not a test-side fix.
📝 WalkthroughWalkthroughThe test file updates Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
Code Review
This pull request updates the test suite to dynamically set the default hostgroup using the WHG variable instead of a hardcoded value, which prevents query timeouts in certain environments. The reviewer suggested refactoring the duplicated update logic in test_scenario_1 and test_scenario_2 into a shared helper function to improve maintainability and reduce code duplication.
| // Set default_hostgroup to match writer_hostgroup (WHG) used in this | ||
| // test. WHG defaults to 0 for the legacy infra but is overridden to | ||
| // 2900 by the mysql84-g4 group via TAP_MYSQL8_BACKEND_HG — see | ||
| // init_hostgroups() above. If we hardcode 0 here, the mysql84 group | ||
| // routes the test's BEGIN query to an empty hostgroup 0 and times | ||
| // out at 10 s. A previous fix (26c5a2572) hardcoded 0 because it | ||
| // predated the mysql84 hostgroup migration in 4f9ab49e7 — this | ||
| // commit finishes that fix by making the default_hostgroup follow | ||
| // WHG at runtime. | ||
| { | ||
| std::string update_user; | ||
| string_format("UPDATE mysql_users SET default_hostgroup=0 WHERE username='%s'", update_user, cl.root_username); | ||
| string_format("UPDATE mysql_users SET default_hostgroup=%d WHERE username='%s'", | ||
| update_user, WHG, cl.root_username); | ||
| MYSQL_QUERY__(proxy_admin, update_user.c_str()); | ||
| MYSQL_QUERY__(proxy_admin, "LOAD MYSQL USERS TO RUNTIME"); | ||
| } |
There was a problem hiding this comment.
This block of code, including the detailed comment, is duplicated in test_scenario_2 (lines 396-411). To improve maintainability and reduce code duplication, consider extracting this logic into a helper function. This would also centralize the logic for updating the default hostgroup, making future changes easier.
For example, you could create a function like this:
int update_default_hostgroup(MYSQL* proxy_admin, const CommandLine& cl) {
// Set default_hostgroup to match writer_hostgroup (WHG) used in this
// test. WHG defaults to 0 for the legacy infra but is overridden to
// 2900 by the mysql84-g4 group via TAP_MYSQL8_BACKEND_HG — see
// init_hostgroups() above. If we hardcode 0 here, the mysql84 group
// routes the test's BEGIN query to an empty hostgroup 0 and times
// out at 10 s. A previous fix (26c5a2572) hardcoded 0 because it
// predated the mysql84 hostgroup migration in 4f9ab49e7 — this
// commit finishes that fix by making the default_hostgroup follow
// WHG at runtime.
std::string update_user;
string_format("UPDATE mysql_users SET default_hostgroup=%d WHERE username='%s'",
update_user, WHG, cl.root_username);
if (mysql_query(proxy_admin, update_user.c_str())) {
fprintf(stderr, "File %s, line %d, Error: %s\n", __FILE__, __LINE__, mysql_error(proxy_admin));
return EXIT_FAILURE;
}
if (mysql_query(proxy_admin, "LOAD MYSQL USERS TO RUNTIME")) {
fprintf(stderr, "File %s, line %d, Error: %s\n", __FILE__, __LINE__, mysql_error(proxy_admin));
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}Then, you can replace the duplicated blocks in test_scenario_1 and test_scenario_2 with:
if (update_default_hostgroup(proxy_admin, cl) != EXIT_SUCCESS) {
goto cleanup;
}This avoids using the MYSQL_QUERY__ macro which relies on a goto cleanup statement, making the helper function more self-contained.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp (1)
226-234: Optional: reduce duplicated migration rationale/setup comments.The long explanatory block is duplicated in both scenarios. Consider extracting a small helper (or a shared comment near a helper) to keep this test easier to maintain.
Also applies to: 396-404
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp` around lines 226 - 234, The large duplicated migration/setup comment about default_hostgroup and WHG should be centralized: remove the repeated block in the test and replace it with a short one-line pointer, then add the full explanation once near the shared helper or the init_hostgroups() helper where TAP_MYSQL8_BACKEND_HG and writer_hostgroup behavior is defined (or next to the mysql84-g4/WHG explanation). Update occurrences referencing default_hostgroup and writer_hostgroup to refer to that shared comment/helper so the rationale is maintained without duplication.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp`:
- Around line 226-234: The large duplicated migration/setup comment about
default_hostgroup and WHG should be centralized: remove the repeated block in
the test and replace it with a short one-line pointer, then add the full
explanation once near the shared helper or the init_hostgroups() helper where
TAP_MYSQL8_BACKEND_HG and writer_hostgroup behavior is defined (or next to the
mysql84-g4/WHG explanation). Update occurrences referencing default_hostgroup
and writer_hostgroup to refer to that shared comment/helper so the rationale is
maintained without duplication.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c3250360-11cf-4bba-98f6-dbb247d0219c
📒 Files selected for processing (1)
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
Use UPPER_SNAKE_CASE for constants and macros
Files:
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp
test/tap/tests/**/*(test_*.cpp|*-t.cpp)
📄 CodeRabbit inference engine (CLAUDE.md)
Test files follow the naming pattern
test_*.cppor*-t.cpp
Files:
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp
test/tap/**/*.cpp
📄 CodeRabbit inference engine (CLAUDE.md)
Use TAP (Test Anything Protocol) for tests with Docker-based backend infrastructure
Files:
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp
🧠 Learnings (3)
📚 Learning: 2026-04-01T21:27:03.216Z
Learnt from: wazir-ahmed
Repo: sysown/proxysql PR: 5557
File: test/tap/tests/unit/gtid_set_unit-t.cpp:14-17
Timestamp: 2026-04-01T21:27:03.216Z
Learning: In ProxySQL's unit test directory (test/tap/tests/unit/), test_globals.h and test_init.h are only required for tests that depend on the ProxySQL runtime globals/initialization (i.e., tests that exercise components linked against libproxysql.a). Pure data-structure or utility tests (e.g., ezoption_parser_unit-t.cpp, gtid_set_unit-t.cpp, gtid_trxid_interval_unit-t.cpp) only need tap.h and the relevant project header — omitting test_globals.h and test_init.h is correct and intentional in these cases.
Applied to files:
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp
📚 Learning: 2026-01-20T07:40:34.938Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).
Applied to files:
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp
📚 Learning: 2026-01-20T09:34:19.124Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:19.124Z
Learning: In ProxySQL's TAP test suite, resource leaks (e.g., not calling mysql_close() on early return paths) are commonly tolerated because test processes are short-lived and OS frees resources on exit. This pattern applies to all C++ test files under test/tap/tests. When reviewing, recognize this as a project-wide test convention and focus on test correctness and isolation rather than insisting on fixing such leaks in these test files.
Applied to files:
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp
🔇 Additional comments (2)
test/tap/tests/test_read_only_actions_offline_hard_servers-t.cpp (2)
226-238: Good fix:default_hostgroupnow follows runtimeWHGin scenario 1.This correctly aligns user routing with the writer hostgroup configured for the test environment and preserves legacy behavior when
WHG=0.
396-408: Good fix: scenario 2 now consistently usesWHGfordefault_hostgroup.Nice consistency with scenario 1; this removes the hardcoded hostgroup mismatch that was causing deterministic failures on mysql84-g4.



Part of #5610 — third of four tests from the flake tracking issue.
Correction to #5610's premise: like
pgsql-ssl_keylog-tin PR #5612, this turned out to be a deterministic failure miscategorized as a flake. Reproduced 0/3 failing locally before the fix, 3/3 passing after. Not a timing race.Root cause — an incomplete hostgroup migration
Two commits on v3.0 main set up the stage:
26c5a2572("Fix read_only test assumingdefault_hostgroup=0"): fixed the ORIGINAL bug where the test's writer hostgroup was 0 but the CI infra setuser.default_hostgroupto 1300. The fix ranUPDATE mysql_users SET default_hostgroup=0at the start of each scenario to point the user at the test's servers.4f9ab49e7("Fix hardcoded hostgroups in TAP tests for mysql84"): introducedTAP_MYSQL8_BACKEND_HGenv var so the test's writer/reader hostgroups could be overridden —WHG=2900, RHG=2901in themysql84-g4group.init_hostgroups()at file scope reads this env var into the file-scopeWHG/RHGstatic symbols.The second commit updated the
INSERT INTO mysql_serverscalls to useWHG/RHG— but missed theUPDATE mysql_users SET default_hostgroup=0added by the first commit. So onmysql84-g4:insert_mysql_servers_records()puts servers in hostgroup 2900 (WHG)UPDATE mysql_users SET default_hostgroup=0still runs, pointing the user at hostgroup 0, which has no serversBEGIN, and ProxySQL routesBEGINto hostgroup 0 — emptyMax connect timeout reached while reaching hostgroup 0 after 10000msVerified in the running proxysql during a failing run:
Fix
Change the two
string_format("UPDATE mysql_users SET default_hostgroup=0 ...")calls (intest_scenario_1andtest_scenario_2) to use theWHGstatic symbol — already populated fromTAP_MYSQL8_BACKEND_HGbyinit_hostgroups()earlier in the same file. Zero behavior change on the legacy infra (whereWHGdefaults to 0, so the literal value is unchanged); correct behavior on the mysql84 infra (whereWHG=2900).Diff is ~10 lines of real change plus a comment explaining the history so the next maintainer doesn't hit this again.
Local verification
3 consecutive iterations in
mysql84-g4 + TEST_PY_TAP_INCL=test_read_only_actions_offline_hard_servers-t:Pre-fix: 0/3 — every run hit
Max connect timeout reached while reaching hostgroup 0 after 10000msand aborted at subtest 4 of 18.Post-fix: all 18 subtests run and pass on every iteration. Test log confirms
ok 18is reached.Test plan
c++ -ccompiles cleanly (nostring_formattemplate issues with the newWHGint argument)mysql84-g4, all 18 subtests executed per runlegacy-g4(whereWHGdefaults to 0 and the produced SQL is identical to the pre-fix literal)CI-mysql84-g4against this PR's HEADCI-legacy-g4against this PR's HEAD (defensive check)Related
test_flush_logs-tpoll-instead-of-sleep. The only actually flaky test of the four.pgsql-ssl_keylog-tpath + regex fixes. Ships alongside this PR.pgsql-servers_ssl_params-tand subtest 7 ofpgsql-ssl_keylog-tboth fail under the same underlying pgsql monitor behavior (monitor doesn't make SSL connections underUPDATE pgsql_servers SET use_ssl=1). That needs proxysql source-level investigation and is out of scope for these test-side fixes.Summary by CodeRabbit