
CI test fixes: cluster sync, plan counts, test group assignments, and diagnostics#5538

Merged
renecannao merged 20 commits into v3.0 from v3.0-ci260323
Mar 24, 2026
Conversation

Contributor

@renecannao renecannao commented Mar 23, 2026

Summary

Fixes multiple TAP test failures and CI infrastructure issues discovered during CI migration testing.

Test fixes

  • test_tls_stats-t: Fix plan(29) → plan(27) — the test executes 27 ok() calls on the happy path; the plan was miscounted
  • test_query_cache_soft_ttl_pct-t: Add tolerance of +1 to cache timing assertions — the soft TTL refresh race allows two threads to both see "needs refresh" simultaneously
  • pgsql-transaction_variable_state_tracking-t: Replace superuser-only allow_in_place_tablespaces with user-settable enable_mergejoin (fixes pgsql-transaction_variable_state_tracking-t: permission denied on superuser-only variables #5536)
  • test_auth_methods-t: Call stop_internal_noise_threads() before early return on MySQL 5.7, preventing crash from joinable std::thread destructors (fixes test_auth_methods-t: crashes with 'terminate called without an active exception' on MySQL 5.7 #5537)
  • test_cluster_sync-t: Use 127.0.0.1 for locally-spawned replica instead of cl.host (Docker hostname of main container). Set cluster_mysql_servers_sync_algorithm=3 on replica so it uses v2 sync (config-only, ignores monitor-shunned status)

Test group reassignments

  • reg_test_4935-caching_sha2-t: Remove from legacy-g4 — caching_sha2_password requires MySQL 8.0+
  • test_auth_methods-t: Remove from legacy-g2 — requires MySQL 8.0+

Cluster sync fixes

  • Align the primary's admin variables (admin-debug, admin-restapi_enabled) with the cluster nodes so admin_variables checksums match and sync converges on satellite nodes (fixes #5535)
  • Start TAP_CLUSTER_NODES with the primary (proxysql:6032) so test_cluster1-t operates on the primary plus core nodes 1-3 instead of the unregistered node 4

Coverage infrastructure overhaul

Fixed machine stalls (load >1000, 5GB+ resident per gcov process) and 0% coverage reports in multi-group CI runs. Root causes:

  1. Shared workspace corruption: All test-runner containers wrote TAP test .gcda files to the same ${WORKSPACE} paths. Concurrent gcov processes on the same .gcda files caused memory bloat and data corruption.

    • Fix: Set GCOV_PREFIX=/gcov/tap on test-runner containers so TAP .gcda files go to per-INFRA_ID /gcov/tap/ directories.
  2. Per-container fastcov storm: Each container ran fastcov -j$(nproc) in its EXIT trap. With 5+ containers finishing simultaneously, this spawned 300+ gcov processes.

    • Fix: Pass MULTI_GROUP=1 from run-multi-group.bash to test-runners. When set, the EXIT trap copies .gcno adjacent to .gcda (prep work) but skips fastcov/genhtml. Coverage collection runs once centrally after all groups finish, sequentially per group with -j4.
  3. Path mismatch → 0% coverage: GCOV_PREFIX_STRIP=3 placed .gcda at proxysql/lib/obj/ but .gcno was copied to lib/obj/ — gcov couldn't correlate them. Additionally, the --include filter expected full workspace paths but gcov records bare filenames.

    • Fix: The .gcno copy now strips leading path components until it finds the match. Replaced the --include filter with --exclude filters for /usr and deps, since the project's source paths are recorded as bare filenames.

libparser.so loading fix

  • Add LD_LIBRARY_PATH for unit tests (SKIP_PROXYSQL path) and TAP_DEPS_PATH for Docker-based tests — libcpp_dotenv.so transitively needs libparser.so but carries a stale RUNPATH from the original build

Diagnostics

  • Add dump_hostgroup_debug() utility function — dumps runtime_mysql_servers and stats_mysql_connection_pool for a given hostgroup. Added to test_throttle, test_warnings, and test_unsupported_queries.

Fixes

#5535, #5536, #5537

Test plan

  • Run test_cluster_sync-t in isolation — no longer stalls, replica connects
  • Run run-multi-group.bash with all groups — no machine stalls during coverage collection
  • Verify coverage reports show >0% for lib/ and src/ (no /usr or deps)
  • Verify test_cluster1-t passes (primary in proxysql_servers, no duplicate INSERT)
  • Verify unit-tests-g1 passes (libparser.so loads correctly)
  • Verify standalone run-tests-isolated.bash still generates coverage

- test_tls_stats-t: plan(29) -> plan(27). The test executes 27 ok()
  calls on the happy path; plan was miscounted during authoring.
- reg_test_4935-caching_sha2-t: remove from legacy-g4 (MySQL 5.7
  doesn't support caching_sha2_password). Keep in mysql84-g4.
- Add dump_hostgroup_debug() utility: dumps runtime_mysql_servers and
  stats_mysql_connection_pool for a given hostgroup via diag() output.
  Helps diagnose "hostgroup has no servers available" CI failures.

- test_throttle_max_bytes_per_second_to_client-t: dump hostgroup 1300,
  0, and 1 when create_table_test_sbtest1() fails.

- test_warnings-t: dump target hostgroup state before running tests.

- test_unsupported_queries-t: dump target hostgroup state before
  running tests.

- test_query_cache_soft_ttl_pct-t: add tolerance of +1 to all three
  assertions. The soft TTL refresh race allows two threads to both
  see "needs refresh" simultaneously under load, producing one extra
  backend hit. This is a known property of the soft TTL mechanism.
- test_auth_methods-t: remove from legacy-g2. The test requires MySQL 8.0+
  (it detects 5.7 and skips, but crashes with 'terminate called without an
  active exception' during noise thread cleanup). The crash-on-skip is a
  separate test bug.
- pgsql-transaction_variable_state_tracking-t: replace superuser-only
  variable allow_in_place_tablespaces with user-settable enable_mergejoin.
  Fixes #5536.

- test_auth_methods-t: call stop_internal_noise_threads() before early
  return on MySQL 5.7, preventing 'terminate called without an active
  exception' crash from joinable std::thread destructors. Fixes #5537.
…nodes

The primary had admin-debug=false and admin-restapi_enabled=false
(defaults), while cluster nodes had them set to true. Since these
variables are NOT in the CLUSTER_SYNC_INTERFACES_ADMIN exclusion
list, they're included in the checksum. The checksum mismatch
between primary and nodes caused admin_variables sync to never
converge on satellite nodes.

Now the primary sets the same admin-restapi_enabled=true and
admin-debug=true as all nodes, ensuring matching checksums.

Fixes #5535.
test_cluster1-t expects conns[0] to be the primary ProxySQL instance.
TAP_CLUSTER_NODES only listed nodes 1-9 (ports 6042-6122), so the
test loop operated on nodes 1-4 instead of primary + nodes 1-3.

Node 4 (port 6072) is NOT in proxysql_servers (only core nodes 1-3
are registered), so changes on node 4 were invisible to the cluster —
sync could never detect them, causing the 30-second timeout.

Now TAP_CLUSTER_NODES starts with proxysql:6032 (primary), matching
the old cluster_ports layout where 6032 was the first entry.

Fixes #5535.
Copilot AI review requested due to automatic review settings March 23, 2026 16:59
@coderabbitai

coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

📝 Walkthrough

Ensure ProxySQL primary endpoint is first in constructed cluster node lists when nodes exist; snapshot/restore GCOV env around proxysql startup; add a hostgroup debug dump helper and call it from several tests; relax some race assertions; adjust TAP plans and test group JSON; broaden coverage file collection.

Changes

  • Cluster setup & startup scripts (test/infra/control/env-isolated.bash, test/infra/control/start-proxysql-isolated.bash): prepend proxysql:6032 to the cluster node list when NUM_CLUSTER_NODES > 0; keep CLUSTER_NODES empty when zero; snapshot GCOV_PREFIX/GCOV_PREFIX_STRIP into saved vars before unsetting and re-export them before exec; add admin-cluster_mysql_servers_sync_algorithm=3 to admin config updates.
  • Coverage collection (test/infra/control/run-tests-isolated.bash): the collect_coverage trap now copies both *.gcno and *.gcda into /gcov (broadened find/copy logic).
  • Test helper: hostgroup debug (test/tap/tap/utils.h, test/tap/tap/utils.cpp): add void dump_hostgroup_debug(MYSQL* admin, int hostgroup), which queries and logs runtime_mysql_servers and stats_mysql_connection_pool; no-op if admin is NULL.
  • Tests invoking debug dump (test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp, test/tap/tests/test_unsupported_queries-t.cpp, test/tap/tests/test_warnings-t.cpp): call dump_hostgroup_debug() in diagnostic/failure paths (when admin access is available) to emit hostgroup runtime/stats state.
  • Replica targeting in cluster tests (test/tap/tests/test_cluster_sync-t.cpp): introduce R_HOST = "127.0.0.1" and redirect replica ProxySQL admin/debug connections to that host while keeping replica ports.
  • Thread cleanup on skip (test/tap/tests/test_auth_methods-t.cpp): on the backend-version skip path, call stop_internal_noise_threads() before returning to avoid joinable-thread destruction.
  • Race-tolerance adjustments (test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp): relax assertions to accept a small tolerance (+1) for concurrent soft-TTL expiry races when counting slow clients, cache hits, and hostgroup hits.
  • TAP plan & test data changes (test/tap/tests/test_tls_stats-t.cpp, test/tap/groups/groups.json): reduce TAP plan count from 29 to 27; remove legacy-g4/legacy-g2 entries in two groups, narrowing lists to mysql84-* entries while keeping flags.
  • PostgreSQL tracked variable (test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp): replace superuser-only allow_in_place_tablespaces with enable_mergejoin to avoid permission failures.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰
I hopped into tests at the break of dawn,
Put the primary proxy proudly on the lawn,
Quieted threads that chattered in the night,
Dug up hostgroup whispers for a clearer light,
And left the logs a little more polite.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 10.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Title check — ✅ Passed: the title accurately summarizes the main changes: CI test fixes covering cluster sync, plan counts, test group assignments, and diagnostics.
  • Linked Issues check — ✅ Passed: all code changes address the specified objectives: #5536 via pgsql variable replacement, #5537 via the stop_internal_noise_threads() call, #5535 via TAP_CLUSTER_NODES primary inclusion and admin config alignment.
  • Out of Scope Changes check — ✅ Passed: all changes relate directly to the linked issues: test fixes, group reassignments, cluster sync configuration, GCOV handling, and diagnostic utilities align with PR objectives.
  • Description check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses multiple CI test failures and significantly enhances diagnostic capabilities within the test suite. It includes critical fixes for incorrect test plan counts, introduces necessary tolerance for race conditions in cache timing tests, and resolves issues with test group assignments to ensure compatibility with MySQL versions. Furthermore, the changes correct cluster synchronization logic by properly including and configuring the primary node, and restore accurate code coverage reporting for the primary process. A new debugging utility has been added and integrated into key tests to streamline troubleshooting of hostgroup-related failures.

Highlights

  • Test Fixes: Corrected the expected plan() count in test_tls_stats-t from 29 to 27 to accurately reflect the number of ok() calls. Added a +1 tolerance to cache timing assertions in test_query_cache_soft_ttl_pct-t to account for soft TTL refresh race conditions. Replaced the superuser-only allow_in_place_tablespaces variable with the user-settable enable_mergejoin in pgsql-transaction_variable_state_tracking-t. Ensured stop_internal_noise_threads() is called before early return in test_auth_methods-t to prevent crashes on MySQL 5.7.
  • Test Group Reassignments: Removed reg_test_4935-caching_sha2-t and test_auth_methods-t from legacy-gX test groups as they require MySQL 8.0+.
  • Cluster Sync Fixes: Updated TAP_CLUSTER_NODES to include the primary ProxySQL node (port 6032) as the first entry, addressing test_cluster1-t expectations. Configured admin-restapi_enabled and admin-debug on the primary ProxySQL node to ensure consistent admin variable checksums across the cluster.
  • Coverage Fix: Restored GCOV_PREFIX for the primary ProxySQL process after it was unset for background cluster nodes, fixing coverage report generation.
  • Diagnostics: Introduced a new dump_hostgroup_debug() utility function in utils.h/cpp to aid in diagnosing "hostgroup has no servers" failures by dumping runtime_mysql_servers and stats_mysql_connection_pool information. Integrated dump_hostgroup_debug() into test_throttle, test_warnings, and test_unsupported_queries for enhanced diagnostics.


Contributor

Copilot AI left a comment


Pull request overview

This PR addresses multiple TAP/CI failures surfaced during CI migration testing by adjusting flaky timing assertions, fixing test plans/cleanup, correcting cluster node environment/setup for single-container clusters, and adding targeted diagnostics to aid future debugging.

Changes:

  • Fix/relax several TAP tests (plan count correction, timing tolerance for soft TTL race, non-superuser PgSQL variable, and safe cleanup on early-skip paths).
  • Correct isolated cluster startup/env behavior (ensure primary is first in TAP_CLUSTER_NODES, align primary admin vars with nodes, and restore GCOV env for coverage).
  • Add dump_hostgroup_debug() test helper and invoke it in a few tests to print hostgroup/pool state when failures occur.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Summary per file:

  • test/tap/tests/test_warnings-t.cpp: adds a hostgroup debug dump at test start to aid "hostgroup unreachable" diagnosis.
  • test/tap/tests/test_unsupported_queries-t.cpp: adds a hostgroup debug dump and attempts to auto-detect the test user's default hostgroup.
  • test/tap/tests/test_tls_stats-t.cpp: corrects the TAP plan() count to match executed assertions.
  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp: dumps hostgroup state when table setup fails, improving failure diagnostics.
  • test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp: adds tolerance to assertions to accommodate a known soft-TTL refresh race.
  • test/tap/tests/test_auth_methods-t.cpp: stops internal noise threads on early skip to avoid std::thread destructor termination.
  • test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp: replaces a superuser-only variable with a user-settable alternative.
  • test/tap/tap/utils.h: declares the dump_hostgroup_debug() helper for TAP tests.
  • test/tap/tap/utils.cpp: implements dump_hostgroup_debug(), querying runtime servers and pool stats.
  • test/tap/groups/groups.json: reassigns MySQL 8.0+ dependent tests out of legacy groups.
  • test/infra/control/start-proxysql-isolated.bash: restores GCOV env for the primary after unsetting it for background nodes; aligns primary admin vars with nodes.
  • test/infra/control/env-isolated.bash: ensures the primary (6032) is first in TAP_CLUSTER_NODES for cluster tests.


Comment on lines +557 to +558
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
dump_hostgroup_debug(dbg_admin, target_hg);

Copilot AI Mar 23, 2026


dbg_admin is only closed on successful mysql_real_connect(). If the connect fails, the handle returned by mysql_init() is leaked. Close dbg_admin whenever it is non-null (even after a failed connect) to keep the test clean and avoid accumulating connections/handles in long CI runs.

Suggested change
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
dump_hostgroup_debug(dbg_admin, target_hg);
if (dbg_admin) {
if (mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
dump_hostgroup_debug(dbg_admin, target_hg);
}

Comment on lines +542 to +554
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
// Get default hostgroup for the test user
std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";
int target_hg = 0;
if (mysql_query(dbg_admin, hg_query.c_str()) == 0) {
MYSQL_RES *res = mysql_store_result(dbg_admin);
if (res) {
MYSQL_ROW row = mysql_fetch_row(res);
if (row) target_hg = atoi(row[0]);
mysql_free_result(res);
}
}
dump_hostgroup_debug(dbg_admin, target_hg);

Copilot AI Mar 23, 2026


dbg_admin is only closed when mysql_real_connect() succeeds. If the connect fails, the handle created by mysql_init() is leaked. Ensure mysql_close(dbg_admin) is called whenever dbg_admin is non-null, regardless of connection success.

Suggested change
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
// Get default hostgroup for the test user
std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";
int target_hg = 0;
if (mysql_query(dbg_admin, hg_query.c_str()) == 0) {
MYSQL_RES *res = mysql_store_result(dbg_admin);
if (res) {
MYSQL_ROW row = mysql_fetch_row(res);
if (row) target_hg = atoi(row[0]);
mysql_free_result(res);
}
}
dump_hostgroup_debug(dbg_admin, target_hg);
if (dbg_admin) {
if (mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
// Get default hostgroup for the test user
std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";
int target_hg = 0;
if (mysql_query(dbg_admin, hg_query.c_str()) == 0) {
MYSQL_RES *res = mysql_store_result(dbg_admin);
if (res) {
MYSQL_ROW row = mysql_fetch_row(res);
if (row) target_hg = atoi(row[0]);
mysql_free_result(res);
}
}
dump_hostgroup_debug(dbg_admin, target_hg);
}

MYSQL* dbg_admin = mysql_init(NULL);
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
// Get default hostgroup for the test user
std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";

Copilot AI Mar 23, 2026


The query that looks up default_hostgroup concatenates cl.username directly into SQL without escaping. If the username contains quotes (or other special chars), this breaks the query and can make the diagnostic dump misleading. Use mysql_real_escape_string() (or an existing helper) to escape the username before building the SQL string.

Suggested change
std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";
std::string escaped_username;
if (cl.username) {
size_t uname_len = std::strlen(cl.username);
// mysql_real_escape_string can at most double the size of the string, plus one byte for terminator
escaped_username.resize(uname_len * 2 + 1);
unsigned long escaped_len = mysql_real_escape_string(dbg_admin, &escaped_username[0], cl.username, uname_len);
escaped_username.resize(escaped_len);
} else {
escaped_username = "";
}
std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + escaped_username + "' LIMIT 1";

TAP test binaries are compiled with gcov but their .gcda files are
written to the source tree (next to the binaries), not to /gcov.
The coverage collection only scanned /gcov, missing all TAP test
coverage data.

Now copies both .gcno and .gcda files from the workspace into /gcov
before running fastcov, so TAP test coverage is included in reports.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/tap/tap/utils.cpp (1)

467-478: Consider wrapping MYSQL_RES* with a smart pointer and custom deleter for better resource management.

This new helper manually manages resultsets with mysql_free_result(). Per coding guidelines for **/*.cpp, RAII should be used for resource management. A smart pointer with a custom deleter wrapping mysql_free_result() would make this safer and align with project standards. This applies to both the runtime_mysql_servers query (lines 467-478) and stats_mysql_connection_pool query (lines 489-500).

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/tap/tap/utils.cpp`:
- Around line 467-479: In dump_hostgroup_debug, add diagnostics for when
mysql_store_result(admin) returns NULL: after the MYSQL_RES* res =
mysql_store_result(admin) call, detect if res == NULL and call diag to log a
clear message including mysql_error(admin) (and optionally mysql_errno(admin))
so callers can distinguish a failed result retrieval from an empty result set;
keep the existing branch that handles non-NULL results (including the
mysql_num_rows(res) == 0 check) unchanged.

In `@test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp`:
- Line 281: The test adds "enable_mergejoin" but ProxySQL doesn't track it;
either remove that line from the test or add support for the variable by adding
an entry for enable_mergejoin to the pgsql_variable_name enum (in
proxysql_structs.h), include "enable_mergejoin" in the pgsql_tracked_variables[]
table and the pgsql_other_variables set so ProxySQL will intercept SETs for it;
update all three places (enum, pgsql_tracked_variables[], pgsql_other_variables)
to ensure consistent tracking if you choose to add rather than delete the test
line.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 24bfee16-492c-4244-91f2-26cab3eb27cb

📥 Commits

Reviewing files that changed from the base of the PR and between d1c86b1 and 0f12f5d.

📒 Files selected for processing (12)
  • test/infra/control/env-isolated.bash
  • test/infra/control/start-proxysql-isolated.bash
  • test/tap/groups/groups.json
  • test/tap/tap/utils.cpp
  • test/tap/tap/utils.h
  • test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp
  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp
  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp
  • test/tap/tests/test_tls_stats-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: Agent
  • GitHub Check: run / trigger
  • GitHub Check: claude-review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{cpp,h,hpp}: Class names must use PascalCase with protocol prefixes (MySQL_, PgSQL_, ProxySQL_)
Member variables must use snake_case
Constants and macros must use UPPER_SNAKE_CASE
C++17 is required; use conditional compilation via #ifdef PROXYSQLGENAI, #ifdef PROXYSQL31, etc. for feature flags
Use pthread mutexes for synchronization and std::atomic<> for counters

Files:

  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp
  • test/tap/tests/test_tls_stats-t.cpp
  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
  • test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp
  • test/tap/tap/utils.h
  • test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp
  • test/tap/tap/utils.cpp
**/*.cpp

📄 CodeRabbit inference engine (CLAUDE.md)

Use RAII for resource management and jemalloc for memory allocation

Files:

  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp
  • test/tap/tests/test_tls_stats-t.cpp
  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
  • test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp
  • test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp
  • test/tap/tap/utils.cpp
test/tap/tests/{test_*,*-t}.cpp

📄 CodeRabbit inference engine (CLAUDE.md)

Test files in test/tap/tests/ must follow the naming pattern test_*.cpp or *-t.cpp

Files:

  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp
  • test/tap/tests/test_tls_stats-t.cpp
  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
  • test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp
  • test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp
🧠 Learnings (8)
📓 Common learnings
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Applies to test/tap/tests/unit/**/*.cpp : Unit tests in test/tap/tests/unit/ must use test_globals.h and test_init.h and link against libproxysql.a via the custom test harness

Applied to files:

  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp
  • test/tap/tests/test_tls_stats-t.cpp
  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
  • test/tap/tap/utils.h
  • test/tap/groups/groups.json
  • test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp
  • test/infra/control/env-isolated.bash
  • test/tap/tap/utils.cpp
📚 Learning: 2026-01-20T09:34:19.124Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:19.124Z
Learning: In ProxySQL's TAP test suite, resource leaks (e.g., not calling mysql_close() on early return paths) are commonly tolerated because test processes are short-lived and OS frees resources on exit. This pattern applies to all C++ test files under test/tap/tests. When reviewing, recognize this as a project-wide test convention and focus on test correctness and isolation rather than insisting on fixing such leaks in these test files.

Applied to files:

  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp
  • test/tap/tests/test_tls_stats-t.cpp
  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
  • test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp
  • test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp
📚 Learning: 2026-01-20T07:40:34.938Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).

Applied to files:

  • test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp
  • test/tap/tests/test_tls_stats-t.cpp
  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
  • test/infra/control/start-proxysql-isolated.bash
  • test/tap/tap/utils.cpp
📚 Learning: 2026-02-13T09:29:39.713Z
Learnt from: wazir-ahmed
Repo: sysown/proxysql PR: 5372
File: test/tap/tap/mcp_client.cpp:355-385
Timestamp: 2026-02-13T09:29:39.713Z
Learning: In ProxySQL MCP implementation (test/tap/tap/mcp_client.cpp), the `check_server()` method uses the ping endpoint which is designed to work without authentication. The `ping` method at the `config` endpoint should not require the `Authorization: Bearer` header, unlike tool invocation endpoints which do require authentication when `auth_token_` is set.

Applied to files:

  • test/tap/tests/test_auth_methods-t.cpp
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Applies to **/*.{cpp,h,hpp} : C++17 is required; use conditional compilation via `#ifdef` PROXYSQLGENAI, `#ifdef` PROXYSQL31, etc. for feature flags

Applied to files:

  • test/tap/tests/test_auth_methods-t.cpp
  • test/tap/tests/test_unsupported_queries-t.cpp
  • test/tap/tests/test_warnings-t.cpp
📚 Learning: 2026-01-20T09:34:27.165Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.

Applied to files:

  • test/infra/control/start-proxysql-isolated.bash
  • test/tap/groups/groups.json
  • test/infra/control/env-isolated.bash
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Feature tiers are controlled by build flags: PROXYSQL31=1 for v3.1.x, PROXYSQLGENAI=1 for v4.0.x; PROXYSQLGENAI=1 implies PROXYSQL31=1

Applied to files:

  • test/infra/control/start-proxysql-isolated.bash
🔇 Additional comments (12)
test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp (2)

182-194: LGTM! Tolerance logic correctly addresses soft-TTL race condition.

The range check [expected_num_slow_clients, expected_num_slow_clients + tolerance] properly handles the case where multiple threads may observe the cache entry as needing refresh before any single thread completes the backend query. The diagnostic message clearly shows the expected range and actual value.


198-213: LGTM! Cache and hostgroup tolerance correctly maintains invariant.

The tolerance relationship is sound: one additional hostgroup hit (backend query) corresponds to one fewer cache hit, preserving the total query count. The computed deltas (stats_after - stats_before) properly isolate this test run's contributions.

test/tap/tests/test_tls_stats-t.cpp (1)

123-125: LGTM! Plan count correction is accurate.

Verified the test count: 27 ok() calls on the happy path matches the corrected plan(27). The comment updates for Parts 3-4 (tests 23-27) are consistent with the code.

Minor observation: The earlier comments (lines 118-122) have some numbering inconsistencies with actual test counts (e.g., // 1-5: but there are 6 TLS variables), but since those weren't part of this change, it's fine to address separately if desired.
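Plan miscounts like the one fixed here can be sanity-checked with a quick grep over the test source. A minimal sketch using a hypothetical miniature test file (in practice you would point grep at the real file, e.g. test/tap/tests/test_tls_stats-t.cpp); note this only approximates the happy-path count when ok() calls sit inside loops or conditionals:

```shell
#!/bin/sh
# Hypothetical miniature TAP test source, written to a temp file.
cat > /tmp/sample-t.cpp <<'EOF'
int main() {
	plan(2);
	ok(1 == 1, "first check");
	ok(2 == 2, "second check");
	return exit_status();
}
EOF
# Count ok() invocations and compare against the declared plan() value.
grep -c 'ok(' /tmp/sample-t.cpp
# -> 2
```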

test/tap/groups/groups.json (1)

185-185: Good test-group compatibility alignment.

These updates correctly scope MySQL 8.0+-dependent tests away from legacy groups while keeping the intended variant groups.

Also applies to: 220-220

test/infra/control/env-isolated.bash (1)

62-77: Primary-first cluster ordering is correctly implemented.

This change makes TAP_CLUSTER_NODES deterministic and consistent with cluster tests that expect the primary endpoint first.

test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp (1)

48-50: Helpful failure-path diagnostics added.

Dumping hostgroup state here is a good CI-debug improvement and keeps behavior isolated to the error path.

test/tap/tests/test_auth_methods-t.cpp (1)

1144-1145: Early-return thread cleanup is correctly fixed.

Calling stop_internal_noise_threads() before returning on the skip path is the right fix to avoid thread-destructor crashes.

test/tap/tests/test_unsupported_queries-t.cpp (1)

539-557: Diagnostic preflight block is a good addition.

This improves observability for CI failures without altering the existing unsupported-query test flow.

test/tap/tests/test_warnings-t.cpp (1)

554-561: Nice observability improvement for warning-path failures.

The hostgroup dump hook is scoped well and should help diagnose intermittent CI issues.

test/infra/control/start-proxysql-isolated.bash (2)

57-60: GCOV environment preservation is correctly handled.

Saving before unsetting and restoring for the primary process is the right approach for reliable coverage collection.

Also applies to: 103-106


182-188: Primary admin-variable alignment looks correct.

Setting admin-restapi_enabled and admin-debug on primary to match nodes should help cluster sync converge consistently.

test/tap/tap/utils.h (1)

159-165: Utility API addition is clean and well-documented.

The declaration and comments are clear and fit the new diagnostics usage across TAP tests.

Comment thread test/tap/tap/utils.cpp
Comment on lines +467 to +479
MYSQL_RES* res = mysql_store_result(admin);
if (res) {
	MYSQL_ROW row;
	diag("runtime_mysql_servers (hg=%d):", hostgroup);
	while ((row = mysql_fetch_row(res))) {
		diag("  hg=%s host=%s port=%s status=%s max_conn=%s comment=%s",
			row[0], row[1], row[2], row[3], row[4], row[5] ? row[5] : "");
	}
	if (mysql_num_rows(res) == 0) {
		diag("  (no servers in hostgroup %d)", hostgroup);
	}
	mysql_free_result(res);
}
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the two mysql_store_result() branches in dump_hostgroup_debug
# and whether explicit NULL-result diagnostics are present.
rg -n -C4 'dump_hostgroup_debug|mysql_store_result|Failed to store result' test/tap/tap/utils.cpp

Repository: sysown/proxysql

Length of output: 6720


Add explicit diagnostics when mysql_store_result() fails.

The dump_hostgroup_debug function silently ignores NULL returns from mysql_store_result(), making it difficult to distinguish between "query succeeded with empty result" and "failed to store result." Add diagnostics for both NULL cases using mysql_error(admin).

Suggested patch
 	if (mysql_query(admin, query) == 0) {
 		MYSQL_RES* res = mysql_store_result(admin);
 		if (res) {
 			MYSQL_ROW row;
 			diag("runtime_mysql_servers (hg=%d):", hostgroup);
 			while ((row = mysql_fetch_row(res))) {
 				diag("  hg=%s host=%s port=%s status=%s max_conn=%s comment=%s",
 					row[0], row[1], row[2], row[3], row[4], row[5] ? row[5] : "");
 			}
 			if (mysql_num_rows(res) == 0) {
 				diag("  (no servers in hostgroup %d)", hostgroup);
 			}
 			mysql_free_result(res);
+		} else {
+			diag("  Failed to store result for runtime_mysql_servers: %s", mysql_error(admin));
 		}
 	}
 	if (mysql_query(admin, query) == 0) {
 		MYSQL_RES* res = mysql_store_result(admin);
 		if (res) {
 			MYSQL_ROW row;
 			diag("stats_mysql_connection_pool (hg=%d):", hostgroup);
 			while ((row = mysql_fetch_row(res))) {
 				diag("  hg=%s host=%s port=%s status=%s ConnUsed=%s ConnFree=%s ConnOK=%s ConnERR=%s Queries=%s",
 					row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[9]);
 			}
 			if (mysql_num_rows(res) == 0) {
 				diag("  (no pool entries for hostgroup %d)", hostgroup);
 			}
 			mysql_free_result(res);
+		} else {
+			diag("  Failed to store result for stats_mysql_connection_pool: %s", mysql_error(admin));
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/tap/utils.cpp` around lines 467-479: in dump_hostgroup_debug, add
diagnostics for when mysql_store_result(admin) returns NULL: after the
MYSQL_RES* res = mysql_store_result(admin) call, detect if res == NULL and call
diag to log a clear message including mysql_error(admin) (and optionally
mysql_errno(admin)) so callers can distinguish a failed result retrieval from an
empty result set; keep the existing branch that handles non-NULL results
(including the mysql_num_rows(res) == 0 check) unchanged.

{"timezone", {"timezone", {"UTC", "PST8PDT"}}},
{"bytea_output", {"bytea_output", {"hex", "escape"}}},
{"allow_in_place_tablespaces", {"allow_in_place_tablespaces", {"on", "off"}}},
{"enable_mergejoin", {"enable_mergejoin", {"on", "off"}}},
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify enable_mergejoin is NOT in tracked variables

echo "=== Checking pgsql_tracked_variables array ==="
rg -n "enable_mergejoin" include/proxysql_structs.h

echo ""
echo "=== Checking pgsql_other_variables set ==="
rg -n "enable_mergejoin" lib/PgSQL_Session.cpp

echo ""
echo "=== Checking pgsql_variable_name enum ==="
rg -n "ENABLE_MERGEJOIN" include/proxysql_structs.h

echo ""
echo "=== For comparison, showing tracked enable_* variables ==="
rg -n "enable_" include/proxysql_structs.h | head -20

Repository: sysown/proxysql

Length of output: 1515


enable_mergejoin is not in ProxySQL's tracked variables, making this test ineffective.

The variable replacement removes tracking coverage. enable_mergejoin is absent from pgsql_tracked_variables[], pgsql_other_variables, and the pgsql_variable_name enum, while other enable_* variables (enable_bitmapscan, enable_hashjoin, enable_indexscan, enable_nestloop, enable_seqscan, enable_sort) are tracked. ProxySQL will not intercept SET statements for this variable, causing the test to rely on PostgreSQL's native behavior rather than validate ProxySQL's session variable tracking mechanism.

Either add enable_mergejoin to the tracking infrastructure (enum in proxysql_structs.h, entry in pgsql_tracked_variables[], and pgsql_other_variables set) or remove this line since the other enable_* variables already provide test coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp` at line 281:
the test adds "enable_mergejoin" but ProxySQL doesn't track it; either remove
that line from the test or add support for the variable by adding an entry for
enable_mergejoin to the pgsql_variable_name enum (in proxysql_structs.h),
include "enable_mergejoin" in the pgsql_tracked_variables[] table and the
pgsql_other_variables set so ProxySQL will intercept SETs for it; update all
three places (enum, pgsql_tracked_variables[], pgsql_other_variables) to ensure
consistent tracking if you choose to add rather than delete the test line.

@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request introduces a new dump_hostgroup_debug utility function to aid in debugging CI failures by providing diagnostic information about hostgroups and connection pools. This function is integrated into several existing tests. The PR also refines the cluster node configuration, ensures proper handling of GCOV environment variables for the primary process, and adjusts test assertions for query cache soft TTL to include a tolerance for race conditions. A review comment highlights that the dump_hostgroup_debug function's output for stats_mysql_connection_pool is missing the MaxConnUsed field, which should be included for complete debugging information.

Comment thread test/tap/tap/utils.cpp
Comment on lines +494 to +495
				diag("  hg=%s host=%s port=%s status=%s ConnUsed=%s ConnFree=%s ConnOK=%s ConnERR=%s Queries=%s",
					row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[9]);
medium

The diagnostic output for stats_mysql_connection_pool is missing the MaxConnUsed field, even though it's selected in the query on line 486. This can lead to incomplete debugging information. Please update the diag call to include this field.

                diag("  hg=%s host=%s port=%s status=%s ConnUsed=%s ConnFree=%s ConnOK=%s ConnERR=%s MaxConnUsed=%s Queries=%s",
                    row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[8], row[9]);

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
test/infra/control/run-tests-isolated.bash (1)

272-282: LGTM! Coverage collection now correctly captures TAP test data.

The extension to copy both .gcno and .gcda files is correct—fastcov requires both to generate coverage reports. The comments clearly explain the dual-source pattern: ProxySQL writes .gcda via GCOV_PREFIX, while TAP tests write to the source tree.

Minor optional improvement: using read -r would prevent backslash interpretation in filenames, though unlikely to matter for auto-generated coverage paths.


🔧 Optional: more robust read loop
-                        cd \"\${WORKSPACE}\" && find . -path './ci_infra_logs' -prune -o \\( -name '*.gcno' -o -name '*.gcda' \\) -type f -print | while read gcfile; do
+                        cd \"\${WORKSPACE}\" && find . -path './ci_infra_logs' -prune -o \\( -name '*.gcno' -o -name '*.gcda' \\) -type f -print | while IFS= read -r gcfile; do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/run-tests-isolated.bash` around lines 272-282: the
while-loop that iterates over discovered coverage files in
run-tests-isolated.bash uses a plain read which can interpret backslashes in
filenames; update the loop to use read -r when reading gcfile to prevent
backslash interpretation (locate the find ... | while read gcfile; do loop and
change it to use read -r gcfile), ensuring the rest of the logic (target,
target_dir, mkdir -p, cp -f steps) remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/infra/control/run-tests-isolated.bash`:
- Around lines 272-282: the while-loop that iterates over discovered coverage
files in run-tests-isolated.bash uses a plain read which can interpret
backslashes in filenames; update the loop to use read -r when reading gcfile to
prevent backslash interpretation (locate the find ... | while read gcfile; do
loop and change it to use read -r gcfile), ensuring the rest of the logic
(target, target_dir, mkdir -p, cp -f steps) remains unchanged.
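The behavior the nitpick describes is easy to see in isolation. A standalone sketch with a hypothetical backslash-containing line (not a real path from this repo):

```shell
#!/bin/sh
# A line as find might emit it, containing a literal backslash (hypothetical).
line='./lib/obj\new_file.gcda'

# Plain read treats the backslash as an escape character and drops it.
printf '%s\n' "$line" | while read f; do printf 'plain: %s\n' "$f"; done

# IFS= read -r preserves the line byte-for-byte.
printf '%s\n' "$line" | while IFS= read -r f; do printf 'raw:   %s\n' "$f"; done
# plain: ./lib/objnew_file.gcda
# raw:   ./lib/obj\new_file.gcda
```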

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a32d0f7e-404f-4374-96c0-1afb8cddb451

📥 Commits

Reviewing files that changed from the base of the PR and between 0f12f5d and fb9ff44.

📒 Files selected for processing (1)
  • test/infra/control/run-tests-isolated.bash
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: run / trigger
  • GitHub Check: claude-review
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Applies to test/tap/tests/unit/**/*.cpp : Unit tests in test/tap/tests/unit/ must use test_globals.h and test_init.h and link against libproxysql.a via the custom test harness

The default value (1) syncs both configured AND runtime mysql_servers,
including changes made by the monitor (read_only checks moving servers
between writer/reader hostgroups). This caused circular sync: a
cluster node's monitor modified servers, creating a newer epoch, and
the primary synced this back — overwriting its correct config and
saving the corrupted version to disk.

With auto_select (3), the algorithm chooses mysql_servers_v2 when
the monitor is enabled, which only syncs user-configured changes
and ignores monitor-triggered runtime modifications.

Evidence: node 6042's monitor removed MariaDB servers from hostgroup
1701 ("Setting status OFFLINE HARD"), creating checksum 0x7C0A67D5
with epoch 1774288555. Primary (epoch 1774288549) synced from node
6042 and saved to disk, losing hostgroup 1300 permanently.

Fixes #5540.
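For reference, the algorithm switch described above maps to an admin variable. A hedged sketch of applying it on a node via the admin interface (default admin port and credentials assumed; adjust to your deployment, and note this is a config fragment requiring a running ProxySQL):

```shell
mysql -h127.0.0.1 -P6032 -uadmin -padmin -e "
  SET admin-cluster_mysql_servers_sync_algorithm = 3;
  LOAD ADMIN VARIABLES TO RUNTIME;
  SAVE ADMIN VARIABLES TO DISK;"
```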
The test spawns its own ProxySQL replica (port 16062) inside the
test-runner container, but connected to it via cl.host (TAP_ADMINHOST)
which resolves to the main ProxySQL container in the Docker setup.
Since port 16062 doesn't exist on the main container, the connection
hung indefinitely, causing the test to stall.

Now uses R_HOST="127.0.0.1" for all replica connections (R_PORT),
since the replica runs locally inside the same container as the test.
…ca config

The replica used the default algorithm (1) which syncs runtime
mysql_servers — including monitor-shunned status. The test checks
the config table (mysql_servers) for status='ONLINE', but the
runtime sync overwrites the config table with SHUNNED status.

With auto_select (3) and monitor enabled (-M flag), the replica
uses mysql_servers_v2 sync which preserves the configured status.
The debug filter for Query_Processor.cpp/process_query suppressed
the "Searching thread-local 'rules_fast_routing' hashmap" messages
that test_query_rules_fast_routing_algorithm-t relies on to verify
fast routing behavior (expects 20 log entries, gets 0).
When a cluster node syncs proxysql_servers from the primary and runs
LOAD PROXYSQL SERVERS TO RUNTIME, it rebuilds the node list from the
synced data. If the primary isn't in its own proxysql_servers table,
the synced list won't include it. The node then destroys the cluster
monitor thread for the primary and never recreates it.

Evidence: node 1 log shows "Destroyed Cluster Node Entry for host
proxysql:6032" followed by "closing thread for peer proxysql:6032"
after syncing proxysql_servers from the primary. After that, the
node never detected any further checksum changes from the primary,
causing test_cluster1-t and test_unshun_algorithm-t to fail.

Fixes #5542.
The coverage collection copied ALL .gcno and .gcda files from the
workspace to /gcov using cp -f. But ProxySQL's .gcda files (with
real execution data from the Docker container) were already in /gcov
via GCOV_PREFIX. The cp -f overwrote them with the host's stale
.gcda files (zero or outdated execution counts).

This caused lib/MySQLFFTO.cpp and lib/PgSQLFFTO.cpp (and likely
other ProxySQL library files) to show 0% coverage despite the FFTO
tests running successfully and the .gcda files having real data.

Fix: copy .gcno files with cp -f (always needed for fastcov), but
copy .gcda files with cp -n (no-clobber) so ProxySQL's existing
execution data in /gcov is preserved. TAP test .gcda files that
don't exist in /gcov yet are still copied.
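The cp -f vs cp -n distinction can be sketched standalone with stand-in files (contents illustrative, not real gcov data):

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/gcov" "$tmp/workspace"

# /gcov already holds the daemon's real counters; the workspace holds a
# stale copy of the same file plus a TAP-only file.
printf 'real-execution-counts' > "$tmp/gcov/MySQLFFTO.gcda"
printf 'stale-zero-counts'     > "$tmp/workspace/MySQLFFTO.gcda"
printf 'tap-only-counts'       > "$tmp/workspace/utils.gcda"

# cp -n (no-clobber): the existing daemon .gcda survives, while the
# TAP-only .gcda is still brought in. (Newer coreutils make cp -n exit
# non-zero when it skips a file, hence the || true.)
cp -n "$tmp/workspace/"*.gcda "$tmp/gcov/" || true

cat "$tmp/gcov/MySQLFFTO.gcda"; echo
cat "$tmp/gcov/utils.gcda"; echo
rm -rf "$tmp"
# real-execution-counts
# tap-only-counts
```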
Coverage data comes from two sources that may overlap:
  /gcov     — ProxySQL daemon execution (via GCOV_PREFIX)
  WORKSPACE — TAP test execution (.gcda next to binaries)

Both may cover the same code (e.g. libproxysql.a exercised by both
ProxySQL runtime and TAP test harness). Previously we copied .gcda
files between directories, which either overwrote (cp -f) or kept
only the first version (cp -n) instead of combining counters.

Now generates two separate .info files (proxysql.info + tap_tests.info)
and combines them with fastcov -C, which correctly sums execution
counters for overlapping source files.
- Remove redundant INSERT of primary into proxysql_servers on nodes.
  Commit b02147b added the primary to the shared PROXYSQL_SERVERS_SQL
  but didn't remove the pre-existing per-node INSERT, causing a UNIQUE
  constraint violation that broke all multi-node CI groups.

- Add cluster_mysql_servers_sync_algorithm=3 to test_cluster_sync config
  files and update hardcoded datadir paths to match current workspace.
TAP test binaries link libcpp_dotenv.so which transitively needs
libparser.so. The RUNPATH baked into libcpp_dotenv.so contains stale
paths from the original build, so the dynamic linker can't find it.

- Add LD_LIBRARY_PATH for the SKIP_PROXYSQL (unit test) host path
- Add TAP_DEPS_PATH to env-isolated.bash so proxysql-tester.py
  constructs the correct LD_LIBRARY_PATH inside Docker containers
Three problems caused machine stalls and 0% coverage in multi-group runs:

1. All test-runner containers wrote TAP test .gcda files to the shared
   ${WORKSPACE}, causing concurrent corruption and memory bloat when
   multiple gcov processes fought over the same files.

2. Each container independently ran fastcov -j$(nproc) in its EXIT trap,
   spawning hundreds of concurrent gcov processes across all containers.

3. GCOV_PREFIX_STRIP=3 placed .gcda at proxysql/lib/obj/ but .gcno was
   copied to lib/obj/ — path mismatch meant gcov couldn't correlate them.
   Additionally, the --include filter expected full workspace paths but
   gcov records bare filenames, so all results were filtered out.
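For context on problem 3: GCC rewrites the compiled-in absolute .gcda path by stripping the leading GCOV_PREFIX_STRIP components and prepending GCOV_PREFIX. A sketch of that path math with a hypothetical compiled-in path (the real build paths differ):

```shell
#!/bin/sh
# Hypothetical compiled-in absolute path of a .gcda file.
compiled_in=/opt/jenkins/workspace/proxysql/lib/obj/MySQL_Session.gcda
GCOV_PREFIX=/gcov
GCOV_PREFIX_STRIP=3

# Strip the first $GCOV_PREFIX_STRIP leading components, then prepend the prefix.
stripped=${compiled_in#/}
i=0
while [ "$i" -lt "$GCOV_PREFIX_STRIP" ]; do
    stripped=${stripped#*/}
    i=$((i + 1))
done
echo "$GCOV_PREFIX/$stripped"
# -> /gcov/proxysql/lib/obj/MySQL_Session.gcda
```

This is why STRIP=3 leaves a proxysql/lib/obj/ tail under the prefix while the .gcno files were copied to lib/obj/, producing the path mismatch described above.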

Changes:

- Set GCOV_PREFIX=/gcov/tap on test-runner containers so TAP .gcda files
  go to per-INFRA_ID /gcov/tap/ instead of shared workspace.

- Pass MULTI_GROUP=1 from run-multi-group.bash to test-runners. When set,
  the EXIT trap copies .gcno adjacent to .gcda (handling path offset from
  GCOV_PREFIX_STRIP) but skips fastcov/genhtml.

- Add centralized coverage collection in run-multi-group.bash: processes
  each group's /gcov directory sequentially with fastcov -j4 after all
  groups finish, then combines the .info files.

- Remove --include filter from fastcov (source paths are bare filenames,
  not full paths, so the filter rejected everything).

- Standalone mode (no MULTI_GROUP) is unchanged.
fastcov was including system headers (/usr/include/c++/...) and vendored
dependencies (deps/json/json.hpp) in coverage output. Use --exclude
instead of --include since project source files have bare filenames in
gcov output (no path prefix to match against), while system/deps paths
are always absolute.
- mysql84 pre-proxysql.bash: use docker exec for all ProxySQL admin
  commands (was using mysql -h127.0.0.1 which can't reach the container)

- basictests setup-infras.bash: use INSERT OR REPLACE when remapping
  hostgroups 1300/1301 -> 0/1 to avoid UNIQUE constraint violation
  when target hostgroups already have entries

- proxysql-tester.py: skip per-stage coverage generation when
  MULTI_GROUP=1 (was spawning 32 gcov processes on 35k .gcno files)

- Move test_cluster_sync-t and test_read_only_actions_offline_hard_servers-t
  to legacy-g5 pending investigation of cluster sync algorithm issues.
  Move reg_test_3765_ssl_pollout-t from legacy-g5 to legacy-g4.
@renecannao renecannao merged commit 2350632 into v3.0 Mar 24, 2026
3 checks passed