CI test fixes: cluster sync, plan counts, test group assignments, and diagnostics #5538

renecannao merged 20 commits into v3.0
Conversation
- test_tls_stats-t: plan(29) -> plan(27). The test executes 27 ok() calls on the happy path; the plan was miscounted during authoring.
- reg_test_4935-caching_sha2-t: removed from legacy-g4 (MySQL 5.7 doesn't support caching_sha2_password); kept in mysql84-g4.
- Add a dump_hostgroup_debug() utility: dumps runtime_mysql_servers and stats_mysql_connection_pool for a given hostgroup via diag() output. Helps diagnose "hostgroup has no servers available" CI failures.
- test_throttle_max_bytes_per_second_to_client-t: dump hostgroups 1300, 0, and 1 when create_table_test_sbtest1() fails.
- test_warnings-t: dump target hostgroup state before running tests.
- test_unsupported_queries-t: dump target hostgroup state before running tests.
- test_query_cache_soft_ttl_pct-t: add a tolerance of +1 to all three assertions. The soft TTL refresh race allows two threads to both see "needs refresh" simultaneously under load, producing one extra backend hit. This is a known property of the soft TTL mechanism.
Test requires MySQL 8.0+ (detects 5.7 and skips, but crashes with 'terminate called without an active exception' during noise thread cleanup). The crash-on-skip is a separate test bug.
- pgsql-transaction_variable_state_tracking-t: replace the superuser-only variable allow_in_place_tablespaces with the user-settable enable_mergejoin. Fixes #5536.
- test_auth_methods-t: call stop_internal_noise_threads() before the early return on MySQL 5.7, preventing the 'terminate called without an active exception' crash from joinable std::thread destructors. Fixes #5537.
…nodes

The primary had admin-debug=false and admin-restapi_enabled=false (defaults), while cluster nodes had them set to true. Since these variables are NOT in the CLUSTER_SYNC_INTERFACES_ADMIN exclusion list, they are included in the checksum. The checksum mismatch between primary and nodes caused admin_variables sync to never converge on satellite nodes. Now the primary sets the same admin-restapi_enabled=true and admin-debug=true as all nodes, ensuring matching checksums. Fixes #5535.
test_cluster1-t expects conns[0] to be the primary ProxySQL instance. TAP_CLUSTER_NODES only listed nodes 1-9 (ports 6042-6122), so the test loop operated on nodes 1-4 instead of primary + nodes 1-3. Node 4 (port 6072) is NOT in proxysql_servers (only core nodes 1-3 are registered), so changes on node 4 were invisible to the cluster — sync could never detect them, causing the 30-second timeout. Now TAP_CLUSTER_NODES starts with proxysql:6032 (primary), matching the old cluster_ports layout where 6032 was the first entry. Fixes #5535.
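The primary-first ordering described above can be sketched as follows. The hostnames and the port arithmetic are illustrative assumptions, not the actual env-isolated.bash code:

```shell
#!/bin/bash
# Build TAP_CLUSTER_NODES with the primary endpoint first, then nodes 1-9.
# "proxysql:6032" and the nodeN naming scheme are assumed for illustration.
PRIMARY="proxysql:6032"
NODES=""
for i in 1 2 3 4 5 6 7 8 9; do
    port=$((6032 + i * 10))   # node1 -> 6042 ... node9 -> 6122
    NODES="${NODES:+${NODES},}node${i}:${port}"
done
export TAP_CLUSTER_NODES="${PRIMARY},${NODES}"
echo "${TAP_CLUSTER_NODES}"
```

With this layout, conns[0] in test_cluster1-t resolves to the primary, and the loop over the remaining entries touches only nodes that are actually registered in proxysql_servers.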
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Ensure the ProxySQL primary endpoint is first in constructed cluster node lists when nodes exist; snapshot/restore the GCOV env around proxysql startup; add a hostgroup debug dump helper and call it from several tests; relax some race assertions; adjust TAP plans and test-group JSON; broaden coverage file collection.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: 4 passed, 1 warning.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses multiple CI test failures and significantly enhances diagnostic capabilities within the test suite. It includes fixes for incorrect test plan counts, introduces tolerance for race conditions in cache timing tests, and resolves issues with test group assignments to ensure compatibility across MySQL versions. Furthermore, the changes correct cluster synchronization logic by properly including and configuring the primary node, and restore accurate code coverage reporting for the primary process. A new debugging utility has been added and integrated into key tests to streamline troubleshooting of hostgroup-related failures.

Highlights
Pull request overview
This PR addresses multiple TAP/CI failures surfaced during CI migration testing by adjusting flaky timing assertions, fixing test plans/cleanup, correcting cluster node environment/setup for single-container clusters, and adding targeted diagnostics to aid future debugging.
Changes:
- Fix/relax several TAP tests (plan count correction, timing tolerance for soft TTL race, non-superuser PgSQL variable, and safe cleanup on early-skip paths).
- Correct isolated cluster startup/env behavior (ensure the primary is first in TAP_CLUSTER_NODES, align primary admin vars with nodes, and restore the GCOV env for coverage).
- Add a dump_hostgroup_debug() test helper and invoke it in a few tests to print hostgroup/pool state when failures occur.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| test/tap/tests/test_warnings-t.cpp | Adds hostgroup debug dump at test start to aid “hostgroup unreachable” diagnosis. |
| test/tap/tests/test_unsupported_queries-t.cpp | Adds hostgroup debug dump and attempts to auto-detect the test user’s default hostgroup. |
| test/tap/tests/test_tls_stats-t.cpp | Corrects TAP plan() count to match executed assertions. |
| test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp | Dumps hostgroup state when table setup fails, improving failure diagnostics. |
| test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp | Adds tolerance to assertions to accommodate a known soft-TTL refresh race. |
| test/tap/tests/test_auth_methods-t.cpp | Stops internal noise threads on early skip to avoid std::thread destructor termination. |
| test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp | Replaces superuser-only variable with a user-settable alternative. |
| test/tap/tap/utils.h | Declares dump_hostgroup_debug() helper for TAP tests. |
| test/tap/tap/utils.cpp | Implements dump_hostgroup_debug() querying runtime servers and pool stats. |
| test/tap/groups/groups.json | Reassigns MySQL 8.0+ dependent tests out of legacy groups. |
| test/infra/control/start-proxysql-isolated.bash | Restores GCOV env for primary after unsetting for background nodes; aligns primary admin vars with nodes. |
| test/infra/control/env-isolated.bash | Ensures primary (6032) is first in TAP_CLUSTER_NODES for cluster tests. |
```cpp
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
	dump_hostgroup_debug(dbg_admin, target_hg);
```

dbg_admin is only closed on successful mysql_real_connect(). If the connect fails, the handle returned by mysql_init() is leaked. Close dbg_admin whenever it is non-null (even after a failed connect) to keep the test clean and avoid accumulating connections/handles in long CI runs.

Suggested change:

```cpp
if (dbg_admin) {
	if (mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
		dump_hostgroup_debug(dbg_admin, target_hg);
	}
```
```cpp
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
	// Get default hostgroup for the test user
	std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";
	int target_hg = 0;
	if (mysql_query(dbg_admin, hg_query.c_str()) == 0) {
		MYSQL_RES *res = mysql_store_result(dbg_admin);
		if (res) {
			MYSQL_ROW row = mysql_fetch_row(res);
			if (row) target_hg = atoi(row[0]);
			mysql_free_result(res);
		}
	}
	dump_hostgroup_debug(dbg_admin, target_hg);
```

dbg_admin is only closed when mysql_real_connect() succeeds. If the connect fails, the handle created by mysql_init() is leaked. Ensure mysql_close(dbg_admin) is called whenever dbg_admin is non-null, regardless of connection success.

Suggested change:

```cpp
if (dbg_admin) {
	if (mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
		// Get default hostgroup for the test user
		std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";
		int target_hg = 0;
		if (mysql_query(dbg_admin, hg_query.c_str()) == 0) {
			MYSQL_RES *res = mysql_store_result(dbg_admin);
			if (res) {
				MYSQL_ROW row = mysql_fetch_row(res);
				if (row) target_hg = atoi(row[0]);
				mysql_free_result(res);
			}
		}
		dump_hostgroup_debug(dbg_admin, target_hg);
	}
```
```cpp
MYSQL* dbg_admin = mysql_init(NULL);
if (dbg_admin && mysql_real_connect(dbg_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
	// Get default hostgroup for the test user
	std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + std::string(cl.username) + "' LIMIT 1";
```

The query that looks up default_hostgroup concatenates cl.username directly into SQL without escaping. If the username contains quotes (or other special chars), this breaks the query and can make the diagnostic dump misleading. Use mysql_real_escape_string() (or an existing helper) to escape the username before building the SQL string.

Suggested change:

```cpp
std::string escaped_username;
if (cl.username) {
	size_t uname_len = std::strlen(cl.username);
	// mysql_real_escape_string can at most double the size of the string, plus one byte for the terminator
	escaped_username.resize(uname_len * 2 + 1);
	unsigned long escaped_len = mysql_real_escape_string(dbg_admin, &escaped_username[0], cl.username, uname_len);
	escaped_username.resize(escaped_len);
} else {
	escaped_username = "";
}
std::string hg_query = "SELECT default_hostgroup FROM mysql_users WHERE username='" + escaped_username + "' LIMIT 1";
```
TAP test binaries are compiled with gcov but their .gcda files are written to the source tree (next to the binaries), not to /gcov. The coverage collection only scanned /gcov, missing all TAP test coverage data. Now copies both .gcno and .gcda files from the workspace into /gcov before running fastcov, so TAP test coverage is included in reports.
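The copy step can be sketched as below. The function name and directory layout are illustrative assumptions; the real script's details may differ:

```shell
#!/bin/bash
# Copy coverage files from a workspace tree into the directory fastcov scans,
# preserving the relative layout so each runtime .gcda pairs with its
# compile-time .gcno counterpart.
copy_coverage_files() {
    workspace="$1"
    gcov_dir="$2"
    mkdir -p "${gcov_dir}"
    find "${workspace}" -type f \( -name '*.gcno' -o -name '*.gcda' \) | while read -r f; do
        rel="${f#${workspace}/}"
        mkdir -p "${gcov_dir}/$(dirname "${rel}")"
        cp "${f}" "${gcov_dir}/${rel}"
    done
}
```

Copying both extensions matters: fastcov (like gcov itself) needs the .gcno graph file next to the .gcda counters to attribute coverage, so collecting only one of the two would still drop the TAP test data.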
Actionable comments posted: 2
🧹 Nitpick comments (1)

test/tap/tap/utils.cpp (1)

- 467-478: Consider wrapping MYSQL_RES* with a smart pointer and custom deleter for better resource management. This new helper manually manages result sets with mysql_free_result(). Per the coding guidelines for **/*.cpp, RAII should be used for resource management; a smart pointer with a custom deleter wrapping mysql_free_result() would make this safer and align with project standards. This applies to both the runtime_mysql_servers query (lines 467-478) and the stats_mysql_connection_pool query (lines 489-500).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/tap/tap/utils.cpp` around lines 467 - 478, Replace manual MYSQL_RES* management in the runtime_mysql_servers (and likewise in stats_mysql_connection_pool) block: instead of calling mysql_free_result(res) explicitly after mysql_store_result(admin), wrap the returned MYSQL_RES* in a RAII smart pointer with a custom deleter that invokes mysql_free_result (e.g., a std::unique_ptr<MYSQL_RES, decltype(&mysql_free_result)> or a small typedef/alias), then use that smart pointer (e.g., res_ptr.get()) wherever mysql_fetch_row and mysql_num_rows are called so the result is automatically freed when the pointer goes out of scope; remove the explicit mysql_free_result(res) call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/tap/tap/utils.cpp`:
- Around line 467-479: In dump_hostgroup_debug, add diagnostics for when
mysql_store_result(admin) returns NULL: after the MYSQL_RES* res =
mysql_store_result(admin) call, detect if res == NULL and call diag to log a
clear message including mysql_error(admin) (and optionally mysql_errno(admin))
so callers can distinguish a failed result retrieval from an empty result set;
keep the existing branch that handles non-NULL results (including the
mysql_num_rows(res) == 0 check) unchanged.
In `@test/tap/tests/pgsql-transaction_variable_state_tracking-t.cpp`:
- Line 281: The test adds "enable_mergejoin" but ProxySQL doesn't track it;
either remove that line from the test or add support for the variable by adding
an entry for enable_mergejoin to the pgsql_variable_name enum (in
proxysql_structs.h), include "enable_mergejoin" in the pgsql_tracked_variables[]
table and the pgsql_other_variables set so ProxySQL will intercept SETs for it;
update all three places (enum, pgsql_tracked_variables[], pgsql_other_variables)
to ensure consistent tracking if you choose to add rather than delete the test
line.
---
Nitpick comments:
In `@test/tap/tap/utils.cpp`:
- Around line 467-478: Replace manual MYSQL_RES* management in the
runtime_mysql_servers (and likewise in stats_mysql_connection_pool) block:
instead of calling mysql_free_result(res) explicitly after
mysql_store_result(admin), wrap the returned MYSQL_RES* in a RAII smart pointer
with a custom deleter that invokes mysql_free_result (e.g., a
std::unique_ptr<MYSQL_RES, decltype(&mysql_free_result)> or a small
typedef/alias), then use that smart pointer (e.g., res_ptr.get()) wherever
mysql_fetch_row and mysql_num_rows are called so the result is automatically
freed when the pointer goes out of scope; remove the explicit
mysql_free_result(res) call.
🧰 Additional context used

📓 Path-based instructions (CodeRabbit inference engine, CLAUDE.md):

- **/*.{cpp,h,hpp}: Class names must use PascalCase with protocol prefixes (MySQL_, PgSQL_, ProxySQL_); member variables must use snake_case; constants and macros must use UPPER_SNAKE_CASE; C++17 is required, with conditional compilation via #ifdef PROXYSQLGENAI, #ifdef PROXYSQL31, etc. for feature flags; use pthread mutexes for synchronization and std::atomic<> for counters.
- **/*.cpp: Use RAII for resource management and jemalloc for memory allocation.
- test/tap/tests/{test_*,*-t}.cpp: Test files in test/tap/tests/ must follow the naming pattern test_*.cpp or *-t.cpp.
🧠 Learnings (deduplicated)

- In ProxySQL test files (test/tap/tests/), resource leaks such as not calling mysql_close() on early return paths are not typically fixed, because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite. (yuji-hatakeyama, PR 5307)
- In ProxySQL test files, calling mysql_error(NULL) after mysql_init() failure is safe because the MariaDB client library returns an empty string for NULL handles (not undefined behavior). (yuji-hatakeyama, PR 5307)
- Unit tests in test/tap/tests/unit/ must use test_globals.h and test_init.h and link against libproxysql.a via the custom test harness. (CLAUDE.md)
- In the ProxySQL MCP implementation (test/tap/tap/mcp_client.cpp), check_server() uses the ping endpoint, which is designed to work without authentication; the Authorization: Bearer header is required only for tool invocation endpoints. (wazir-ahmed, PR 5372)
- Feature tiers are controlled by build flags: PROXYSQL31=1 for v3.1.x, PROXYSQLGENAI=1 for v4.0.x; PROXYSQLGENAI=1 implies PROXYSQL31=1. (CLAUDE.md)
🔇 Additional comments (12)

test/tap/tests/test_query_cache_soft_ttl_pct-t.cpp (2)

- 182-194: LGTM! The tolerance logic correctly addresses the soft-TTL race condition. The range check [expected_num_slow_clients, expected_num_slow_clients + tolerance] properly handles the case where multiple threads may observe the cache entry as needing refresh before any single thread completes the backend query. The diagnostic message clearly shows the expected range and actual value.
- 198-213: LGTM! The cache and hostgroup tolerance correctly maintains the invariant. The tolerance relationship is sound: one additional hostgroup hit (backend query) corresponds to one fewer cache hit, preserving the total query count. The computed deltas (stats_after - stats_before) properly isolate this test run's contributions.

test/tap/tests/test_tls_stats-t.cpp (1)

- 123-125: LGTM! The plan count correction is accurate. Verified the test count: 27 ok() calls on the happy path match the corrected plan(27). The comment updates for Parts 3-4 (tests 23-27) are consistent with the code. Minor observation: the earlier comments (lines 118-122) have some numbering inconsistencies with the actual test counts (e.g., "// 1-5:" but there are 6 TLS variables), but since those weren't part of this change, it's fine to address them separately if desired.

test/tap/groups/groups.json (1)

- 185-185: Good test-group compatibility alignment. These updates correctly scope MySQL 8.0+-dependent tests away from legacy groups while keeping the intended variant groups. Also applies to: 220-220.

test/infra/control/env-isolated.bash (1)

- 62-77: Primary-first cluster ordering is correctly implemented. This change makes TAP_CLUSTER_NODES deterministic and consistent with cluster tests that expect the primary endpoint first.

test/tap/tests/test_throttle_max_bytes_per_second_to_client-t.cpp (1)

- 48-50: Helpful failure-path diagnostics added. Dumping hostgroup state here is a good CI-debug improvement and keeps behavior isolated to the error path.

test/tap/tests/test_auth_methods-t.cpp (1)

- 1144-1145: Early-return thread cleanup is correctly fixed. Calling stop_internal_noise_threads() before returning on the skip path is the right fix to avoid thread-destructor crashes.

test/tap/tests/test_unsupported_queries-t.cpp (1)

- 539-557: The diagnostic preflight block is a good addition. It improves observability for CI failures without altering the existing unsupported-query test flow.

test/tap/tests/test_warnings-t.cpp (1)

- 554-561: Nice observability improvement for warning-path failures. The hostgroup dump hook is scoped well and should help diagnose intermittent CI issues.

test/infra/control/start-proxysql-isolated.bash (2)

- 57-60: GCOV environment preservation is correctly handled. Saving before unsetting, and restoring for the primary process, is the right approach for reliable coverage collection. Also applies to: 103-106.
- 182-188: The primary admin-variable alignment looks correct. Setting admin-restapi_enabled and admin-debug on the primary to match the nodes should help cluster sync converge consistently.

test/tap/tap/utils.h (1)

- 159-165: The utility API addition is clean and well-documented. The declaration and comments are clear and fit the new diagnostics usage across TAP tests.
```cpp
MYSQL_RES* res = mysql_store_result(admin);
if (res) {
	MYSQL_ROW row;
	diag("runtime_mysql_servers (hg=%d):", hostgroup);
	while ((row = mysql_fetch_row(res))) {
		diag("  hg=%s host=%s port=%s status=%s max_conn=%s comment=%s",
			row[0], row[1], row[2], row[3], row[4], row[5] ? row[5] : "");
	}
	if (mysql_num_rows(res) == 0) {
		diag("  (no servers in hostgroup %d)", hostgroup);
	}
	mysql_free_result(res);
}
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify the two mysql_store_result() branches in dump_hostgroup_debug
# and whether explicit NULL-result diagnostics are present.
rg -n -C4 'dump_hostgroup_debug|mysql_store_result|Failed to store result' test/tap/tap/utils.cpp
```

Length of output: 6720
Add explicit diagnostics when mysql_store_result() fails.
The dump_hostgroup_debug function silently ignores NULL returns from mysql_store_result(), making it difficult to distinguish between "query succeeded with empty result" and "failed to store result." Add diagnostics for both NULL cases using mysql_error(admin).
Suggested patch:

```diff
 if (mysql_query(admin, query) == 0) {
 	MYSQL_RES* res = mysql_store_result(admin);
 	if (res) {
 		MYSQL_ROW row;
 		diag("runtime_mysql_servers (hg=%d):", hostgroup);
 		while ((row = mysql_fetch_row(res))) {
 			diag("  hg=%s host=%s port=%s status=%s max_conn=%s comment=%s",
 				row[0], row[1], row[2], row[3], row[4], row[5] ? row[5] : "");
 		}
 		if (mysql_num_rows(res) == 0) {
 			diag("  (no servers in hostgroup %d)", hostgroup);
 		}
 		mysql_free_result(res);
+	} else {
+		diag("  Failed to store result for runtime_mysql_servers: %s", mysql_error(admin));
 	}
 }

 if (mysql_query(admin, query) == 0) {
 	MYSQL_RES* res = mysql_store_result(admin);
 	if (res) {
 		MYSQL_ROW row;
 		diag("stats_mysql_connection_pool (hg=%d):", hostgroup);
 		while ((row = mysql_fetch_row(res))) {
 			diag("  hg=%s host=%s port=%s status=%s ConnUsed=%s ConnFree=%s ConnOK=%s ConnERR=%s Queries=%s",
 				row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[9]);
 		}
 		if (mysql_num_rows(res) == 0) {
 			diag("  (no pool entries for hostgroup %d)", hostgroup);
 		}
 		mysql_free_result(res);
+	} else {
+		diag("  Failed to store result for stats_mysql_connection_pool: %s", mysql_error(admin));
 	}
 }
```
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/tap/tap/utils.cpp` around lines 467 - 479, In dump_hostgroup_debug, add
diagnostics for when mysql_store_result(admin) returns NULL: after the
MYSQL_RES* res = mysql_store_result(admin) call, detect if res == NULL and call
diag to log a clear message including mysql_error(admin) (and optionally
mysql_errno(admin)) so callers can distinguish a failed result retrieval from an
empty result set; keep the existing branch that handles non-NULL results
(including the mysql_num_rows(res) == 0 check) unchanged.
```cpp
{"timezone", {"timezone", {"UTC", "PST8PDT"}}},
{"bytea_output", {"bytea_output", {"hex", "escape"}}},
{"allow_in_place_tablespaces", {"allow_in_place_tablespaces", {"on", "off"}}},
{"enable_mergejoin", {"enable_mergejoin", {"on", "off"}}},
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify enable_mergejoin is NOT in tracked variables
echo "=== Checking pgsql_tracked_variables array ==="
rg -n "enable_mergejoin" include/proxysql_structs.h
echo ""
echo "=== Checking pgsql_other_variables set ==="
rg -n "enable_mergejoin" lib/PgSQL_Session.cpp
echo ""
echo "=== Checking pgsql_variable_name enum ==="
rg -n "ENABLE_MERGEJOIN" include/proxysql_structs.h
echo ""
echo "=== For comparison, showing tracked enable_* variables ==="
rg -n "enable_" include/proxysql_structs.h | head -20
```

Length of output: 1515
enable_mergejoin is not in ProxySQL's tracked variables, making this test ineffective.
The variable replacement removes tracking coverage. enable_mergejoin is absent from pgsql_tracked_variables[], pgsql_other_variables, and the pgsql_variable_name enum, while other enable_* variables (enable_bitmapscan, enable_hashjoin, enable_indexscan, enable_nestloop, enable_seqscan, enable_sort) are tracked. ProxySQL will not intercept SET statements for this variable, causing the test to rely on PostgreSQL's native behavior rather than validate ProxySQL's session variable tracking mechanism.
Either add enable_mergejoin to the tracking infrastructure (enum in proxysql_structs.h, entry in pgsql_tracked_variables[], and pgsql_other_variables set) or remove this line since the other enable_* variables already provide test coverage.
Code Review
This pull request introduces a new dump_hostgroup_debug utility function to aid in debugging CI failures by providing diagnostic information about hostgroups and connection pools. This function is integrated into several existing tests. The PR also refines the cluster node configuration, ensures proper handling of GCOV environment variables for the primary process, and adjusts test assertions for query cache soft TTL to include a tolerance for race conditions. A review comment highlights that the dump_hostgroup_debug function's output for stats_mysql_connection_pool is missing the MaxConnUsed field, which should be included for complete debugging information.
    diag(" hg=%s host=%s port=%s status=%s ConnUsed=%s ConnFree=%s ConnOK=%s ConnERR=%s Queries=%s",
        row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[9]);
The diagnostic output for stats_mysql_connection_pool is missing the MaxConnUsed field, even though it's selected in the query on line 486. This can lead to incomplete debugging information. Please update the diag call to include this field.
diag(" hg=%s host=%s port=%s status=%s ConnUsed=%s ConnFree=%s ConnOK=%s ConnERR=%s MaxConnUsed=%s Queries=%s",
    row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7], row[8], row[9]);
🧹 Nitpick comments (1)
test/infra/control/run-tests-isolated.bash (1)
272-282: LGTM! Coverage collection now correctly captures TAP test data.

The extension to copy both .gcno and .gcda files is correct — fastcov requires both to generate coverage reports. The comments clearly explain the dual-source pattern: ProxySQL writes .gcda via GCOV_PREFIX, while TAP tests write to the source tree.

Minor optional improvement: using read -r would prevent backslash interpretation in filenames, though unlikely to matter for auto-generated coverage paths.
🔧 Optional: more robust read loop
- cd \"\${WORKSPACE}\" && find . -path './ci_infra_logs' -prune -o \\( -name '*.gcno' -o -name '*.gcda' \\) -type f -print | while read gcfile; do
+ cd \"\${WORKSPACE}\" && find . -path './ci_infra_logs' -prune -o \\( -name '*.gcno' -o -name '*.gcda' \\) -type f -print | while IFS= read -r gcfile; do
The default value (1) syncs both configured AND runtime mysql_servers,
including changes made by the monitor (read_only checks moving servers
between writer/reader hostgroups). This caused circular sync: a
cluster node's monitor modified servers, creating a newer epoch, and
the primary synced this back — overwriting its correct config and
saving the corrupted version to disk.
With auto_select (3), the algorithm chooses mysql_servers_v2 when
the monitor is enabled, which only syncs user-configured changes
and ignores monitor-triggered runtime modifications.
Evidence: node 6042's monitor removed MariaDB servers from hostgroup
1701 ("Setting status OFFLINE HARD"), creating checksum 0x7C0A67D5
with epoch 1774288555. Primary (epoch 1774288549) synced from node
6042 and saved to disk, losing hostgroup 1300 permanently.
Fixes #5540.
The test spawns its own ProxySQL replica (port 16062) inside the test-runner container, but connected to it via cl.host (TAP_ADMINHOST) which resolves to the main ProxySQL container in the Docker setup. Since port 16062 doesn't exist on the main container, the connection hung indefinitely, causing the test to stall. Now uses R_HOST="127.0.0.1" for all replica connections (R_PORT), since the replica runs locally inside the same container as the test.
…ca config The replica used the default algorithm (1) which syncs runtime mysql_servers — including monitor-shunned status. The test checks the config table (mysql_servers) for status='ONLINE', but the runtime sync overwrites the config table with SHUNNED status. With auto_select (3) and monitor enabled (-M flag), the replica uses mysql_servers_v2 sync which preserves the configured status.
The debug filter for Query_Processor.cpp/process_query suppressed the "Searching thread-local 'rules_fast_routing' hashmap" messages that test_query_rules_fast_routing_algorithm-t relies on to verify fast routing behavior (expects 20 log entries, gets 0).
When a cluster node syncs proxysql_servers from the primary and runs LOAD PROXYSQL SERVERS TO RUNTIME, it rebuilds the node list from the synced data. If the primary isn't in its own proxysql_servers table, the synced list won't include it. The node then destroys the cluster monitor thread for the primary and never recreates it. Evidence: node 1 log shows "Destroyed Cluster Node Entry for host proxysql:6032" followed by "closing thread for peer proxysql:6032" after syncing proxysql_servers from the primary. After that, the node never detected any further checksum changes from the primary, causing test_cluster1-t and test_unshun_algorithm-t to fail. Fixes #5542.
The coverage collection copied ALL .gcno and .gcda files from the workspace to /gcov using cp -f. But ProxySQL's .gcda files (with real execution data from the Docker container) were already in /gcov via GCOV_PREFIX. The cp -f overwrote them with the host's stale .gcda files (zero or outdated execution counts). This caused lib/MySQLFFTO.cpp and lib/PgSQLFFTO.cpp (and likely other ProxySQL library files) to show 0% coverage despite the FFTO tests running successfully and the .gcda files having real data. Fix: copy .gcno files with cp -f (always needed for fastcov), but copy .gcda files with cp -n (no-clobber) so ProxySQL's existing execution data in /gcov is preserved. TAP test .gcda files that don't exist in /gcov yet are still copied.
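The cp semantics the fix relies on can be sketched with plain coreutils. This is a minimal, self-contained demo — the file names and layout are illustrative, not the real CI tree:

```shell
#!/bin/sh
# Demo of -f vs -n copy semantics (illustrative paths, not the real CI layout).
set -eu
demo=$(mktemp -d)
mkdir -p "$demo/gcov" "$demo/workspace"
# /gcov already holds real execution counters written via GCOV_PREFIX
printf 'real-counters' > "$demo/gcov/MySQLFFTO.gcda"
# the workspace copy is stale (zero or outdated counts)
printf 'stale' > "$demo/workspace/MySQLFFTO.gcda"
# cp -n (no-clobber) preserves the real data where cp -f would overwrite it;
# newer coreutils exit non-zero when -n skips a file, hence the `|| true`
cp -n "$demo/workspace/MySQLFFTO.gcda" "$demo/gcov/" || true
cat "$demo/gcov/MySQLFFTO.gcda"
```

After the copy, the .gcda under the gcov directory still contains the original data; a workspace-only file with no counterpart would have been copied normally.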
Coverage data comes from two sources that may overlap:
- /gcov — ProxySQL daemon execution (via GCOV_PREFIX)
- WORKSPACE — TAP test execution (.gcda next to binaries)

Both may cover the same code (e.g. libproxysql.a exercised by both the ProxySQL runtime and the TAP test harness). Previously we copied .gcda files between directories, which either overwrote (cp -f) or kept only the first version (cp -n) instead of combining counters. Now generates two separate .info files (proxysql.info + tap_tests.info) and combines them with fastcov -C, which correctly sums execution counters for overlapping source files.
- Remove redundant INSERT of primary into proxysql_servers on nodes. Commit b02147b added the primary to the shared PROXYSQL_SERVERS_SQL but didn't remove the pre-existing per-node INSERT, causing a UNIQUE constraint violation that broke all multi-node CI groups. - Add cluster_mysql_servers_sync_algorithm=3 to test_cluster_sync config files and update hardcoded datadir paths to match current workspace.
TAP test binaries link libcpp_dotenv.so which transitively needs libparser.so. The RUNPATH baked into libcpp_dotenv.so contains stale paths from the original build, so the dynamic linker can't find it. - Add LD_LIBRARY_PATH for the SKIP_PROXYSQL (unit test) host path - Add TAP_DEPS_PATH to env-isolated.bash so proxysql-tester.py constructs the correct LD_LIBRARY_PATH inside Docker containers
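The workaround amounts to prepending the deps directory to the linker search path. A minimal sketch — the directory value below is a placeholder assumption, not the actual CI path:

```shell
#!/bin/sh
# Sketch: composing LD_LIBRARY_PATH from TAP_DEPS_PATH so the dynamic linker
# finds libparser.so despite libcpp_dotenv.so's stale RUNPATH.
# /opt/proxysql/deps/lib is a placeholder, not the real layout.
unset LD_LIBRARY_PATH
TAP_DEPS_PATH=/opt/proxysql/deps/lib
# prepend; append the old value only if one was set
LD_LIBRARY_PATH="${TAP_DEPS_PATH}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```

LD_LIBRARY_PATH is searched before the RUNPATH baked into a dependent shared object's DT_NEEDED resolution only in the RPATH (legacy) case; for RUNPATH, LD_LIBRARY_PATH takes precedence, which is why exporting it masks the stale entries.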
Three problems caused machine stalls and 0% coverage in multi-group runs:
1. All test-runner containers wrote TAP test .gcda files to the shared
${WORKSPACE}, causing concurrent corruption and memory bloat when
multiple gcov processes fought over the same files.
2. Each container independently ran fastcov -j$(nproc) in its EXIT trap,
spawning hundreds of concurrent gcov processes across all containers.
3. GCOV_PREFIX_STRIP=3 placed .gcda at proxysql/lib/obj/ but .gcno was
copied to lib/obj/ — path mismatch meant gcov couldn't correlate them.
Additionally, the --include filter expected full workspace paths but
gcov records bare filenames, so all results were filtered out.
Changes:
- Set GCOV_PREFIX=/gcov/tap on test-runner containers so TAP .gcda files
go to per-INFRA_ID /gcov/tap/ instead of shared workspace.
- Pass MULTI_GROUP=1 from run-multi-group.bash to test-runners. When set,
the EXIT trap copies .gcno adjacent to .gcda (handling path offset from
GCOV_PREFIX_STRIP) but skips fastcov/genhtml.
- Add centralized coverage collection in run-multi-group.bash: processes
each group's /gcov directory sequentially with fastcov -j4 after all
groups finish, then combines the .info files.
- Remove --include filter from fastcov (source paths are bare filenames,
not full paths, so the filter rejected everything).
- Standalone mode (no MULTI_GROUP) is unchanged.
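The "copy .gcno adjacent to .gcda, handling the path offset" step is pure path manipulation. A sketch with illustrative paths (the real script walks the whole /gcov tree rather than a single hard-coded file):

```shell
#!/bin/sh
# Sketch of the GCOV_PREFIX_STRIP path-offset handling (illustrative paths).
set -eu
root=$(mktemp -d)
# GCOV_PREFIX_STRIP=3 left an extra leading "proxysql/" component under /gcov/tap
mkdir -p "$root/gcov/tap/proxysql/lib/obj" "$root/src/lib/obj"
: > "$root/gcov/tap/proxysql/lib/obj/MySQL_Session.gcda"
: > "$root/src/lib/obj/MySQL_Session.gcno"
gcda="$root/gcov/tap/proxysql/lib/obj/MySQL_Session.gcda"
# strip the prefix (including the extra component) to recover the source-tree path
rel="${gcda#"$root/gcov/tap/proxysql/"}"   # -> lib/obj/MySQL_Session.gcda
gcno="$root/src/${rel%.gcda}.gcno"         # -> .../src/lib/obj/MySQL_Session.gcno
cp -f "$gcno" "$(dirname "$gcda")/"        # place .gcno next to its .gcda
ls "$(dirname "$gcda")" | sort
```

With the .gcno sitting next to the .gcda, gcov can correlate the two files even though the .gcda landed under the GCOV_PREFIX-offset directory.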
fastcov was including system headers (/usr/include/c++/...) and vendored dependencies (deps/json/json.hpp) in coverage output. Use --exclude instead of --include since project source files have bare filenames in gcov output (no path prefix to match against), while system/deps paths are always absolute.
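Why --include matched nothing while --exclude works follows from the path shapes alone. The sketch below mimics the filtering decision with a shell case statement; it is an illustration of the logic, not fastcov's actual implementation:

```shell
#!/bin/sh
# gcov records project sources as bare filenames, but system headers and
# vendored deps carry path prefixes — so only exclusion by prefix separates
# them; an --include pattern expecting full workspace paths matches nothing.
for sf in 'MySQL_Session.cpp' '/usr/include/c++/11/bits/stl_vector.h' 'deps/json/json.hpp'; do
  case "$sf" in
    /usr/*|deps/*) echo "exclude: $sf" ;;
    *)             echo "keep:    $sf" ;;
  esac
done
```

Only the bare-filename project source survives the filter; both the absolute /usr path and the deps/ path are dropped.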
- mysql84 pre-proxysql.bash: use docker exec for all ProxySQL admin commands (was using mysql -h127.0.0.1, which can't reach the container)
- basictests setup-infras.bash: use INSERT OR REPLACE when remapping hostgroups 1300/1301 -> 0/1 to avoid a UNIQUE constraint violation when target hostgroups already have entries
- proxysql-tester.py: skip per-stage coverage generation when MULTI_GROUP=1 (was spawning 32 gcov processes on 35k .gcno files)
- Move test_cluster_sync-t and test_read_only_actions_offline_hard_servers-t to legacy-g5 pending investigation of cluster sync algorithm issues. Move reg_test_3765_ssl_pollout-t from legacy-g5 to legacy-g4.
Summary
Fixes multiple TAP test failures and CI infrastructure issues discovered during CI migration testing.
Test fixes
- test_tls_stats-t: plan(29) → plan(27) — the test executes 27 ok() calls on the happy path; the plan was miscounted
- pgsql-transaction_variable_state_tracking-t: replace superuser-only allow_in_place_tablespaces with user-settable enable_mergejoin (fixes #5536: permission denied on superuser-only variables)
- test_auth_methods-t: call stop_internal_noise_threads() before the early return on MySQL 5.7, preventing a crash from joinable std::thread destructors (fixes #5537: 'terminate called without an active exception' on MySQL 5.7)
- Use 127.0.0.1 for the locally-spawned replica instead of cl.host (Docker hostname of the main container). Set cluster_mysql_servers_sync_algorithm=3 on the replica so it uses v2 sync (config-only, ignores monitor-shunned status)

Test group reassignments
- reg_test_4935-caching_sha2-t: removed from legacy-g4 — caching_sha2_password requires MySQL 8.0+
- Removed from legacy-g2 — the test requires MySQL 8.0+

Cluster sync fixes
- test_cluster1-t expects conns[0] to be the primary (fixes #5535: admin_variables fails to sync on satellite nodes in a single-container cluster)
- Set admin-restapi_enabled and admin-debug on the primary to match node configuration
- The primary was added to the shared PROXYSQL_SERVERS_SQL without removing the pre-existing per-node INSERT, causing a UNIQUE constraint violation on all multi-node CI groups; the redundant INSERT is now removed

Coverage infrastructure overhaul
Fixed machine stalls (load >1000, 5GB+ resident per gcov process) and 0% coverage reports in multi-group CI runs. Root causes:
1. Shared workspace corruption: all test-runner containers wrote TAP test .gcda files to the same ${WORKSPACE} paths. Concurrent gcov processes on the same .gcda files caused memory bloat and data corruption. Fix: set GCOV_PREFIX=/gcov/tap on test-runner containers so TAP .gcda files go to per-INFRA_ID /gcov/tap/ directories.
fastcov -j$(nproc)in its EXIT trap. With 5+ containers finishing simultaneously, this spawned 300+ gcov processes.MULTI_GROUP=1fromrun-multi-group.bashto test-runners. When set, the EXIT trap copies.gcnoadjacent to.gcda(prep work) but skips fastcov/genhtml. Coverage collection runs once centrally after all groups finish, sequentially per group with-j4.Path mismatch → 0% coverage:
GCOV_PREFIX_STRIP=3placed.gcdaatproxysql/lib/obj/but.gcnowas copied tolib/obj/— gcov couldn't correlate them. Additionally, the--includefilter expected full workspace paths but gcov records bare filenames..gcnocopy now strips leading path components to find the match. Replaced--includewith--exclude /usr depssince project source paths are bare filenames.libparser.so loading fix
- Added LD_LIBRARY_PATH for unit tests (SKIP_PROXYSQL path) and TAP_DEPS_PATH for Docker-based tests — libcpp_dotenv.so transitively needs libparser.so but has a stale RUNPATH from the original build

Diagnostics
- New dump_hostgroup_debug() utility function — dumps runtime_mysql_servers and stats_mysql_connection_pool for a given hostgroup. Added to test_throttle, test_warnings, and test_unsupported_queries.

Fixes
Test plan
- run-multi-group.bash with all groups — no machine stalls during coverage collection
- run-tests-isolated.bash still generates coverage