
Run ProxySQL cluster inside single container + CI stability fixes (#5528)

Merged
renecannao merged 15 commits into v3.0-ci260322 from v3.0-ci260322_cluster
Mar 23, 2026
Conversation

Contributor

@renecannao renecannao commented Mar 23, 2026

Summary

Reduces Docker container count and improves CI test stability by running all ProxySQL cluster nodes as background processes inside a single container, plus several related fixes.

Single-container cluster (main change)

Instead of spawning 1+9 Docker containers per group for the ProxySQL cluster, all instances now run as background processes inside the primary container. Each node gets its own port pair:

  • Primary: admin=6032, mysql=6033
  • proxy-node1: admin=6042, mysql=6043
  • proxy-node2: admin=6052, mysql=6053
  • ...through proxy-node9

This eliminates 9 Docker containers per group, reducing Docker networking/iptables contention that caused "no route to host" (errno 113) errors under parallel test execution.
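As a sanity check, the port scheme described above can be reproduced with a small loop. This is an illustrative sketch, not the PR's actual script; the node count of 9 and the proxy-node naming come from the PR description.

```shell
#!/bin/bash
# Per-node port pairs: node i gets admin=6032+10*i, mysql=6033+10*i,
# where i=0 is the primary and i=1..9 are the background cluster nodes.
NUM_NODES=9
for i in $(seq 0 "${NUM_NODES}"); do
    ADMIN_PORT=$((6032 + i * 10))
    MYSQL_PORT=$((6033 + i * 10))
    if [ "${i}" -eq 0 ]; then
        NAME="primary"
    else
        NAME="proxy-node${i}"
    fi
    echo "${NAME}: admin=${ADMIN_PORT} mysql=${MYSQL_PORT}"
done
```

With nine nodes the ports span 6032..6123, which stays clear of other well-known service ports used in the test infra.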

Other changes

  • Coverage log redirection: fastcov/genhtml output now goes to coverage-generation.log instead of cluttering stdout (both run-tests-isolated.bash and run-multi-group.bash)
  • Coverage source filter: Only include proxysql source dirs (include/, lib/, src/, test/) instead of system headers and vendored deps
  • ClickHouse group split: Moved 3 ClickHouse TAP tests to a new legacy-clickhouse-g1 group and removed infra-clickhouse23 from the legacy infras (fewer containers per legacy group)
  • Auto-cleanup: AUTO_CLEANUP defaults to 1 — successful groups clean up containers automatically
  • No-op hooks: legacy/pre-proxysql.bash and default/pre-proxysql.bash are now no-ops (cluster handled by start-proxysql-isolated.bash)
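The coverage-log and auto-cleanup items above boil down to two common shell patterns. A minimal sketch, with the coverage-generation.log name taken from the PR and the redirected commands simulated so the snippet is runnable:

```shell
#!/bin/bash
# Auto-cleanup defaults to on; callers can still override with AUTO_CLEANUP=0.
AUTO_CLEANUP="${AUTO_CLEANUP:-1}"

# Verbose coverage tooling output goes to a dedicated log instead of stdout.
coverage_dir="$(mktemp -d)"
coverage_log="${coverage_dir}/coverage-generation.log"
echo ">>> Generating coverage report (details in ${coverage_log})"
{
    echo "simulated fastcov output"   # stand-in for the real fastcov run
    echo "simulated genhtml output"   # stand-in for the real genhtml run
} >> "${coverage_log}" 2>&1
```

Status messages stay on stdout while tool chatter accumulates in the log, which is what keeps the group test logs readable.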

Test plan

  • Run run-multi-group.bash with legacy groups and verify cluster starts inside single container
  • Verify coverage reports generate correctly with source-only filtering
  • Verify clickhouse tests work in new legacy-clickhouse-g1 group

Summary by CodeRabbit

  • New Features

    • Optional multi-node ProxySQL cluster startup with configurable node count and a new legacy-clickhouse test group; ability to skip cluster startup.
  • Improvements

    • Dedicated coverage-generation log with clearer failure hints.
    • Stronger per-node readiness checks, explicit timeouts, concurrent node polling, and a final "ProxySQL is UP." status.
    • Default auto-cleanup enabled for test groups; some startup hooks become no-ops when skipping cluster start.
  • Tests

    • Expanded diagnostic and connection logging in traffic and regression tests.

renecannao and others added 8 commits March 23, 2026 08:26
Instead of spawning 1+9 Docker containers for the ProxySQL cluster,
run all instances as background processes inside the primary container.
Each node gets its own port pair (admin=6032+i*10, mysql=6033+i*10)
and data directory.

This eliminates 9 Docker containers per group, avoiding Docker
networking/iptables contention that caused "no route to host" errors
under parallel test execution.

Changes:
- start-proxysql-isolated.bash: starts cluster nodes as background
  processes inside the container, then initializes the cluster and
  installs the scheduler — all in one script
- check_all_nodes.bash: polls 127.0.0.1 on different ports instead
  of container hostnames
- env-isolated.bash: TAP_CLUSTER_NODES uses proxysql:PORT format
- default/pre-proxysql.bash, legacy/pre-proxysql.bash: no longer
  call cluster_start.bash/cluster_init.bash (handled by start-proxysql)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fastcov and genhtml produce verbose output that clutters the group
test logs. Now redirected to coverage-generation.log inside the
coverage report directory. Status messages still go to stdout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace exclude-based filtering (-e /usr/include, deps/) with
include-based filtering (-i) for include/, lib/, src/, test/.
This avoids capturing coverage for system headers, vendored
dependencies, and other non-proxysql code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fastcov, lcov, and genhtml produce verbose output that clutters the
multi-group summary. Now redirected to coverage-generation.log inside
the combined coverage report directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move 3 clickhouse TAP tests (clickhouse_php_conn-t,
test_clickhouse_server-t, test_clickhouse_server_libmysql-t) from
legacy-g1/g3 to legacy-clickhouse-g1. Remove infra-clickhouse23
from legacy/infras.lst.

This eliminates the clickhouse container from all legacy-g1..g4
runs, reducing container count per group.
Copilot AI review requested due to automatic review settings March 23, 2026 11:26

coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Moves ProxySQL cluster startup into container entrypoint, makes per-node admin ports dynamic (proxysql:PORT where PORT=6032+10*i), adds multi-node readiness, cluster init and scheduler injection, centralizes coverage-generation logging, adds legacy-clickhouse group, and simplifies/neutralizes pre-proxysql hooks.

Changes

  • ProxySQL container & cluster orchestration (test/infra/control/start-proxysql-isolated.bash, test/infra/control/check_all_nodes.bash, test/infra/control/env-isolated.bash): Entrypoint now launches the primary proxysql in the foreground and NUM_NODES background proxy-node instances (default 9) with per-node data dirs and ports; builds TAP_CLUSTER_NODES as proxysql:PORT (PORT=6032+10*i); adds per-node readiness polling, cluster init SQL (servers/vars/scheduler), and installs check_all_nodes.bash. check_all_nodes.bash iterates explicit PORTS, builds ALL_TABLES including runtime_ variants, and runs SELECT COUNT(*) per table against each port; set -e/pipefail was removed from one script.
  • Coverage logging & auto-cleanup (test/infra/control/run-multi-group.bash, test/infra/control/run-tests-isolated.bash): Introduce coverage-generation.log under the coverage dir; redirect/append fastcov/lcov/genhtml output into that log and update warnings to reference it. Change the default AUTO_CLEANUP from 0 to 1.
  • Test hooks / pre-proxysql simplification (test/tap/groups/default/pre-proxysql.bash, test/tap/groups/legacy/pre-proxysql.bash): Removed explicit cluster_start/cluster_init calls; the default hook now skips startup when SKIP_CLUSTER_START is set or NUM_NODES=0 and otherwise waits briefly; the legacy hook is simplified to an immediate no-op.
  • Legacy-ClickHouse test group addition (test/tap/groups/legacy-clickhouse/env.sh, test/tap/groups/legacy-clickhouse/infras.lst, test/tap/groups/legacy-clickhouse/pre-proxysql.bash, test/tap/groups/legacy/infras.lst): Add the legacy-clickhouse group: export REGULAR_INFRA_DATADIR=/var/lib/proxysql, add infra-clickhouse23 to the group infras, add a no-op pre-proxysql hook; remove infra-clickhouse23 from the legacy infras list.
  • Debug filters update (test/infra/infra-mysql57/conf/proxysql/infra-config.sql): Rename one debug handler and add multiple new debug_filters entries for mysql_connection, MySQL_Monitor, PgSQL_Monitor, and PgSQL_Data_Stream.
  • Noise/logging improvements (test/tap/tap/noise_utils.cpp): Enhance connection/setup logging for MySQL/Postgres traffic helpers to include host/port/user/(dbname) and detailed error messages; no control-flow changes.
  • Test output wording (test/tap/tests/reg_test__ssl_client_busy_wait-t.cpp): Expanded diagnostic header and numbered-step diag() messages; no logic or assertions changed.

Sequence Diagram(s)

sequenceDiagram
    participant Entrypoint as Container Entrypoint
    participant Primary as Primary ProxySQL\nproxysql:6032
    participant Nodes as ProxySQL Nodes\nproxysql:6032+10*i
    participant Admin as Admin Client

    Entrypoint->>Primary: start primary (foreground, exec)
    Entrypoint->>Nodes: start NUM_NODES background proxy-node1..N (distinct dirs & ports)
    Entrypoint->>Primary: poll admin port 6032 until ready
    Entrypoint->>Nodes: poll each node admin port (6032 + 10*i) until ready
    Admin->>Primary: configure proxysql_servers (core nodes), cluster vars, load/save
    Admin->>Primary: add scheduler entries and install check script
    loop per node
        Admin->>Nodes: enable REST API, insert primary into proxysql_servers, add scheduler entries
    end
    Admin->>Primary: run per-table SELECT COUNT(*) checks via check_all_nodes against each port
    Entrypoint->>Admin: log ">>> ProxySQL is UP."

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped through ports and nudged each node,
Spawned tiny proxysql homes on the road,
Logs tucked in a file to tell what they do,
Legacy branches rest while tests run anew,
The rabbit twitches—ready for the go!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'Run ProxySQL cluster inside single container + CI stability fixes' accurately summarizes the main changes: consolidating cluster nodes into a single container and addressing CI stability issues through coverage logging and test group reorganization.


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on optimizing the ProxySQL cluster setup within the CI environment. By reducing the number of Docker containers and improving coverage reporting, it aims to enhance the stability and efficiency of the testing process. Several configuration adjustments and script modifications support these changes.

Highlights

  • Single-Container Cluster: The PR consolidates the ProxySQL cluster nodes into a single Docker container, reducing resource consumption and network contention.
  • Coverage Improvements: Coverage log redirection and source filtering enhance the clarity and accuracy of coverage reports.
  • Test Stability: ClickHouse tests are isolated into a new group, and auto-cleanup is enabled by default to improve CI stability.



Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the TAP CI infrastructure to reduce Docker container count and improve parallel-run stability by running the ProxySQL cluster nodes as background processes inside a single ProxySQL container, alongside several coverage/logging and group-splitting adjustments.

Changes:

  • Start ProxySQL “cluster nodes” as background processes inside the primary ProxySQL container, with per-node port allocation and updated cluster env wiring.
  • Split ClickHouse-related TAP tests into a dedicated legacy-clickhouse-g1 group and adjust legacy infra usage accordingly.
  • Improve coverage generation signal-to-noise by redirecting tool output to a log file and narrowing fastcov source inclusion.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 5 comments.

Summary per file:

  • test/infra/control/start-proxysql-isolated.bash — Main change: start multiple ProxySQL instances inside one container; adds wait/init logic for cluster nodes.
  • test/infra/control/env-isolated.bash — Updates TAP cluster node endpoints to point at proxysql:<per-node-admin-port> instead of per-container hostnames.
  • test/infra/control/check_all_nodes.bash — Updates the scheduler keepalive script for the single-container, localhost-per-port cluster layout.
  • test/tap/groups/default/pre-proxysql.bash — Stops doing cluster startup; now conditionally waits for cluster stabilization.
  • test/tap/groups/legacy/pre-proxysql.bash — Converts the legacy group pre-hook to a no-op.
  • test/tap/groups/legacy/infras.lst — Removes the ClickHouse infra from the legacy group infra list.
  • test/tap/groups/legacy-clickhouse/pre-proxysql.bash — Adds a new no-op group hook for the ClickHouse legacy group.
  • test/tap/groups/legacy-clickhouse/infras.lst — Adds the infra list for the new ClickHouse legacy group.
  • test/tap/groups/legacy-clickhouse/env.sh — Adds env overrides for the new ClickHouse legacy group.
  • test/tap/groups/groups.json — Re-formats the group mapping and moves ClickHouse tests onto legacy-clickhouse-g1.
  • test/infra/control/run-tests-isolated.bash — Redirects coverage tool output to coverage-generation.log and restricts included source roots.
  • test/infra/control/run-multi-group.bash — Defaults AUTO_CLEANUP=1 and redirects combined coverage tool output to coverage-generation.log.
  • test/infra/infra-mysql57/conf/proxysql/infra-config.sql — Updates/adds debug filters used in infra configuration.


Comment on lines +171 to +179
for i in $(seq 1 "${NUM_NODES}"); do
ADMIN_PORT=$((6032 + i * 10))
RESTAPI_PORT=$((6070 + i))
echo ">>> Configuring proxy-node${i} (port ${ADMIN_PORT})"

${MYSQL_CMD} -P${ADMIN_PORT} <<SQL
UPDATE global_variables SET variable_value='false' WHERE variable_name='admin-hash_passwords';
SET admin-restapi_port=${RESTAPI_PORT};
SET admin-restapi_enabled='true';

Copilot AI Mar 23, 2026


In the single-container cluster, RESTAPI_PORT=$((6070 + i)) can collide with other nodes’ admin/mysql ports (e.g. node2 restapi=6072 conflicts with proxy-node4 admin=6072, node3 restapi=6073 conflicts with proxy-node4 mysql=6073). Please change the per-node REST API port assignment to a non-overlapping, deterministic range (or derive it from ADMIN_PORT with a large offset) so each node can enable RESTAPI without bind failures.

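The reviewer's suggestion of deriving the REST API port from ADMIN_PORT with a large offset could look like the sketch below. The +5000 offset is illustrative, not from the PR; any offset that clears the 6032..6123 admin/mysql range works.

```shell
#!/bin/bash
# Admin/mysql ports occupy 6032..6123; a +5000 offset yields REST API
# ports 11042..11122, deterministic and clear of that range.
NUM_NODES=9
for i in $(seq 1 "${NUM_NODES}"); do
    ADMIN_PORT=$((6032 + i * 10))
    RESTAPI_PORT=$((ADMIN_PORT + 5000))
    echo "proxy-node${i}: admin=${ADMIN_PORT} restapi=${RESTAPI_PORT}"
done
```

Deriving the REST port from ADMIN_PORT keeps the scheme single-sourced: changing the admin port formula automatically moves the REST ports with it.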
interfaces=\"0.0.0.0:\${MYSQL_PORT}\"
}
NODECNF


Copilot AI Mar 23, 2026


All ProxySQL processes in this container inherit GCOV_PREFIX=/gcov, so the background cluster nodes will write .gcda files into the same paths as the primary process. That can corrupt coverage output due to concurrent writes and make reports flaky. Consider unsetting GCOV env vars for the background nodes, or set a unique GCOV_PREFIX per node (e.g., /gcov/node${i}) and merge the results during coverage collection.

Suggested change
# Avoid concurrent coverage writes from background nodes: disable gcov env vars here.
unset GCOV_PREFIX GCOV_PREFIX_STRIP

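The per-node GCOV_PREFIX alternative mentioned in the comment above could be sketched as follows; the directory layout and node count here are illustrative.

```shell
#!/bin/bash
# Give each background node its own gcov output root so concurrent
# .gcda writes from different processes cannot clobber each other.
GCOV_ROOT="$(mktemp -d)"
NUM_NODES=3
for i in $(seq 1 "${NUM_NODES}"); do
    NODE_GCOV_PREFIX="${GCOV_ROOT}/node${i}"
    mkdir -p "${NODE_GCOV_PREFIX}"
    # The real script would launch each node with this environment, e.g.:
    #   GCOV_PREFIX="${NODE_GCOV_PREFIX}" proxysql ... &
    echo "node${i} coverage root: ${NODE_GCOV_PREFIX}"
done
# Coverage collection would later merge ${GCOV_ROOT}/node*/ before fastcov.
```

The trade-off versus unsetting the variables is extra merge work at collection time in exchange for keeping coverage data from the background nodes.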
Comment on lines +150 to +152
# Build proxysql_servers entries: primary + first 3 nodes as core
PROXYSQL_SERVERS_SQL="DELETE FROM proxysql_servers;"
for i in $(seq 1 3); do

Copilot AI Mar 23, 2026


PROXYSQL_SERVERS_SQL is always built for core nodes 1..3, even when NUM_NODES is configured smaller. With PROXYSQL_CLUSTER_NODES=1/2, this will insert non-existent peers into proxysql_servers and can break/slow cluster sync. Please cap the core-node loop to min(NUM_NODES, 3) (or gate it) so the script behaves correctly for smaller clusters too.

Suggested change
# Build proxysql_servers entries: primary + first 3 nodes as core
PROXYSQL_SERVERS_SQL="DELETE FROM proxysql_servers;"
for i in $(seq 1 3); do
# Build proxysql_servers entries: primary + up to first 3 nodes as core
PROXYSQL_SERVERS_SQL="DELETE FROM proxysql_servers;"
CORE_NODES=3
if [ "${NUM_NODES}" -lt 3 ]; then
CORE_NODES="${NUM_NODES}"
fi
for i in $(seq 1 "${CORE_NODES}"); do

Comment on lines +205 to +212
# Install on core nodes (1-3)
for i in $(seq 1 3); do
ADMIN_PORT=$((6032 + i * 10))
${MYSQL_CMD} -P${ADMIN_PORT} <<SQL
INSERT OR REPLACE INTO scheduler (interval_ms, filename) VALUES (12000, '/tmp/check_all_nodes.bash');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
SQL

Copilot AI Mar 23, 2026


The scheduler installation loop is hard-coded to seq 1 3 for “core nodes”. If NUM_NODES is configured smaller than 3, this will try to configure scheduler entries on admin ports that don’t exist, causing avoidable failures/timeouts. Please cap this loop similarly to min(NUM_NODES, 3) (or skip when the node wasn’t started).

Comment on lines +62 to 69
# Cluster Nodes — all run inside the ProxySQL container on different ports
# Port scheme: proxy-node1=6042, proxy-node2=6052, ..., proxy-node9=6122
# From the test-runner container, reach them via the proxysql hostname
CLUSTER_NODES=""
for i in $(seq 1 9); do
-    CLUSTER_NODES="${CLUSTER_NODES}proxy-node${i}:6032,"
+    PORT=$((6032 + i * 10))
+    CLUSTER_NODES="${CLUSTER_NODES}proxysql:${PORT},"
done

Copilot AI Mar 23, 2026


TAP_CLUSTER_NODES is still built with seq 1 9 regardless of PROXYSQL_CLUSTER_NODES / SKIP_CLUSTER_START. Since the cluster size is now configurable, this can make tests attempt to connect to nodes that weren’t started (or miss nodes when >9). Please derive the loop bound from PROXYSQL_CLUSTER_NODES (and produce an empty list when cluster start is skipped).

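Deriving the node list from the configurable count, as the comment suggests, could look like this. The variable names PROXYSQL_CLUSTER_NODES and SKIP_CLUSTER_START come from the review discussion; the defaults are assumptions for the sketch.

```shell
#!/bin/bash
# Build TAP_CLUSTER_NODES from the configured node count instead of a
# hard-coded 9, and emit an empty list when cluster start is skipped.
PROXYSQL_CLUSTER_NODES="${PROXYSQL_CLUSTER_NODES:-9}"
SKIP_CLUSTER_START="${SKIP_CLUSTER_START:-0}"
CLUSTER_NODES=""
if [ "${SKIP_CLUSTER_START}" != "1" ] && [ "${PROXYSQL_CLUSTER_NODES}" -gt 0 ]; then
    for i in $(seq 1 "${PROXYSQL_CLUSTER_NODES}"); do
        PORT=$((6032 + i * 10))
        CLUSTER_NODES="${CLUSTER_NODES}proxysql:${PORT},"
    done
fi
TAP_CLUSTER_NODES="${CLUSTER_NODES%,}"   # trim the trailing comma
echo "${TAP_CLUSTER_NODES}"
```

With this shape, PROXYSQL_CLUSTER_NODES=2 produces exactly two endpoints and SKIP_CLUSTER_START=1 produces an empty list, so tests never target nodes that were not started.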

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (7)
test/infra/control/run-tests-isolated.bash (1)

274-286: Minor: Progress message goes only to log file.

Line 275 redirects the "Preparing coverage data directory..." message to the log file, so users won't see it during execution. This aligns with the PR objective to reduce stdout noise. Consider whether the "Running fastcov on /gcov..." message on line 282 should also go to the log for consistency, or keep it on stdout as a key progress indicator.

test/infra/control/check_all_nodes.bash (2)

16-19: Quote array expansions to avoid word splitting.

While the current table names don't contain special characters, quoting array expansions prevents future issues if table names with spaces or glob characters are added.

Proposed fix
 ALL_TABLES=()
-for i in ${!TABLES[@]}; do
-    ALL_TABLES+=(${TABLES[$i]})
-    ALL_TABLES+=("runtime_"${TABLES[$i]})
+for i in "${!TABLES[@]}"; do
+    ALL_TABLES+=("${TABLES[$i]}")
+    ALL_TABLES+=("runtime_${TABLES[$i]}")
 done

24-28: Quote the inner loop array expansion.

Similar to the previous loop, quote ${!ALL_TABLES[@]} for robustness.

Proposed fix
 for port in ${PORTS}; do
-    for i in ${!ALL_TABLES[@]}; do
+    for i in "${!ALL_TABLES[@]}"; do
         echo "SELECT COUNT(*) FROM ${ALL_TABLES[$i]};"
     done | mysql -u admin -padmin -h 127.0.0.1 -P ${port} > /dev/null 2>&1 &
 done
test/infra/control/start-proxysql-isolated.bash (3)

132-142: Use [[ instead of [ for conditional tests in the node wait loop.

Same issue as the primary wait loop - use [[ for consistency and safety.

Proposed fix
-    while [ $COUNT -lt $MAX_WAIT ]; do
+    while [[ $COUNT -lt $MAX_WAIT ]]; do
         if docker exec "${PROXY_CONTAINER}" mysql -uadmin -padmin -h127.0.0.1 -P${ADMIN_PORT} -e 'SELECT 1' >/dev/null 2>&1; then
             echo " OK."
             break
         fi
         echo -n "."
         sleep 1
         COUNT=$((COUNT+1))
     done
-    if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT (node ${i})"; exit 1; fi
+    if [[ $COUNT -ge $MAX_WAIT ]]; then echo " TIMEOUT (node ${i})"; exit 1; fi

112-125: Use [[ instead of [ for conditional tests.

Per SonarCloud analysis, [[ is safer and more feature-rich in bash scripts (handles word splitting and globbing automatically).

Proposed fix
-while [ $COUNT -lt $MAX_WAIT ]; do
+while [[ $COUNT -lt $MAX_WAIT ]]; do
     if docker exec "${PROXY_CONTAINER}" mysql -uadmin -padmin -h127.0.0.1 -P6032 -e 'SELECT 1' >/dev/null 2>&1; then
         docker exec "${PROXY_CONTAINER}" mysql -uadmin -padmin -h127.0.0.1 -P6032 -e "
             SET clickhouse-mysql_ifaces='0.0.0.0:8000';
             LOAD CLICKHOUSE VARIABLES TO RUNTIME;
         " >/dev/null 2>&1 || true
         echo " Ready."
         break
     fi
     echo -n "."
     sleep 1
     COUNT=$((COUNT+1))
 done
-if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT"; exit 1; fi
+if [[ $COUNT -ge $MAX_WAIT ]]; then echo " TIMEOUT"; exit 1; fi

144-145: Use [[ for the numeric comparison.

Proposed fix
-if [ "${NUM_NODES}" -gt 0 ]; then
+if [[ "${NUM_NODES}" -gt 0 ]]; then
test/tap/groups/legacy-clickhouse/env.sh (1)

1-3: Add a shebang for shell identification.

Although this file is sourced rather than executed directly, adding a shebang helps static analysis tools and clarifies the intended shell dialect.

Proposed fix
+#!/bin/bash
 # Legacy ClickHouse Test Group Environment

 export REGULAR_INFRA_DATADIR="/var/lib/proxysql"
Actionable inline comments
In `@test/infra/control/run-tests-isolated.bash`:
- Around line 284-286: The fastcov invocation in the run-tests-isolated.bash
script is using an invalid `-i` flag; update the fastcov command (the fastcov
call that currently includes `-i \"${WORKSPACE}/include/\" \"${WORKSPACE}/lib/\"
\"${WORKSPACE}/src/\" \"${WORKSPACE}/test/\"`) to use the correct `--include`
option with the desired directories (e.g., `--include src/ include/ lib/ test/`)
so fastcov filters use the supported `--include`/`--exclude` syntax and keep the
rest of the command (output file, log redirection, and warning on failure)
unchanged.

In `@test/infra/control/start-proxysql-isolated.bash`:
- Around line 84-86: The line starting with "exec /usr/bin/proxysql ... | tee
/var/lib/proxysql/proxysql.log" uses a pipeline that prevents the proxysql
process from being PID 1 and breaking signal propagation; replace that pipeline
so exec runs proxysql directly as PID 1 and still captures logs — either remove
the pipe and redirect stdout/stderr to the log file, or use shell process
substitution or a backgrounded tee so the exec invocation is not part of a
pipeline (i.e., ensure the "exec /usr/bin/proxysql ..." invocation is the direct
execed command and not the left side of a pipe).
- Around line 150-155: The loop that builds PROXYSQL_SERVERS_SQL currently
hardcodes core nodes with for i in $(seq 1 3) causing failures when
PROXYSQL_CLUSTER_NODES/NUM_NODES < 3; change the loop to compute an upper bound
as the minimum of 3 and NUM_NODES (or PROXYSQL_CLUSTER_NODES) and iterate to
that bound so you only insert existing nodes; apply the same min(3,NUM_NODES)
guard to the other similar loop that inserts core nodes (the block that mirrors
this pattern around the later proxysql core insertion).
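The exec-pipeline fix described above can be sketched as follows. The `...` stands for proxysql's real flags (elided here), and the demonstration uses a stand-in command so the snippet is runnable.

```shell
#!/bin/bash
LOGFILE="$(mktemp)"

# Problem:  exec /usr/bin/proxysql ... | tee "${LOGFILE}"
#   In a pipeline the shell stays alive and proxysql is not PID 1,
#   so container stop signals (SIGTERM) never reach it directly.
#
# Fix 1 — plain redirection; proxysql becomes PID 1:
#   exec /usr/bin/proxysql ... >"${LOGFILE}" 2>&1
#
# Fix 2 — process substitution, if tee-to-stdout must be kept:
#   exec /usr/bin/proxysql ... > >(tee "${LOGFILE}") 2>&1

# Runnable demonstration of Fix 1 with a stand-in command, in a subshell
# so this script itself survives the exec:
( exec echo "proxysql would log here" >"${LOGFILE}" 2>&1 )
cat "${LOGFILE}"
```

Either fix keeps the log while ensuring the daemon receives signals as the container's init process.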

---

Nitpick comments:
In `@test/infra/control/check_all_nodes.bash`:
- Around line 16-19: The loop that appends entries to ALL_TABLES should quote
array expansions to prevent word-splitting and globbing: update the for-loop
body that uses TABLES and ALL_TABLES so it uses "${TABLES[$i]}",
"${ALL_TABLES[@]}" where appropriate and add quotes around the
"runtime_${TABLES[$i]}" expansion; specifically modify the block that iterates
over TABLES (the for i in ${!TABLES[@]} loop) to append quoted values to
ALL_TABLES (use ALL_TABLES+=("${TABLES[$i]}") and
ALL_TABLES+=("runtime_${TABLES[$i]}")) so future table names with spaces or
special chars are handled safely.
- Around line 24-28: The inner loop currently iterates over ${!ALL_TABLES[@]}
unquoted; update the loop header to quote the array expansion (for i in
"${!ALL_TABLES[@]}"; do) so it handles entries with spaces or special chars
safely—modify the loop that references ALL_TABLES in the section that iterates
ports (uses PORTS) to use the quoted form.
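Both quoting fixes can be shown together; a minimal sketch where the TABLES values are placeholders, not the script's real list:

```shell
#!/usr/bin/env bash
# Quoted array handling: expansions stay intact even if a future entry
# contains spaces or glob characters.
TABLES=(mysql_servers mysql_users)
ALL_TABLES=()
for table in "${TABLES[@]}"; do        # quoted value iteration (SC2068/SC2206 fix)
    ALL_TABLES+=("${table}")
    ALL_TABLES+=("runtime_${table}")
done
for i in "${!ALL_TABLES[@]}"; do       # quoted index iteration
    echo "SELECT COUNT(*) FROM ${ALL_TABLES[$i]};"
done
```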

In `@test/infra/control/run-tests-isolated.bash`:
- Around line 274-286: The "Preparing coverage data directory..." message is
currently redirected to the coverage_log while the "Running fastcov on /gcov..."
message goes to stdout; make them consistent by redirecting the latter to the
same log: update the echo ">>> Running fastcov on /gcov..." to append to
"${coverage_log}" (>> "${coverage_log}" 2>&1) so progress messages for coverage
steps use the same coverage_log variable alongside the fastcov invocation that
already logs to "${coverage_log}"; locate the echo near the fastcov call
(references: coverage_log, coverage_file, nproc_val, WORKSPACE, /gcov, fastcov)
and apply the redirection.
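A minimal sketch of the consistent-logging pattern being asked for, with the real fastcov call replaced by a stub echo so the sketch runs anywhere:

```shell
#!/usr/bin/env bash
coverage_log=$(mktemp)
# Both progress messages and tool output append to the same coverage log.
echo ">>> Preparing coverage data directory..." >> "${coverage_log}" 2>&1
echo ">>> Running fastcov on /gcov..." >> "${coverage_log}" 2>&1
echo "fastcov: stub run" >> "${coverage_log}" 2>&1  # stand-in for the real fastcov invocation
log_lines=$(wc -l < "${coverage_log}")
rm -f "${coverage_log}"
echo "coverage steps logged: ${log_lines}"
```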

In `@test/infra/control/start-proxysql-isolated.bash`:
- Around line 132-142: The node wait loop uses the single-bracket test; update
the conditional expressions to use double-bracket tests for safety and
consistency: replace the while test that checks COUNT vs MAX_WAIT (while [
$COUNT -lt $MAX_WAIT ]) with while [[ $COUNT -lt $MAX_WAIT ]] and replace the
final timeout if test (if [ $COUNT -ge $MAX_WAIT ]) with if [[ $COUNT -ge
$MAX_WAIT ]]; keep the rest of the loop (docker exec against PROXY_CONTAINER
using ADMIN_PORT, the COUNT increment and break) unchanged so behavior of the
node wait loop remains the same.
- Around line 112-125: Replace the POSIX [ ] tests with bash [[ ]] tests to
avoid word-splitting and globbing issues: update the while condition (while [
$COUNT -lt $MAX_WAIT ]; do → while [[ $COUNT -lt $MAX_WAIT ]]; do), the inner if
that checks the docker exec exit code (if docker exec ...; then remains but any
other [ ] tests should use [[ ]]), and the final timeout check (if [ $COUNT -ge
$MAX_WAIT ]; then → if [[ $COUNT -ge $MAX_WAIT ]]; then). Ensure all
corresponding closing brackets are switched and that variables like COUNT,
MAX_WAIT and PROXY_CONTAINER are still referenced the same way.
- Around line 144-145: Replace the single-bracket test in the conditional that
checks NUM_NODES (the line containing [ "${NUM_NODES}" -gt 0 ]) with a bash
double-bracket numeric comparison; i.e., use [[ ... ]] with the same operands so
the conditional correctly uses bash's arithmetic test syntax in
start-proxysql-isolated.bash.
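The conversion leaves the wait loop's behavior unchanged; a runnable sketch with the docker exec probe stubbed out:

```shell
#!/usr/bin/env bash
MAX_WAIT=5
COUNT=0
probe() { [[ ${COUNT} -ge 3 ]]; }         # stand-in for the real docker exec readiness check
while [[ ${COUNT} -lt ${MAX_WAIT} ]]; do  # double-bracket numeric test
    if probe; then
        break
    fi
    COUNT=$((COUNT + 1))
done
if [[ ${COUNT} -ge ${MAX_WAIT} ]]; then
    echo "TIMEOUT"
else
    echo "node ready after ${COUNT} polls"
fi
```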

In `@test/tap/groups/legacy-clickhouse/env.sh`:
- Around line 1-3: Add a shebang as the first line of the env file to identify
the shell dialect (e.g., #!/usr/bin/env bash) so static analysis tools recognize
it; update the top of the file containing the REGULAR_INFRA_DATADIR export to
begin with the shebang line (before any comments or exports) to avoid changing
behavior when the file is sourced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 72fe5b8a-ca39-49cd-a6fa-b4b13dc87a3c

📥 Commits

Reviewing files that changed from the base of the PR and between 879ebdc and 5a868d3.

📒 Files selected for processing (13)
  • test/infra/control/check_all_nodes.bash
  • test/infra/control/env-isolated.bash
  • test/infra/control/run-multi-group.bash
  • test/infra/control/run-tests-isolated.bash
  • test/infra/control/start-proxysql-isolated.bash
  • test/infra/infra-mysql57/conf/proxysql/infra-config.sql
  • test/tap/groups/default/pre-proxysql.bash
  • test/tap/groups/groups.json
  • test/tap/groups/legacy-clickhouse/env.sh
  • test/tap/groups/legacy-clickhouse/infras.lst
  • test/tap/groups/legacy-clickhouse/pre-proxysql.bash
  • test/tap/groups/legacy/infras.lst
  • test/tap/groups/legacy/pre-proxysql.bash
💤 Files with no reviewable changes (1)
  • test/tap/groups/legacy/infras.lst
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: Agent
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: run / trigger
  • GitHub Check: claude-review
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Applies to test/tap/tests/unit/**/*.cpp : Unit tests in test/tap/tests/unit/ must use test_globals.h and test_init.h and link against libproxysql.a via the custom test harness
📚 Learning: 2026-01-20T09:34:27.165Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.

Applied to files:

  • test/tap/groups/legacy-clickhouse/pre-proxysql.bash
  • test/tap/groups/legacy/pre-proxysql.bash
  • test/tap/groups/default/pre-proxysql.bash
  • test/infra/infra-mysql57/conf/proxysql/infra-config.sql
  • test/infra/control/env-isolated.bash
  • test/infra/control/run-multi-group.bash
📚 Learning: 2026-01-20T07:40:34.938Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).

Applied to files:

  • test/tap/groups/legacy-clickhouse/pre-proxysql.bash
  • test/tap/groups/legacy/pre-proxysql.bash
  • test/tap/groups/default/pre-proxysql.bash
  • test/infra/infra-mysql57/conf/proxysql/infra-config.sql
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Applies to test/tap/tests/unit/**/*.cpp : Unit tests in test/tap/tests/unit/ must use test_globals.h and test_init.h and link against libproxysql.a via the custom test harness

Applied to files:

  • test/tap/groups/legacy-clickhouse/pre-proxysql.bash
  • test/tap/groups/legacy/pre-proxysql.bash
  • test/tap/groups/default/pre-proxysql.bash
  • test/infra/control/env-isolated.bash
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Applies to **/*.{cpp,h,hpp} : C++17 is required; use conditional compilation via `#ifdef` PROXYSQLGENAI, `#ifdef` PROXYSQL31, etc. for feature flags

Applied to files:

  • test/infra/infra-mysql57/conf/proxysql/infra-config.sql
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Applies to **/*.{cpp,h,hpp} : Class names must use PascalCase with protocol prefixes (MySQL_, PgSQL_, ProxySQL_)

Applied to files:

  • test/infra/infra-mysql57/conf/proxysql/infra-config.sql
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Feature tiers are controlled by build flags: PROXYSQL31=1 for v3.1.x, PROXYSQLGENAI=1 for v4.0.x; PROXYSQLGENAI=1 implies PROXYSQL31=1

Applied to files:

  • test/infra/control/run-multi-group.bash
🪛 GitHub Check: SonarCloud Code Analysis
test/infra/control/start-proxysql-isolated.bash

[failure] 141-141: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s74&open=AZ0ac6owh5l158zV8s74&pullRequest=5528


[failure] 125-125: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s72&open=AZ0ac6owh5l158zV8s72&pullRequest=5528


[failure] 145-145: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s75&open=AZ0ac6owh5l158zV8s75&pullRequest=5528


[failure] 132-132: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s73&open=AZ0ac6owh5l158zV8s73&pullRequest=5528

🪛 Shellcheck (0.11.0)
test/tap/groups/legacy-clickhouse/env.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

test/infra/control/check_all_nodes.bash

[error] 16-16: Double quote array expansions to avoid re-splitting elements.

(SC2068)


[warning] 17-17: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 18-18: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[error] 25-25: Double quote array expansions to avoid re-splitting elements.

(SC2068)

🔇 Additional comments (13)
test/tap/groups/legacy-clickhouse/infras.lst (1)

1-1: LGTM! Infrastructure reference is valid and properly configured.

The addition of infra-clickhouse23 is correct. The infrastructure directory exists with all necessary files (docker-compose.yml, init scripts, configuration), and this change aligns with the PR's goal of reorganizing ClickHouse tests into a dedicated group.

test/infra/infra-mysql57/conf/proxysql/infra-config.sql (2)

97-98: Handler name updates are correctly aligned with concrete session entry points.

These renamed filters match the actual __get_connection handler symbols, so this should restore accurate trace capture for that DSS transition path.


107-113: Added debug filters are well-scoped for monitor/connection-path diagnostics.

The new entries target prepared statement continuations and monitor/data-stream connection flow, which should improve infra flake triage without overly broad filtering.

test/infra/control/run-multi-group.bash (4)

14-14: LGTM: Default AUTO_CLEANUP=1 aligns with CI best practices.

Defaulting to auto-cleanup for successful groups helps prevent disk bloat in CI environments while preserving logs for failed groups that need debugging.

Also applies to: 38-38


357-366: LGTM: Coverage log setup is properly structured.

The COVERAGE_LOG variable is correctly defined and exported to the Docker container environment.


368-401: LGTM: Coverage tool output redirection is consistent.

The fastcov/lcov output is appropriately redirected to the log file while keeping the final user-facing warning on stdout to inform about failures.


411-424: LGTM: HTML report generation logging is consistent with other coverage tooling.

The genhtml output redirection follows the same pattern established for fastcov and lcov.

test/infra/control/run-tests-isolated.bash (2)

267-269: LGTM: Local coverage log variable setup.

The coverage_log variable is properly scoped and quoted.


297-302: LGTM: HTML generation logging is consistent with the coverage log pattern.

The genhtml output is properly redirected, and the warning message correctly references the log file for debugging.

test/infra/control/env-isolated.bash (1)

62-70: LGTM!

The port scheme calculation (6032 + i * 10) correctly generates admin ports 6042–6122 for nodes 1–9, and the proxysql:${PORT} format aligns with the single-container architecture where all nodes are reachable on the same hostname via different ports.
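The scheme can be checked directly; in this sketch node 0 stands for the primary, an assumption made for illustration:

```shell
#!/usr/bin/env bash
# Port pairs per node: admin = 6032 + i*10, mysql = admin + 1.
for i in $(seq 0 9); do
    ADMIN_PORT=$((6032 + i * 10))
    MYSQL_PORT=$((ADMIN_PORT + 1))
    echo "node${i}: admin=${ADMIN_PORT} mysql=${MYSQL_PORT}"
done
```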

test/tap/groups/legacy-clickhouse/pre-proxysql.bash (1)

1-3: LGTM!

The no-op hook correctly delegates cluster startup to start-proxysql-isolated.bash, consistent with the other group hooks.

test/tap/groups/legacy/pre-proxysql.bash (1)

1-3: LGTM!

The hook is now a no-op, consistent with the centralized cluster startup in start-proxysql-isolated.bash.

test/tap/groups/default/pre-proxysql.bash (1)

12-19: LGTM!

The cluster detection logic correctly mirrors start-proxysql-isolated.bash, and the 10-second stabilization wait is appropriate for allowing the cluster nodes to fully initialize before tests begin.

Comment thread test/infra/control/run-tests-isolated.bash
Comment thread test/infra/control/start-proxysql-isolated.bash
Comment thread test/infra/control/start-proxysql-isolated.bash Outdated

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the ProxySQL cluster setup for CI to run all nodes within a single Docker container, which is a great improvement for reducing resource contention and improving test stability. The changes also include redirecting coverage generation logs to files and filtering coverage sources. My review focuses on improving the robustness and readability of the new bash scripts. I've identified a potential bug where a hardcoded loop count could cause failures if fewer than 3 cluster nodes are used. I've also suggested replacing a fixed sleep with a more reliable polling mechanism to wait for cluster stabilization, and some minor improvements for script idiomaticity and efficiency.


# Build proxysql_servers entries: primary + first 3 nodes as core
PROXYSQL_SERVERS_SQL="DELETE FROM proxysql_servers;"
for i in $(seq 1 3); do

high

The loop hardcodes 3 as the number of core nodes. If NUM_NODES is less than 3, this script will fail later at lines 206-213 when it tries to connect to non-existent nodes. The number of core nodes should be the minimum of 3 and NUM_NODES.

A similar change is needed for the loop at line 206.

Suggested change
for i in $(seq 1 3); do
for i in $(seq 1 $(( NUM_NODES < 3 ? NUM_NODES : 3 )) ); do

Comment on lines +16 to 19
for i in ${!TABLES[@]}; do
ALL_TABLES+=(${TABLES[$i]})
ALL_TABLES+=("runtime_"${TABLES[$i]})
done

medium

This loop can be written more idiomatically using for item in "${array[@]}" syntax, which is generally more readable and less error-prone than iterating over indices.

for table in "${TABLES[@]}"; do
    ALL_TABLES+=("${table}")
    ALL_TABLES+=("runtime_${table}")
done

Comment on lines +24 to 28
for port in ${PORTS}; do
for i in ${!ALL_TABLES[@]}; do
echo "SELECT COUNT(*) FROM ${ALL_TABLES[$i]};"
done | mysql -u admin -padmin -h 127.0.0.1 -P ${port} > /dev/null 2>&1 &
done

medium

The inner loop generating SQL queries can be made more efficient and readable by using printf to generate all queries at once, avoiding the shell loop. Also, it's good practice to quote shell variables like ${port} when they are used as arguments.

for port in ${PORTS}; do
    printf 'SELECT COUNT(*) FROM %s;\n' "${ALL_TABLES[@]}" | mysql -u admin -padmin -h 127.0.0.1 -P "${port}" > /dev/null 2>&1 &
done

Comment on lines +18 to 19
echo "[$(date '+%Y-%m-%d %H:%M:%S')] >>> Pre-proxysql: waiting for cluster to stabilize..."
sleep 10

medium

Using a fixed sleep can lead to flaky tests. It's better to replace it with a polling mechanism that actively checks if the cluster has stabilized. This can be done by querying the proxysql_servers table on the primary node until the expected number of cluster nodes are ONLINE.

echo "[$(date '+%Y-%m-%d %H:%M:%S')] >>> Pre-proxysql: waiting for cluster to stabilize..."
MAX_WAIT=30
COUNT=0
while [ $COUNT -lt $MAX_WAIT ]; do
    # The number of nodes configured in proxysql_servers is min(3, NUM_NODES)
    CORE_NODE_COUNT=$(( NUM_NODES < 3 ? NUM_NODES : 3 ))
    if [ "${CORE_NODE_COUNT}" -eq 0 ]; then
        break # No cluster nodes to check
    fi
    ONLINE_NODES=$(mysql -uradmin -pradmin -hproxysql -P6032 -NB -e "SELECT COUNT(*) FROM proxysql_servers WHERE status='ONLINE';" 2>/dev/null || echo 0)
    if [ "${ONLINE_NODES}" -ge "${CORE_NODE_COUNT}" ]; then
        echo " All ${CORE_NODE_COUNT} cluster nodes are ONLINE in primary's view."
        break
    fi
    echo -n "."
    sleep 1
    COUNT=$((COUNT+1))
done
if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT waiting for cluster to stabilize"; exit 1; fi

- RESTAPI port offset changed from 6070+i to 7070+i to avoid collision
  with admin/mysql ports (6072 would collide with node4 admin)
- Unset GCOV_PREFIX for background cluster nodes to prevent concurrent
  .gcda write corruption
- Cap core-node and scheduler loops to min(NUM_NODES, 3) so smaller
  clusters don't reference non-existent nodes
- TAP_CLUSTER_NODES now respects PROXYSQL_CLUSTER_NODES and
  SKIP_CLUSTER_START instead of hardcoding 9
- Fix fastcov flag: -i is invalid, use --include for source filtering
TAP tests running in the test-runner container query proxysql_servers
to find cluster nodes, then connect to them directly. With 127.0.0.1,
the test-runner connects to itself instead of the ProxySQL container.

Using 'proxysql' (the container hostname) works from both inside the
container (resolves via /etc/hosts) and from the test-runner (resolves
via Docker DNS on the shared network).
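A sketch of why the shared hostname matters; the INSERT follows ProxySQL's proxysql_servers columns, and the comment value is illustrative:

```shell
#!/usr/bin/env bash
# Registering nodes under the container hostname instead of 127.0.0.1 means the
# rows TAP tests read back resolve correctly from the test-runner container too.
CLUSTER_HOSTNAME=proxysql   # container hostname, resolvable via Docker DNS
i=1
ADMIN_PORT=$((6032 + i * 10))
ROW_SQL="INSERT INTO proxysql_servers (hostname,port,weight,comment) VALUES ('${CLUSTER_HOSTNAME}',${ADMIN_PORT},0,'node${i}');"
echo "${ROW_SQL}"
```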
All noise thread failure messages now include host, port, user, and
the actual error message. Previously they only logged generic
"connection failure" with no details, making it impossible to
diagnose why a noise thread failed.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
test/infra/control/start-proxysql-isolated.bash (1)

87-89: ⚠️ Potential issue | 🟡 Minor

Avoid piping the final exec.

exec on the left side of a pipeline only replaces that pipeline child, so ProxySQL is still not PID 1 here, and its exit status remains hidden behind tee.

Make ProxySQL the direct exec target
-exec /usr/bin/proxysql --idle-threads --clickhouse-server --sqlite3-server -f -c /etc/proxysql.cnf -D /var/lib/proxysql 2>&1 | tee /var/lib/proxysql/proxysql.log
+exec /usr/bin/proxysql --idle-threads --clickhouse-server --sqlite3-server -f -c /etc/proxysql.cnf -D /var/lib/proxysql > >(tee /var/lib/proxysql/proxysql.log) 2>&1
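The exit-status half of this issue is easy to demonstrate with a stand-in command; here false replaces proxysql so the sketch is runnable:

```shell
#!/usr/bin/env bash
log=$(mktemp)
# Pipeline form: the pipeline's status is tee's (0), masking the real failure.
bash -c "exec false 2>&1 | tee '${log}' >/dev/null"
pipe_status=$?
# Process-substitution form: the command is exec'd directly, status preserved.
bash -c "exec false > >(tee '${log}' >/dev/null) 2>&1"
exec_status=$?
rm -f "${log}"
echo "pipeline=${pipe_status} process-substitution=${exec_status}"
```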
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 87 - 89, The
current line uses exec on the left side of a pipeline so ProxySQL is not PID 1
and its exit status is obscured; change the invocation so /usr/bin/proxysql is
exec'd directly and its stdout/stderr are tee'd via process substitution (e.g.
use exec /usr/bin/proxysql ... -D /var/lib/proxysql > >(tee
/var/lib/proxysql/proxysql.log) 2>&1) so ProxySQL becomes PID 1 and logs still
go to /var/lib/proxysql/proxysql.log; also remove the stray trailing
double-quote from the command.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/infra/control/start-proxysql-isolated.bash`:
- Around line 24-28: Validate the PROXYSQL_CLUSTER_NODES env before building the
inner script: ensure you read PROXYSQL_CLUSTER_NODES into NUM_NODES, but then
validate it matches a single digit 0..9 (e.g., regex '^[0-9]$') and reject/exit
with a clear error if not; keep the existing override for SKIP_CLUSTER_START
(when SKIP_CLUSTER_START == "1" or "true" set NUM_NODES=0) and ensure the
validation runs after that override so an explicit skip still yields 0, and
perform no further interpolation if validation fails to prevent out-of-range
values (e.g., 10+) from being injected into the /bin/bash -c payload.
- Around line 57-58: The global unset of GCOV_PREFIX and GCOV_PREFIX_STRIP
removes coverage env vars for the foreground ProxySQL too; revert the
unconditional "unset GCOV_PREFIX GCOV_PREFIX_STRIP" and instead remove those
variables only for background cluster node launches by using env -u when
invoking the background start command (the invocation around the current env -u
suggestion at the location where background nodes are started, referenced in the
review as the command at line ~83), leaving the primary ProxySQL start (the
process started where the script launches the foreground ProxySQL around line
~88) to inherit GCOV_PREFIX/GCOV_PREFIX_STRIP so it can write coverage to /gcov.
- Line 128: Replace POSIX test brackets with Bash conditional brackets for the
numeric comparisons using the COUNT and MAX_WAIT variables: change occurrences
like "if [ $COUNT -ge $MAX_WAIT ]", "while [ $COUNT -lt $MAX_WAIT ]" and the
other similar tests to use "[[ $COUNT -ge $MAX_WAIT ]]" / "[[ $COUNT -lt
$MAX_WAIT ]]" respectively; keep proper spacing inside the brackets and ensure
all listed occurrences (the if at the shown diff plus the while and other tests
mentioned) are updated for SonarCloud compliance.

---

Duplicate comments:
In `@test/infra/control/start-proxysql-isolated.bash`:
- Around line 87-89: The current line uses exec on the left side of a pipeline
so ProxySQL is not PID 1 and its exit status is obscured; change the invocation
so /usr/bin/proxysql is exec'd directly and its stdout/stderr are tee'd via
process substitution (e.g. use exec /usr/bin/proxysql ... -D /var/lib/proxysql >
>(tee /var/lib/proxysql/proxysql.log) 2>&1) so ProxySQL becomes PID 1 and logs
still go to /var/lib/proxysql/proxysql.log; also remove the stray trailing
double-quote from the command.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aec70347-e0be-4373-a027-1d5339438577

📥 Commits

Reviewing files that changed from the base of the PR and between 5a868d3 and 8705685.

📒 Files selected for processing (3)
  • test/infra/control/env-isolated.bash
  • test/infra/control/run-tests-isolated.bash
  • test/infra/control/start-proxysql-isolated.bash
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/infra/control/run-tests-isolated.bash
  • test/infra/control/env-isolated.bash
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: run / trigger
  • GitHub Check: claude-review
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.
📚 Learning: 2026-01-20T09:34:27.165Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.

Applied to files:

  • test/infra/control/start-proxysql-isolated.bash
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Feature tiers are controlled by build flags: PROXYSQL31=1 for v3.1.x, PROXYSQLGENAI=1 for v4.0.x; PROXYSQLGENAI=1 implies PROXYSQL31=1

Applied to files:

  • test/infra/control/start-proxysql-isolated.bash
🪛 GitHub Check: SonarCloud Code Analysis
test/infra/control/start-proxysql-isolated.bash

[failure] 155-155: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ag0NSOyJaQ2D5dNc-&open=AZ0ag0NSOyJaQ2D5dNc-&pullRequest=5528


[failure] 128-128: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s72&open=AZ0ac6owh5l158zV8s72&pullRequest=5528


[failure] 144-144: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s74&open=AZ0ac6owh5l158zV8s74&pullRequest=5528


[failure] 148-148: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s75&open=AZ0ac6owh5l158zV8s75&pullRequest=5528


[failure] 135-135: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s73&open=AZ0ac6owh5l158zV8s73&pullRequest=5528

Comment on lines +24 to +28
# Cluster configuration
NUM_NODES=${PROXYSQL_CLUSTER_NODES:-9}
if [[ "${SKIP_CLUSTER_START}" == "1" ]] || [[ "${SKIP_CLUSTER_START}" == "true" ]]; then
NUM_NODES=0
fi

⚠️ Potential issue | 🟠 Major

Validate PROXYSQL_CLUSTER_NODES before building the inner script.

This value is now env-driven and gets interpolated into the /bin/bash -c payload verbatim. It also has a hard upper bound here: NUM_NODES=10 makes MYSQL_PORT=6133, which collides with the primary pgsql port documented on Line 50. Please reject anything except integers 0..9 up front.

Suggested guard
 NUM_NODES=${PROXYSQL_CLUSTER_NODES:-9}
 if [[ "${SKIP_CLUSTER_START}" == "1" ]] || [[ "${SKIP_CLUSTER_START}" == "true" ]]; then
     NUM_NODES=0
 fi
+if ! [[ "${NUM_NODES}" =~ ^[0-9]+$ ]] || [[ "${NUM_NODES}" -gt 9 ]]; then
+    echo "Error: PROXYSQL_CLUSTER_NODES must be an integer between 0 and 9."
+    exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Cluster configuration
NUM_NODES=${PROXYSQL_CLUSTER_NODES:-9}
if [[ "${SKIP_CLUSTER_START}" == "1" ]] || [[ "${SKIP_CLUSTER_START}" == "true" ]]; then
NUM_NODES=0
fi
# Cluster configuration
NUM_NODES=${PROXYSQL_CLUSTER_NODES:-9}
if [[ "${SKIP_CLUSTER_START}" == "1" ]] || [[ "${SKIP_CLUSTER_START}" == "true" ]]; then
NUM_NODES=0
fi
if ! [[ "${NUM_NODES}" =~ ^[0-9]+$ ]] || [[ "${NUM_NODES}" -gt 9 ]]; then
echo "Error: PROXYSQL_CLUSTER_NODES must be an integer between 0 and 9."
exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 24 - 28,
Validate the PROXYSQL_CLUSTER_NODES env before building the inner script: ensure
you read PROXYSQL_CLUSTER_NODES into NUM_NODES, but then validate it matches a
single digit 0..9 (e.g., regex '^[0-9]$') and reject/exit with a clear error if
not; keep the existing override for SKIP_CLUSTER_START (when SKIP_CLUSTER_START
== "1" or "true" set NUM_NODES=0) and ensure the validation runs after that
override so an explicit skip still yields 0, and perform no further
interpolation if validation fails to prevent out-of-range values (e.g., 10+)
from being injected into the /bin/bash -c payload.

Comment on lines +57 to +58
# Disable gcov for background cluster nodes to avoid concurrent .gcda writes
unset GCOV_PREFIX GCOV_PREFIX_STRIP

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n test/infra/control/start-proxysql-isolated.bash | head -100

Repository: sysown/proxysql

Length of output: 4198


Keep GCOV_PREFIX* for the foreground ProxySQL.

The unset at lines 57–58 is shell-wide within the startup script, so the primary ProxySQL process started at line 88 inherits no GCOV_PREFIX/GCOV_PREFIX_STRIP, disabling coverage collection for the instance that should write data to /gcov.

Limit the unset to background cluster nodes only by using env -u at line 83 instead:

Suggested change
-# Disable gcov for background cluster nodes to avoid concurrent .gcda writes
-unset GCOV_PREFIX GCOV_PREFIX_STRIP
-
 # Start cluster nodes as background processes
 for i in $(seq 1 ${NUM_NODES}); do
     ADMIN_PORT=$((6032 + i * 10))
@@ -80,1 +79,2 @@
-    /usr/bin/proxysql --idle-threads -f -c ${NODE_DIR}/proxysql-node.cnf -D ${NODE_DIR} >> ${NODE_DIR}/proxysql.log 2>&1 &
+    env -u GCOV_PREFIX -u GCOV_PREFIX_STRIP \
+        /usr/bin/proxysql --idle-threads -f -c ${NODE_DIR}/proxysql-node.cnf -D ${NODE_DIR} >> ${NODE_DIR}/proxysql.log 2>&1 &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 57 - 58, The
global unset of GCOV_PREFIX and GCOV_PREFIX_STRIP removes coverage env vars for
the foreground ProxySQL too; revert the unconditional "unset GCOV_PREFIX
GCOV_PREFIX_STRIP" and instead remove those variables only for background
cluster node launches by using env -u when invoking the background start command
(the invocation around the current env -u suggestion at the location where
background nodes are started, referenced in the review as the command at line
~83), leaving the primary ProxySQL start (the process started where the script
launches the foreground ProxySQL around line ~88) to inherit
GCOV_PREFIX/GCOV_PREFIX_STRIP so it can write coverage to /gcov.
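env -u scopes the removal to the child process only, which a quick sketch confirms (the variable values here are illustrative):

```shell
#!/usr/bin/env bash
export GCOV_PREFIX=/gcov GCOV_PREFIX_STRIP=3
# A child launched via `env -u` sees neither variable...
child_view=$(env -u GCOV_PREFIX -u GCOV_PREFIX_STRIP \
    bash -c 'echo "child GCOV_PREFIX=[${GCOV_PREFIX:-unset}]"')
echo "${child_view}"
# ...while the parent (the foreground ProxySQL in the real script) keeps them.
echo "parent GCOV_PREFIX=[${GCOV_PREFIX}]"
```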

sleep 1
COUNT=$((COUNT+1))
done
if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT"; exit 1; fi

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and read the relevant lines
if [ -f "test/infra/control/start-proxysql-isolated.bash" ]; then
  echo "=== File found, reading content ==="
  wc -l "test/infra/control/start-proxysql-isolated.bash"
  echo ""
  echo "=== Lines 128, 135, 144, 148, 155 ==="
  sed -n '128p; 135p; 144p; 148p; 155p' "test/infra/control/start-proxysql-isolated.bash"
  echo ""
  echo "=== Context around line 128 ==="
  sed -n '125,160p' "test/infra/control/start-proxysql-isolated.bash"
else
  echo "File not found"
fi

Repository: sysown/proxysql

Length of output: 1846


🏁 Script executed:

# Search for all [ conditionals in the file to check if there are others
echo "=== All [ conditional patterns in the file ==="
rg '\[\s' "test/infra/control/start-proxysql-isolated.bash" -n

Repository: sysown/proxysql

Length of output: 698


Switch the [ tests to [[ for unquoted numeric variable comparisons to pass SonarCloud linting.

The identified changes at lines 128, 135, 144, 148, and 155 are accurate. Using [[ ... ]] instead of [ ... ] with unquoted numeric variables ($COUNT) is both SonarCloud-compliant and idiomatic Bash, with no behavioral change.

Note: Line 115 contains the same pattern (while [ $COUNT -lt $MAX_WAIT ]) and should also be converted to [[ ... ]] for consistency.

Sonar fix
-while [ $COUNT -lt $MAX_WAIT ]; do
+while [[ $COUNT -lt $MAX_WAIT ]]; do
@@
-if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT"; exit 1; fi
+if [[ $COUNT -ge $MAX_WAIT ]]; then echo " TIMEOUT"; exit 1; fi
@@
-    while [ $COUNT -lt $MAX_WAIT ]; do
+    while [[ $COUNT -lt $MAX_WAIT ]]; do
@@
-    if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT (node ${i})"; exit 1; fi
+    if [[ $COUNT -ge $MAX_WAIT ]]; then echo " TIMEOUT (node ${i})"; exit 1; fi
@@
-if [ "${NUM_NODES}" -gt 0 ]; then
+if [[ "${NUM_NODES}" -gt 0 ]]; then
@@
-    if [ "${NUM_NODES}" -lt 3 ]; then CORE_NODES="${NUM_NODES}"; fi
+    if [[ "${NUM_NODES}" -lt 3 ]]; then CORE_NODES="${NUM_NODES}"; fi
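Beyond lint compliance, `[[ ... ]]` is genuinely safer for these unquoted numeric tests: if `$COUNT` were ever unset or empty, `[` would see a malformed expression and error out, while `[[` evaluates the missing operand in arithmetic context as 0. A minimal illustration (not from the script under review):

```shell
#!/usr/bin/env bash
unset COUNT
MAX_WAIT=5

# POSIX [: $COUNT expands to nothing, leaving `[ -lt 5 ]`, which is an error.
if [ $COUNT -lt $MAX_WAIT ] 2>/dev/null; then
    echo "single bracket: true"
else
    echo "single bracket: error or false"
fi

# Bash [[: the empty operand is treated as 0 in the arithmetic comparison.
if [[ $COUNT -lt $MAX_WAIT ]]; then
    echo "double bracket: true"
fi
```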
🧰 Tools
🪛 GitHub Check: SonarCloud Code Analysis

[failure] 128-128: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s72&open=AZ0ac6owh5l158zV8s72&pullRequest=5528

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` at line 128, Replace POSIX
test brackets with Bash conditional brackets for the numeric comparisons using
the COUNT and MAX_WAIT variables: change occurrences like "if [ $COUNT -ge
$MAX_WAIT ]", "while [ $COUNT -lt $MAX_WAIT ]" and the other similar tests to
use "[[ $COUNT -ge $MAX_WAIT ]]" / "[[ $COUNT -lt $MAX_WAIT ]]" respectively;
keep proper spacing inside the brackets and ensure all listed occurrences (the
if at the shown diff plus the while and other tests mentioned) are updated for
SonarCloud compliance.

Test now prints a description of what it does and numbered steps:
1. Connect to admin
2. Disable cluster scheduler
3. Measure idle CPU baseline
4. Trigger BUSY_LOOP (disconnect during query)
5. Measure CPU after BUSY_LOOP
6. Wait for cleanup
7. Verify CPU returned to idle
8. Trigger INF_LOOP (disconnect after response)
9. Measure CPU after INF_LOOP
10. Recover cluster scheduler
Cluster nodes only configured mysql admin and data ports, leaving
pgsql (6133/6132) and sqlite (6030) at defaults — same as the
primary. All nodes tried to bind the same ports, causing:
1. Bind failures for pgsql on cluster nodes
2. Cluster sync overwriting primary's pgsql config with empty tables
   (nodes had no working pgsql, so their pgsql tables were empty)

Now each node gets unique ports for all interfaces:
  node N: mysql admin=6032+N*10, mysql data=6033+N*10,
          pgsql data=6133+N*10, pgsql admin=6132+N*10,
          sqlite=6030+N*10
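The port layout above can be reproduced with a short loop (a sketch of the scheme, not the actual script):

```shell
#!/usr/bin/env bash
# Sketch of the per-node port scheme: each node N offsets every
# listener by N*10 from the primary's defaults.
NUM_NODES=9
for i in $(seq 1 "${NUM_NODES}"); do
    echo "node ${i}: mysql admin=$((6032 + i*10)) data=$((6033 + i*10))" \
         "pgsql admin=$((6132 + i*10)) data=$((6133 + i*10))" \
         "sqlite=$((6030 + i*10))"
done
```

Note that node 1's mysql ports come out as 6042/6043, matching the scheme in the PR description, and node 9 tops out at pgsql data=6223, below any collision with a tenth node.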

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
test/infra/control/start-proxysql-isolated.bash (3)

56-58: ⚠️ Potential issue | 🟠 Major

The global unset disables coverage for the primary ProxySQL process.

The unset GCOV_PREFIX GCOV_PREFIX_STRIP at lines 57-58 runs before the primary ProxySQL starts at line 99, so the primary instance won't write coverage data to /gcov. The intent was to disable coverage only for background cluster nodes.

Fix: use `env -u` only for background node launches

Remove lines 57-58 and modify line 94 to unset the variables only for background processes:

-# Disable gcov for background cluster nodes to avoid concurrent .gcda writes
-unset GCOV_PREFIX GCOV_PREFIX_STRIP
-
 # Start cluster nodes as background processes
 for i in \$(seq 1 ${NUM_NODES}); do
     ...
-    /usr/bin/proxysql --idle-threads -f -c \${NODE_DIR}/proxysql-node.cnf -D \${NODE_DIR} >> \${NODE_DIR}/proxysql.log 2>&1 &
+    env -u GCOV_PREFIX -u GCOV_PREFIX_STRIP /usr/bin/proxysql --idle-threads -f -c \${NODE_DIR}/proxysql-node.cnf -D \${NODE_DIR} >> \${NODE_DIR}/proxysql.log 2>&1 &
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 56 - 58, The
global unset of GCOV_PREFIX and GCOV_PREFIX_STRIP inside the STARTUP_CMD removes
coverage for the primary ProxySQL process; remove the lines that unset these
variables from STARTUP_CMD and instead apply "env -u" when spawning background
cluster nodes (the commands that launch background instances), so only
background node launches have GCOV_PREFIX/GCOV_PREFIX_STRIP removed while the
primary process retains coverage.

24-28: ⚠️ Potential issue | 🟠 Major

Add input validation for NUM_NODES to prevent port collisions.

If PROXYSQL_CLUSTER_NODES is set to 10 or higher, the calculated ports will collide with primary pgsql ports (e.g., node 10: admin=6132 collides with pgsql admin). Non-integer values would also cause arithmetic errors.

Suggested validation
 # Cluster configuration
 NUM_NODES=${PROXYSQL_CLUSTER_NODES:-9}
 if [[ "${SKIP_CLUSTER_START}" == "1" ]] || [[ "${SKIP_CLUSTER_START}" == "true" ]]; then
     NUM_NODES=0
 fi
+if ! [[ "${NUM_NODES}" =~ ^[0-9]$ ]]; then
+    echo "Error: PROXYSQL_CLUSTER_NODES must be an integer between 0 and 9."
+    exit 1
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 24 - 28, The
NUM_NODES assignment using PROXYSQL_CLUSTER_NODES lacks validation and can
accept non-integers or values >=10 causing port collisions; add input validation
after NUM_NODES is set (and after honoring SKIP_CLUSTER_START) to ensure
NUM_NODES is an integer between 0 and 9 inclusive, and if invalid print a clear
error to stderr and exit with non-zero status. Validate the value referenced by
PROXYSQL_CLUSTER_NODES (and the resulting NUM_NODES) using a numeric check and
range check, and handle non-integer input gracefully (error+exit) rather than
letting bash arithmetic or port calculation proceed; update any error messages
to reference NUM_NODES/PROXYSQL_CLUSTER_NODES and SKIP_CLUSTER_START for
clarity.
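The suggested range check can be exercised in isolation; the function name below is illustrative, not from the script:

```shell
#!/usr/bin/env bash
# Illustrative validator matching the suggested fix: accept only a
# single digit 0-9, reject anything else with an error on stderr.
validate_num_nodes() {
    if ! [[ "$1" =~ ^[0-9]$ ]]; then
        echo "Error: PROXYSQL_CLUSTER_NODES must be an integer between 0 and 9." >&2
        return 1
    fi
}

validate_num_nodes 9   && echo "9: accepted"
validate_num_nodes 10  || echo "10: rejected"
validate_num_nodes abc || echo "abc: rejected"
```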

98-100: ⚠️ Potential issue | 🟡 Minor

The exec ... | tee pattern prevents proper signal propagation to the ProxySQL process.

When exec is combined with a pipe, the shell creates a subshell for the pipeline, so exec replaces only the subshell rather than the main shell process. Signals sent to PID 1 may not reach ProxySQL directly.

Alternative: use process substitution to preserve exec semantics
-exec /usr/bin/proxysql --idle-threads --clickhouse-server --sqlite3-server -f -c /etc/proxysql.cnf -D /var/lib/proxysql 2>&1 | tee /var/lib/proxysql/proxysql.log
+exec /usr/bin/proxysql --idle-threads --clickhouse-server --sqlite3-server -f -c /etc/proxysql.cnf -D /var/lib/proxysql > >(tee /var/lib/proxysql/proxysql.log) 2>&1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 98 - 100, The
current line uses "exec /usr/bin/proxysql ... | tee
/var/lib/proxysql/proxysql.log", which breaks signal propagation because exec is
used with a pipe; change it to preserve exec semantics by redirecting ProxySQL's
stdout/stderr into a process substitution instead of a pipeline (so PID 1
becomes the proxysql binary). Update the invocation that references
/usr/bin/proxysql and /var/lib/proxysql/proxysql.log to use process substitution
(or an equivalent direct redirection) so signals delivered to PID 1 are received
by the ProxySQL process.
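The difference is easy to observe with PIDs in a standalone demo (unrelated to the actual proxysql binary): in the pipeline form the exec'd command gets a new PID, in the process-substitution form it keeps the shell's PID.

```shell
#!/usr/bin/env bash
# Demo: `exec` in a pipeline replaces only a pipeline subshell, while
# `exec` with process substitution replaces the shell itself.

# Pipe form: the exec'd command reports a PID different from the shell's.
read -r shell_pid exec_pid < <(bash -c 'echo -n "$$ "; exec sh -c "echo \$\$" | cat')
[[ "$shell_pid" != "$exec_pid" ]] && echo "pipe: exec ran in a subshell"

# Process-substitution form: the exec'd command keeps the shell's PID.
read -r shell_pid exec_pid < <(bash -c 'echo -n "$$ "; exec sh -c "echo \$\$" > >(cat)')
[[ "$shell_pid" == "$exec_pid" ]] && echo "subst: exec replaced the shell"
```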
🧹 Nitpick comments (3)
test/infra/control/start-proxysql-isolated.bash (3)

126-139: Switch [ to [[ for SonarCloud compliance.

SonarCloud flags the use of single brackets for conditional tests. Using [[ is safer and more feature-rich in Bash.

Suggested fix
-while [ $COUNT -lt $MAX_WAIT ]; do
+while [[ $COUNT -lt $MAX_WAIT ]]; do
     ...
 done
-if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT"; exit 1; fi
+if [[ $COUNT -ge $MAX_WAIT ]]; then echo " TIMEOUT"; exit 1; fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 126 - 139,
Replace POSIX-style test brackets with Bash extended tests: change the while and
final if conditions that use [ $COUNT -lt $MAX_WAIT ] and [ $COUNT -ge $MAX_WAIT
] to use [[ $COUNT -lt $MAX_WAIT ]] and [[ $COUNT -ge $MAX_WAIT ]], and also
ensure the inner if that checks the docker exec command uses [[ ... ]] form for
its conditional (keeping the same command invocation with PROXY_CONTAINER,
COUNT, MAX_WAIT variables and the docker exec mysql calls unchanged).

159-166: Switch [ to [[ in cluster initialization conditionals for SonarCloud compliance.

Lines 159 and 166 use single brackets which SonarCloud flags.

Suggested fix
-if [ "${NUM_NODES}" -gt 0 ]; then
+if [[ "${NUM_NODES}" -gt 0 ]]; then
     echo ">>> Initializing ProxySQL Cluster (${NUM_NODES} nodes)..."
     ...
     CORE_NODES=3
-    if [ "${NUM_NODES}" -lt 3 ]; then CORE_NODES="${NUM_NODES}"; fi
+    if [[ "${NUM_NODES}" -lt 3 ]]; then CORE_NODES="${NUM_NODES}"; fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 159 - 166,
Replace POSIX test brackets with Bash conditional syntax to satisfy SonarCloud:
update the conditionals that use NUM_NODES (the checks controlling cluster
initialization and CORE_NODES assignment) from if [ "${NUM_NODES}" -gt 0 ]; then
and if [ "${NUM_NODES}" -lt 3 ]; then to use [[ ... ]] (e.g., if [[
"${NUM_NODES}" -gt 0 ]]; then and if [[ "${NUM_NODES}" -lt 3 ]]; then), leaving
the surrounding logic (MYSQL_CMD, CORE_NODES assignment) unchanged.

141-156: Switch [ to [[ in cluster node wait loop for SonarCloud compliance.

Lines 146 and 155 use single brackets which SonarCloud flags.

Suggested fix
-    while [ $COUNT -lt $MAX_WAIT ]; do
+    while [[ $COUNT -lt $MAX_WAIT ]]; do
         ...
     done
-    if [ $COUNT -ge $MAX_WAIT ]; then echo " TIMEOUT (node ${i})"; exit 1; fi
+    if [[ $COUNT -ge $MAX_WAIT ]]; then echo " TIMEOUT (node ${i})"; exit 1; fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/start-proxysql-isolated.bash` around lines 141 - 156,
Replace the POSIX test brackets with Bash conditional brackets in the cluster
wait loop: change the while and if tests that currently use [ $COUNT -lt
$MAX_WAIT ] and [ $COUNT -ge $MAX_WAIT ] to use [[ ... ]] (e.g., [[ $COUNT -lt
$MAX_WAIT ]] and [[ $COUNT -ge $MAX_WAIT ]]) for SonarCloud compliance and safer
evaluation of variables like COUNT and MAX_WAIT in the loop that references
NUM_NODES, ADMIN_PORT, PROXY_CONTAINER, COUNT and MAX_WAIT.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 803d9120-c8a9-431a-91c8-a489784d68f7

📥 Commits

Reviewing files that changed from the base of the PR and between e299de5 and 4837642.

📒 Files selected for processing (1)
  • test/infra/control/start-proxysql-isolated.bash
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: run / trigger
  • GitHub Check: claude-review
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.
📚 Learning: 2026-01-20T09:34:27.165Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.

Applied to files:

  • test/infra/control/start-proxysql-isolated.bash
📚 Learning: 2026-03-22T14:38:16.082Z
Learnt from: CR
Repo: sysown/proxysql PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-22T14:38:16.082Z
Learning: Feature tiers are controlled by build flags: PROXYSQL31=1 for v3.1.x, PROXYSQLGENAI=1 for v4.0.x; PROXYSQLGENAI=1 implies PROXYSQL31=1

Applied to files:

  • test/infra/control/start-proxysql-isolated.bash
🪛 GitHub Check: SonarCloud Code Analysis
test/infra/control/start-proxysql-isolated.bash

[failure] 139-139: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s72&open=AZ0ac6owh5l158zV8s72&pullRequest=5528


[failure] 146-146: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s73&open=AZ0ac6owh5l158zV8s73&pullRequest=5528


[failure] 155-155: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s74&open=AZ0ac6owh5l158zV8s74&pullRequest=5528


[failure] 166-166: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ag0NSOyJaQ2D5dNc-&open=AZ0ag0NSOyJaQ2D5dNc-&pullRequest=5528


[failure] 159-159: Use '[[' instead of '[' for conditional tests. The '[[' construct is safer and more feature-rich.

See more on https://sonarcloud.io/project/issues?id=sysown_proxysql&issues=AZ0ac6owh5l158zV8s75&open=AZ0ac6owh5l158zV8s75&pullRequest=5528

🔇 Additional comments (5)
test/infra/control/start-proxysql-isolated.bash (5)

1-22: LGTM!

Standard script initialization with proper error handling (set -e, set -o pipefail), path derivation, and required variable validation.


30-42: LGTM!

Coverage directory setup, network creation, and data directory preparation are correctly implemented for CI isolation.


102-120: LGTM!

Docker container configuration with proper volume mounts, environment variables, and network setup is correct.


186-229: LGTM!

Node configuration and scheduler installation logic is correct. The use of CORE_NODES ensures scheduler is only installed on the appropriate subset of nodes, and RESTAPI ports are uniquely assigned.


231-234: LGTM!

Clear completion messages for cluster initialization and startup confirmation.

