test(flush_logs): harden timing race — poll instead of fixed sleep by renecannao · Pull Request #5611 · sysown/proxysql

renecannao · 2026-04-11T13:42:15Z

Closes #5610 (partial — one of four tests).

Summary

Fixes test_flush_logs-t by replacing fixed sleeps with bounded polling. The old code depended on specific wall-clock windows (sleep 1, sleep 4) that occasionally slipped past on slow / contended GH runners, producing false failures on CI while the same code passed cleanly on every local workstation and on commit-after-commit empty retriggers in PR #5596.

Root cause

test_flush_logs-t.sh measures "did this command rotate proxysql.log?" by counting the number of [INFO] ProxySQL version marker lines before and after. The old flow was timing-dependent:

BASELINE=$(fn_get_rotations)       # includes sleep 1
fn_padmin "PROXYSQL FLUSH LOGS;"
RES=$(fn_get_rotations)            # sleep 1, then count
fn_check_res                       # strict equality

For the SIGUSR1 subtest it was worse because the signal was delivered via a "Scheduler Hack" (a 2000ms-interval scheduler row that kill -USR1 $(pidof proxysql)), plus a fixed sleep 4, plus the inner sleep 1 — total 5 s budget from signal scheduling to reading the log. Three async stages could slip past that window: the scheduler's first interval firing slightly late, the SIGUSR1 handler queueing the rotation on a worker thread, and the host-side volume bind-mount read lag. Any one blew the budget → RES != BASELINE+1 → false failure.

A secondary issue: strict -ne equality check would also fail if the scheduler hack fired twice before the cleanup tore its row down (interval 2000ms, sleep 4s → exactly one extra fire window). Two rotations from one signal is still a pass semantically.

Fix

Three changes in test/tap/tests/test_flush_logs-t.sh:

Replace fixed sleeps with bounded polling. New helper fn_wait_for_rotation_at_least <target> <max_seconds> polls fn_get_rotations every 500 ms until the count reaches the target or the timeout expires. fn_get_rotations is now sleepless (callers that need to race wait use the new helper). Max budget raised to 30 s (was effectively 5 s) — on fast machines the poll still terminates within one 500 ms cycle.
Accept "at least BASELINE+1" instead of exact equality. fn_check_res uses -lt instead of -ne. The test asks "did it trigger at least one rotation?" — two rotations from one signal is still a pass.
Split fn_signal into fn_signal (install only) and fn_signal_cleanup (teardown). Caller now races cleanly:
```
fn_signal SIGUSR1
RES=$(fn_wait_for_rotation_at_least $((BASELINE+1)) 30)
fn_signal_cleanup                # tear down AFTER rotation observed
fn_check_res
```
Scheduler row exists only for the bounded window between install and observation. Also dropped the scheduler interval from 2000 ms to 500 ms so the first fire happens quickly.

Local verification (dogfooding the new flake-debug recipe from #5609)

Reproduced the original PR #5596 failure context (legacy-g3 group with just test_flush_logs-t via TEST_PY_TAP_INCL), 10 back-to-back iterations against one infra lifecycle per test/README.md §"Debugging a flaky test":

attempt 1:  PASS
attempt 2:  PASS
…
attempt 10: PASS

=== total: 10/10 pass ===

Per-attempt wall time dropped from ~9 s to ~1 s because we're no longer burning fixed sleeps. Confirmed the new code path is active via the new TAP message format in the captured log:

msg: ok 1 - command 'PROXYSQL FLUSH LOGS;' - BASELINE: 58 - got 59 (>= BASELINE + 1)
msg: ok 2 - command 'kill -s SIGUSR1 $PID' - BASELINE: 59 - got 60 (>= BASELINE + 1)
msg: Test took 1 sec

Scope

Just this one test. The other three flaky tests listed in #5610 (pgsql-servers_ssl_params-t, pgsql-ssl_keylog-t, test_read_only_actions_offline_hard_servers-t) are C++ and each warrants its own investigation + PR — not worth batching into one "fix all flakes" mega-patch.

Test plan

bash -n test/tap/tests/test_flush_logs-t.sh — script syntax valid
Local stress test: 10/10 passes in legacy-g3 + TEST_PY_TAP_INCL=test_flush_logs-t
TAP output shows new message format, confirming polling code is executing
Wall time dropped from 9 s to 1 s (evidence that fixed sleeps are gone, not just more tolerant)
CI green on CI-legacy-g3 against this PR's HEAD
After merge, watch 5-10 subsequent runs of CI-legacy-g3 on v3.0 main — expect the test_flush_logs-t row to disappear from the flake list

Note on the `-t` vs `.sh` file duality

For anyone who finds this PR puzzling: test/tap/tests/test_flush_logs-t is NOT the same file as test_flush_logs-t.sh. The -t file is a copy produced by the sh-% rule in test/tap/tests/Makefile (literally cp test_flush_logs-t.sh test_flush_logs-t && chmod +x). The runner invokes *-t, not *.sh, and the -t files are gitignored. So the only tracked source of truth is test_flush_logs-t.sh, and this PR modifies only that file. The -t copy is regenerated by make tests-sh or the top-level TAP build. I hit this the hard way during local verification — documented here so the next person doesn't.

Summary by CodeRabbit

Tests
- Enhanced test infrastructure for log flushing operations with improved timing reliability and robustness
- Replaced fixed timing assumptions with dynamic polling mechanisms for more consistent test execution
- Added helper utilities for cleaner test setup and teardown procedures

Part of #5610 (flaky TAP tests revealed by cascade fix). Addresses the CI-legacy-g3 'test_flush_logs-t' flake that failed on some PR #5596 commits but passed on others despite identical code. ## Root cause The test measures "does command X cause proxysql.log to gain one new '[INFO] ProxySQL version' marker line". The old flow used fixed sleeps: BASELINE=$(fn_get_rotations) # (includes sleep 1) fn_padmin "PROXYSQL FLUSH LOGS;" RES=$(fn_get_rotations) # (sleep 1, then count) fn_check_res # strict equality check For the SIGUSR1 subtest the flow was worse because the signal had to travel through a "Scheduler Hack" (fn_signal inserts a scheduler row that runs `kill -USR1 $(pidof proxysql)` every 2000 ms, waits 4 s, then deletes the row). Total wall-clock budget from signal scheduling to log read: 5 s including the inner fn_get_rotations sleep. On a slow / contended CI runner any one of three asynchronous stages can slip past that 5 s window: 1. The scheduler's first interval fires slightly late. 2. The SIGUSR1 handler in proxysql queues the log rotation on a worker thread; the actual rotation + re-open + banner write happens after the signal is delivered, not synchronously. 3. The host-side read of proxysql.log via the docker volume bind-mount can lag the in-container write by up to a second on Azure-hosted runners under load. When any of those raced past 5 s, fn_get_rotations returned the pre-rotation count, fn_check_res saw `RES != BASELINE+1`, the test flagged a failure, and CI went red. Same code ran fine on the next retrigger because the race was environmental, not logical. A second, subtler issue: the strict `-ne` equality check in fn_check_res would ALSO fail if the scheduler fired twice before the script got around to deleting its row (interval_ms was 2000, sleep was 4 s - one extra fire was a coin flip on fast runners). One signal is the contract; two rotations from one signal is still the correct observable answer for "did this trigger at least one rotation". ## Fix Two changes, both in the .sh (the -t file is a gitignored copy regenerated by `make sh-test_flush_logs-t.sh`, which itself just `cp`s the .sh — same rule applies to every other `*-t.sh` test): 1. Replace fixed sleeps with bounded polling: - fn_get_rotations no longer has an internal `sleep 1` — it is now a pure "read the current count" function. - New fn_wait_for_rotation_at_least <target> <max_seconds> polls fn_get_rotations every 500 ms until the count reaches the target or the timeout expires. On success it prints the observed count and returns 0; on timeout it prints whatever it last saw and returns 1, so fn_check_res still reports a useful "BASELINE X expected Y got Z" error message. - Max budget raised to 30 s (was 5 s). On a fast machine the poll still terminates within one 500 ms cycle, so total test time on the local dev workstation dropped from ~9 s to ~1 s. 2. Accept "at least BASELINE+1" instead of "exactly BASELINE+1": - fn_check_res uses `-lt` instead of `-ne`. - Test semantics: "did the command trigger at least one rotation". Two rotations from one signal is still a pass (happens only in the scheduler-hack corner, when the scheduler fires twice before fn_signal_cleanup tears it down; harmless to proxysql behavior). - Also drops the scheduler interval from 2000 ms to 500 ms so the first fire happens quickly under normal conditions. 3. Split fn_signal into fn_signal (install only) and fn_signal_cleanup (tear down the scheduler row). The caller now: fn_signal SIGUSR1 RES=$(fn_wait_for_rotation_at_least $((BASELINE+1)) 30) fn_signal_cleanup fn_check_res ... which is race-free: the cleanup runs AFTER the rotation has been observed, so the scheduler row exists only for the bounded window between install and observation. ## Local verification Reproduced the original PR #5596 failure context (legacy-g3 group with only test_flush_logs-t enabled via TEST_PY_TAP_INCL), ran 10 back-to-back iterations against a single infra lifecycle following the recipe in test/README.md §"Debugging a flaky test": for i in 1..10: ./test/infra/control/run-tests-isolated.bash Results: - 10/10 pass - Per-attempt wall time dropped from ~9 s to ~1 s - Confirmed new code path active via the new "got N (>= BASELINE+1)" message format in the captured TAP output ## What's still TODO in #5610 The other three flaky tests from the issue are C++ and each needs its own targeted investigation. They are NOT touched here: - pgsql-servers_ssl_params-t - pgsql-ssl_keylog-t - test_read_only_actions_offline_hard_servers-t Deliberately keeping this PR small - one test per PR so each fix can be reviewed and validated independently.

coderabbitai · 2026-04-11T13:42:35Z

📝 Walkthrough

Walkthrough

This change refactors a test script to replace fixed timing delays with dynamic polling mechanisms. The scheduler interval for signal handling is reduced from 2000ms to 500ms, and new helper functions enable waiting for rotation count changes and cleaning up test state. Assertion criteria are relaxed from exact equality to allowing any count above baseline.

Changes

Cohort / File(s)	Summary
Test Robustness Improvements `test/tap/tests/test_flush_logs-t.sh`	Replaced fixed `sleep` statements with polling-based `fn_wait_for_rotation_at_least` function; reduced scheduler interval from 2000ms to 500ms; added `fn_signal_cleanup` helper for state cleanup; modified `fn_check_res` to accept rotation counts >= baseline instead of exact equality; adjusted test sections to use 30-second polling timeouts.

Possibly related issues

Flaky TAP tests: track and fix timing-sensitive tests revealed by cascade fix #5610 — This PR directly implements the hardening of test_flush_logs-t by replacing fixed sleeps with polling loops and adding helper functions to avoid timing races in rotation detection.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 No more sleepy waits that race the clock,
With polling now, our tests stay on the block!
Five hundred mills, a quicker beat,
Rotation checks made safe and fleet! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main change: replacing fixed sleep timing with polling in the test_flush_logs test.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch v3.0-fix-flake-test-flush-logs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sonarqubecloud · 2026-04-11T13:43:19Z

Quality Gate passed

Issues
5 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

gemini-code-assist

Code Review

This pull request improves the reliability of the test_flush_logs-t.sh test by replacing fixed sleeps with a polling mechanism and relaxing strict equality checks to prevent flakiness on slow CI runners. It also introduces a dedicated cleanup function for the scheduler hack. Review feedback suggests optimizing database interactions by combining INSERT/DELETE and LOAD commands into single calls and ensuring robust numeric parsing by stripping whitespace from command output.

gemini-code-assist · 2026-04-11T13:44:08Z

+	fn_padmin "INSERT OR REPLACE INTO scheduler (id, active, interval_ms, filename, arg1, arg2) VALUES (9999, 1, 500, '/bin/sh', '-c', 'kill -${sig#SIG} \$(pidof proxysql)');"
 	fn_padmin "LOAD SCHEDULER TO RUNTIME;"


Combining the INSERT and LOAD statements into a single fn_padmin call is more efficient as it avoids the overhead of establishing multiple database connections and invoking the mysql client process twice.

Suggested change

fn_padmin "INSERT OR REPLACE INTO scheduler (id, active, interval_ms, filename, arg1, arg2) VALUES (9999, 1, 500, '/bin/sh', '-c', 'kill -${sig#SIG} \$(pidof proxysql)');"

fn_padmin "LOAD SCHEDULER TO RUNTIME;"

fn_padmin "INSERT OR REPLACE INTO scheduler (id, active, interval_ms, filename, arg1, arg2) VALUES (9999, 1, 500, '/bin/sh', '-c', 'kill -${sig#SIG} \$(pidof proxysql)'); LOAD SCHEDULER TO RUNTIME;"

gemini-code-assist · 2026-04-11T13:44:08Z

+	fn_padmin "DELETE FROM scheduler WHERE id=9999;" >/dev/null 2>&1 || true
+	fn_padmin "LOAD SCHEDULER TO RUNTIME;" >/dev/null 2>&1 || true


Similar to the fn_signal function, combining the DELETE and LOAD statements here reduces process and connection overhead during the test cleanup phase.

Suggested change

fn_padmin "DELETE FROM scheduler WHERE id=9999;" >/dev/null 2>&1 || true

fn_padmin "LOAD SCHEDULER TO RUNTIME;" >/dev/null 2>&1 || true

fn_padmin "DELETE FROM scheduler WHERE id=9999; LOAD SCHEDULER TO RUNTIME;" >/dev/null 2>&1 || true

gemini-code-assist · 2026-04-11T13:44:08Z

+	local count=0
+	local max_iterations=$(( max_seconds * 2 ))
+	while [ $i -lt $max_iterations ]; do
+		count=$(fn_get_rotations)


The output of wc -l (used inside fn_get_rotations) often contains leading or trailing whitespace depending on the platform's implementation. Stripping this whitespace ensures that the count variable contains a clean integer, which makes the subsequent numeric comparisons and TAP output more robust.

Suggested change

count=$(fn_get_rotations)

count=$(fn_get_rotations | tr -d '[:space:]')

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/tap/tests/test_flush_logs-t.sh`:
- Around line 92-98: The test installs a scheduler row id=9999 that can remain
active if the script exits or is interrupted; add a best-effort cleanup from
fn_exit so that the scheduler row and runtime are removed even on unexpected
exits. Modify or ensure the EXIT/INT/TERM trap calls fn_signal_cleanup (or add a
call to fn_signal_cleanup inside fn_exit) and make fn_signal_cleanup resilient
(silently ignore failures) when calling fn_padmin "DELETE FROM scheduler WHERE
id=9999;" and "LOAD SCHEDULER TO RUNTIME;" so the 500ms job won't continue
sending SIGUSR1 to ProxySQL after the test exits.
- Around line 135-149: The polling loop computes max_iterations as max_seconds *
2 which causes the last check to occur at max_seconds - 0.5, missing events that
happen right before the advertised timeout; update the max_iterations
calculation to allow one extra half-second iteration (e.g., add 1) so the final
fn_get_rotations check runs at max_seconds, ensuring the loop (variables
max_iterations, max_seconds, i and the fn_get_rotations call) will observe
rotations that land just before the timeout.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 267ebfd1-e1fd-48d4-b86b-395f410f585e

📥 Commits

Reviewing files that changed from the base of the PR and between 04466b3 and 5444125.

📒 Files selected for processing (1)

test/tap/tests/test_flush_logs-t.sh

📜 Review details

🧰 Additional context used

🧠 Learnings (4)

📓 Common learnings

Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.

📚 Learning: 2026-04-01T21:27:03.216Z

Learnt from: wazir-ahmed
Repo: sysown/proxysql PR: 5557
File: test/tap/tests/unit/gtid_set_unit-t.cpp:14-17
Timestamp: 2026-04-01T21:27:03.216Z
Learning: In ProxySQL's unit test directory (test/tap/tests/unit/), test_globals.h and test_init.h are only required for tests that depend on the ProxySQL runtime globals/initialization (i.e., tests that exercise components linked against libproxysql.a). Pure data-structure or utility tests (e.g., ezoption_parser_unit-t.cpp, gtid_set_unit-t.cpp, gtid_trxid_interval_unit-t.cpp) only need tap.h and the relevant project header — omitting test_globals.h and test_init.h is correct and intentional in these cases.

Applied to files:

test/tap/tests/test_flush_logs-t.sh

📚 Learning: 2026-01-20T09:34:27.165Z

Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.

Applied to files:

test/tap/tests/test_flush_logs-t.sh

📚 Learning: 2026-01-20T07:40:34.938Z

Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).

Applied to files:

test/tap/tests/test_flush_logs-t.sh

🪛 Shellcheck (0.11.0)

test/tap/tests/test_flush_logs-t.sh

[info] 139-139: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 148-148: Double quote to prevent globbing and word splitting.

(SC2086)

[style] 153-153: $/${} is unnecessary on arithmetic variables.

(SC2004)

[info] 154-154: Double quote to prevent globbing and word splitting.

(SC2086)

[style] 160-160: $/${} is unnecessary on arithmetic variables.

(SC2004)

[style] 161-161: $/${} is unnecessary on arithmetic variables.

(SC2004)

[style] 162-162: $/${} is unnecessary on arithmetic variables.

(SC2004)

[style] 176-176: $/${} is unnecessary on arithmetic variables.

(SC2004)

[style] 182-182: $/${} is unnecessary on arithmetic variables.

(SC2004)

🔇 Additional comments (2)

test/tap/tests/test_flush_logs-t.sh (2)

77-90: Good call moving teardown out of fn_signal.

Keeping the scheduler row alive until the effect is observed removes the original “delete before first fire” race.

155-165: >= BASELINE + 1 matches the new test semantics.

Allowing extra rotations here is the right trade-off once the scheduler path can legitimately fire more than once before cleanup.

coderabbitai · 2026-04-11T13:46:37Z

+fn_signal_cleanup () {
+	# Tear down any scheduler row installed by fn_signal. Safe to call
+	# unconditionally - if fn_signal used the docker path instead of the
+	# scheduler hack, this is a no-op on a non-existent row.
+	fn_padmin "DELETE FROM scheduler WHERE id=9999;" >/dev/null 2>&1 || true
+	fn_padmin "LOAD SCHEDULER TO RUNTIME;" >/dev/null 2>&1 || true
 }


⚠️ Potential issue | 🟠 Major

Add a best-effort cleanup from fn_exit.

If the script exits or is interrupted after installing scheduler row 9999 but before the explicit cleanup call, that 500 ms job keeps sending SIGUSR1 to the live ProxySQL instance. This can bleed into later TAP tests and make failures non-local.

Suggested guardrail

fn_exit () { trap - EXIT trap - SIGINT + fn_signal_cleanup >/dev/null 2>&1 || true if [[ $DONE -eq $PLAN ]] && [[ $FAIL -eq 0 ]]; then echo "msg: Test took $SECONDS sec" exit 0

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/tap/tests/test_flush_logs-t.sh` around lines 92 - 98, The test installs a scheduler row id=9999 that can remain active if the script exits or is interrupted; add a best-effort cleanup from fn_exit so that the scheduler row and runtime are removed even on unexpected exits. Modify or ensure the EXIT/INT/TERM trap calls fn_signal_cleanup (or add a call to fn_signal_cleanup inside fn_exit) and make fn_signal_cleanup resilient (silently ignore failures) when calling fn_padmin "DELETE FROM scheduler WHERE id=9999;" and "LOAD SCHEDULER TO RUNTIME;" so the 500ms job won't continue sending SIGUSR1 to ProxySQL after the test exits.

coderabbitai · 2026-04-11T13:46:37Z

+	local max_iterations=$(( max_seconds * 2 ))
+	while [ $i -lt $max_iterations ]; do
+		count=$(fn_get_rotations)
+		if [ "$count" -ge "$target" ]; then
+			echo $count
+			return 0
+		fi
+		sleep 0.5
+		i=$(( i + 1 ))
+	done
+	# Timeout. Emit whatever we last observed so fn_check_res will
+	# report a useful "BASELINE X expected Y got Z" message rather than
+	# an empty RES.
+	echo $count
+	return 1


⚠️ Potential issue | 🟡 Minor

The polling loop misses the last eligible sample.

With max_iterations=$(( max_seconds * 2 )), the final check happens at max_seconds - 0.5, not at max_seconds. A rotation that lands just before the advertised timeout still fails, so the flake window is smaller but not fully closed.

Minimal fix

- local max_iterations=$(( max_seconds * 2 )) + local max_iterations=$(( max_seconds * 2 + 1 )) while [ $i -lt $max_iterations ]; do

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

local max_iterations=$(( max_seconds * 2 ))

while [ $i -lt $max_iterations ]; do

count=$(fn_get_rotations)

if [ "$count" -ge "$target" ]; then

echo $count

return 0

fi

sleep 0.5

i=$(( i + 1 ))

done

# Timeout. Emit whatever we last observed so fn_check_res will

# report a useful "BASELINE X expected Y got Z" message rather than

# an empty RES.

echo $count

return 1

local max_iterations=$(( max_seconds * 2 + 1 ))

while [ $i -lt $max_iterations ]; do

count=$(fn_get_rotations)

if [ "$count" -ge "$target" ]; then

echo $count

return 0

fi

sleep 0.5

i=$(( i + 1 ))

done

# Timeout. Emit whatever we last observed so fn_check_res will

# report a useful "BASELINE X expected Y got Z" message rather than

# an empty RES.

echo $count

return 1

🧰 Tools

🪛 Shellcheck (0.11.0)

[info] 139-139: Double quote to prevent globbing and word splitting.

(SC2086)

[info] 148-148: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/tap/tests/test_flush_logs-t.sh` around lines 135 - 149, The polling loop computes max_iterations as max_seconds * 2 which causes the last check to occur at max_seconds - 0.5, missing events that happen right before the advertised timeout; update the max_iterations calculation to allow one extra half-second iteration (e.g., add 1) so the final fn_get_rotations check runs at max_seconds, ensuring the loop (variables max_iterations, max_seconds, i and the fn_get_rotations call) will observe rotations that land just before the timeout.

gemini-code-assist bot reviewed Apr 11, 2026

View reviewed changes

coderabbitai bot reviewed Apr 11, 2026

View reviewed changes

This was referenced Apr 11, 2026

test(read_only_offline_hard): set default_hostgroup to WHG, not hardcoded 0 #5613

Merged

fix(pgsql-monitor): clamp next_*_at on LOAD PGSQL VARIABLES TO RUNTIME #5614

Merged

renecannao merged commit caa2129 into v3.0 Apr 11, 2026
13 of 25 checks passed

coderabbitai bot mentioned this pull request Apr 12, 2026

fix(tests): resolve TAP test failures in multi-group runs #5619

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(flush_logs): harden timing race — poll instead of fixed sleep#5611

test(flush_logs): harden timing race — poll instead of fixed sleep#5611
renecannao merged 1 commit intov3.0from
v3.0-fix-flake-test-flush-logs

renecannao commented Apr 11, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 11, 2026 •

edited

Loading

Walkthrough

Changes

Possibly related issues

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

sonarqubecloud bot commented Apr 11, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Apr 11, 2026

Uh oh!

gemini-code-assist bot Apr 11, 2026

Uh oh!

gemini-code-assist bot Apr 11, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 11, 2026

Uh oh!

coderabbitai bot Apr 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		fn_padmin "INSERT OR REPLACE INTO scheduler (id, active, interval_ms, filename, arg1, arg2) VALUES (9999, 1, 500, '/bin/sh', '-c', 'kill -${sig#SIG} \$(pidof proxysql)');"
		fn_padmin "LOAD SCHEDULER TO RUNTIME;"

		fn_padmin "DELETE FROM scheduler WHERE id=9999;" >/dev/null 2>&1 \|\| true
		fn_padmin "LOAD SCHEDULER TO RUNTIME;" >/dev/null 2>&1 \|\| true

	count=$(fn_get_rotations)
	count=$(fn_get_rotations \| tr -d '[:space:]')

Conversation

renecannao commented Apr 11, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Local verification (dogfooding the new flake-debug recipe from #5609)

Scope

Test plan

Note on the -t vs .sh file duality

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related issues

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

sonarqubecloud bot commented Apr 11, 2026

Quality Gate passed

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

renecannao commented Apr 11, 2026 •

edited by coderabbitai bot

Loading

Note on the `-t` vs `.sh` file duality

coderabbitai bot commented Apr 11, 2026 •

edited

Loading