
fix(pgsql-monitor): clamp next_*_at on LOAD PGSQL VARIABLES TO RUNTIME#5614

Merged
renecannao merged 2 commits into v3.0 from v3.0-pgsql-monitor-reschedule-on-interval-change
Apr 12, 2026

Conversation

@renecannao renecannao commented Apr 11, 2026

Closes the pgsql-servers_ssl_params-t line item on #5610. This is the fourth and final of the four tests from that issue.

This PR is code + test together because the scheduler fix alone wouldn't make the test pass, and the test fix alone is useless without the scheduler fix. Reviewable in two distinct blocks below.


1 / 2 — lib/PgSQL_Monitor.cpp: scheduler clamp on interval change

Problem

pgsql-servers_ssl_params-t subtests 32 and 34 expect PgSQL_Monitor_ssl_connections_OK to increase a few seconds after:

SET pgsql-monitor_connect_interval=2000;
LOAD PGSQL VARIABLES TO RUNTIME;
UPDATE pgsql_servers SET use_ssl=1;
LOAD PGSQL SERVERS TO RUNTIME;

but the counter stays at 0 for ~2 minutes. Root cause is state caching in PgSQL_monitor_scheduler_thread():

T=0      proxysql starts. Default connect_interval=120000. The scheduler's
         first tick schedules the initial connect check; compute_next_intvs()
         sets next_connect_at = T + 120000 ms.
T+~30s   test does SET connect_interval=2000; LOAD PGSQL VARIABLES TO RUNTIME;.
         fetch_updated_conf() returns interval=2000 on the next iteration,
         BUT next_connect_at still points at T+120s.
T+~35s   test reads ssl_connections_OK: still 0, because the next scheduled
         connect check is ~85 seconds away.

compute_next_intvs() only advances next_<type>_at when the type has just fired. It never recomputes in response to a variable change, so shortening an interval at runtime has zero effect until the already-scheduled (long) cycle naturally elapses.

Fix

In the scheduler loop, after fetch_updated_conf() when runtime variables were just refreshed (tracked by comparing PgSQL_Thread__variables_version against the global version), clamp each next_<type>_at down to cur_intv_start + new_interval when that would schedule the next check sooner than the currently-cached value:

if (vars_refreshed) {
    if (tasks_conf.ping.params.interval > 0) {
        const uint64_t clamped = cur_intv_start + tasks_conf.ping.params.interval;
        if (next_intvs.next_ping_at > clamped) {
            next_intvs.next_ping_at = clamped;
        }
    }
    // ... same for connect, readonly, repl_lag
}

Properties:

  • One-way: the clamp only shrinks next_<type>_at, never pushes it out. Growing the interval at runtime does not delay an already-imminent check.
  • Idempotent: if the interval didn't change, the clamp is a no-op.
  • Interval=0 safe: disabled intervals (interval == 0) are skipped and continue to be handled by compute_next_intvs() which sets them to ULONG_MAX.
  • Applied to all 4 task types (ping, connect, readonly, repl_lag) — the same class of bug affected all of them; fixing only connect would leave latent issues for tests that change ping_interval, read_only_interval, or replication_lag_interval at runtime.

Raw verification (before the test fix)

Direct experiment against the patched binary, no test file involved:

# fresh proxysql, default interval=120000
SET pgsql-monitor_connect_interval=2000;
LOAD PGSQL VARIABLES TO RUNTIME;
UPDATE pgsql_servers SET use_ssl=1;
LOAD PGSQL SERVERS TO RUNTIME;

Before the fix: ssl_connections_OK stuck at 0 for ~2 minutes.
After the fix: ssl_connections_OK goes 0 → 2 → 6 over 2 seconds, matching the 2000 ms interval × 2 monitor threads.


2 / 2 — test/tap/tests/pgsql-servers_ssl_params-t.cpp: drop hardcoded postgres/postgres credentials

Problem

With only the scheduler fix applied, the test was still failing because its main() did:

exec_ok(a, "SET pgsql-monitor_username='postgres'");
exec_ok(a, "SET pgsql-monitor_password='postgres'");

on the assumption that the backend had a postgres user with password "postgres". But the actual CI infra (test/infra/docker-pgsql16-single) randomizes POSTGRES_PASSWORD on every container startup:

$ docker exec $BACKEND env | grep PASS
POSTGRES_PASSWORD=05e792e51d

So the hardcoded "postgres" password never authenticated, and every monitor connect failed with:

PgSQL_Monitor.cpp:803: [ERROR] Monitor connect failed
  addr='pgsql1.docker-pgsql16-single:5432'
  error='FATAL: password authentication failed for user "postgres"'

These auth failures incremented connect_check_ERR instead of ssl_connections_OK. After the scheduler fix, the auth failures fire every 2 seconds (proof the scheduler fix works!) but they're still failures, so the counter the test is checking never advances.

Fix

Remove the monitor username/password switch entirely. The default monitor/monitor user is already configured in the infra's pg_hba.conf and authenticates successfully (verified manually via docker exec psql 'host=... user=monitor password=monitor sslmode=require' from both inside and outside the proxysql container). Also remove the paired "restore original values" block since there's nothing to restore.


Local verification (both fixes together)

3 consecutive iterations of the full test in fresh legacy-g4 infra using the test/README.md §"Debugging a flaky test" recipe:

attempt 1: PASS
attempt 2: PASS
attempt 3: PASS
=== pgsql-servers_ssl_params-t: 3/3 pass ===

Subtest-level confirmation from the final attempt's TAP log:

# Original monitor: user=monitor interval=120000 ms
# Initial PgSQL_Monitor_ssl_connections_OK: 33
# After PgSQL_Monitor_ssl_connections_OK: 36           ← +3 in 5 sec
ok 32 - Monitor SSL counter increased with use_ssl=1 and no per-server row

# With TLSv1 per-server pin, ssl OK before wait: 39
# With TLSv1 per-server pin, ssl OK after wait:  39 (delta=0)
ok 33 - Monitor per-server: SSL OK counter does NOT advance when per-server
        row pins ssl_protocol_version_range to TLSv1

# After cleanup, ssl OK recovered from 41 to 44        ← +3 in 5 sec
ok 34 - Monitor per-server: SSL OK counter resumes advancing after removing
        the per-server row

All three monitor-SSL subtests now exercise the real code path (SSL handshake, counter incrementing, per-server pin blocking SSL as designed). Pre-fix the counters were observing a no-op and the test was deterministically failing.

Side effect on PR #5612

Subtest 7 of pgsql-ssl_keylog-t was marked as SKIP in #5612 because it tripped on the same "pgsql monitor isn't making SSL connections" symptom. With this PR merged, the skip's runtime condition (lines_before_monitor == lines_after_monitor) evaluates to false once the monitor is actually producing SSL handshakes, and the test falls into the real ok(...) branch automatically. No separate change to pgsql-ssl_keylog-t is needed — the skip was defensive and becomes dead code after this fix.

Scope

  • Scoped to the pgsql monitor only (lib/PgSQL_Monitor.cpp). The mysql monitor (lib/MySQL_Monitor.cpp) has a different scheduling architecture (per-thread timers, not a centralized scheduler) and may or may not have the same class of bug — out of scope for this PR. Anyone investigating similar symptoms on the mysql side should use this fix as a template but verify the scheduling model before porting.
  • Does not touch any other test. The credential fix is localized to pgsql-servers_ssl_params-t.cpp.

Test plan

  • Raw experiment proving scheduler fix works independently of the test: counter advances at 2000 ms cadence after SET + LOAD
  • Full test run 3/3 passing in fresh infra with both fixes applied
  • TAP output confirms subtests 32, 33, 34 all exercise the real SSL handshake path (not fallback to plaintext or no-op)
  • Sanity: subtest 33 (TLSv1 pin blocks SSL) still correctly shows delta=0 — the fix does not accidentally make failing cases "succeed"
  • CI green on CI-legacy-g4 against this PR's HEAD
  • Side-check: CI-legacy-g4 run should ALSO show pgsql-ssl_keylog-t subtest 7 passing (no longer hitting its skip fallback), confirming the side-effect on PR test(pgsql-ssl_keylog): fix container-local path + NSS label regex #5612 is real


Summary by CodeRabbit

  • Bug Fixes
    • Monitoring now applies reduced PostgreSQL check intervals immediately during the current cycle when runtime scheduling is refreshed, improving responsiveness to configuration changes. Debug logging for these immediate adjustments has been added.

Closes the pgsql-servers_ssl_params-t portion of #5610.
This is the final of four tests from the flake tracking issue.

## Problem 1 — the scheduler caching bug (lib/PgSQL_Monitor.cpp)

`pgsql-servers_ssl_params-t` subtests 32 ("Monitor SSL counter
increased with use_ssl=1 and no per-server row") and 34 ("Monitor
per-server: SSL OK counter resumes advancing after removing the
per-server row") were failing deterministically on v3.0. Both expected
`PgSQL_Monitor_ssl_connections_OK` to increase within a few seconds of:

  SET pgsql-monitor_connect_interval=2000;
  LOAD PGSQL VARIABLES TO RUNTIME;
  UPDATE pgsql_servers SET use_ssl=1;
  LOAD PGSQL SERVERS TO RUNTIME;

but the counter stayed at 0. Root cause is state caching in the pgsql
monitor scheduler loop (`PgSQL_monitor_scheduler_thread()`):

  T=0       proxysql starts with default pgsql-monitor_connect_interval
            =120000. The first scheduler tick schedules an initial
            connect check; compute_next_intvs() sets next_connect_at
            = T + 120000 ms = T + 120 seconds.
  T+<30s>   test does SET connect_interval=2000 + LOAD PGSQL VARIABLES
            TO RUNTIME.  fetch_updated_conf() starts returning the new
            2000 value, but next_connect_at still points at T+120000
            because compute_next_intvs() only updates next_<type>_at
            when the corresponding task type has fired.
  T+<35s>   test reads ssl_connections_OK at the end of its 5-second
            wait.  Counter still 0 because the next scheduled connect
            check is ~85 seconds in the future.

The scheduler is working correctly; what's missing is a bridge between
"runtime variables were just refreshed" and "next_<type>_at should
reflect the refreshed (shorter) intervals".

Fix: in the scheduler loop, track whether the variable version bumped
this iteration, and if so, clamp each next_<type>_at down to
cur_intv_start + new_interval whenever the refreshed interval would
schedule the next check sooner than the currently-cached value. The
clamp is one-way (we never push next_<type>_at FURTHER into the
future, because growing the interval should not delay an already-
imminent check), idempotent, and safe against interval=0 (disabled)
which is left to compute_next_intvs() to set to ULONG_MAX.

Applied to all four monitor task types: ping, connect, readonly,
repl_lag. Same class of bug affected all of them; fixing only connect
would leave analogous latent issues for any test that changes the
other intervals at runtime.

Verified with the raw experiment before committing: configure
monitor/monitor + interval=2000 + use_ssl=1 + LOAD, observe counter
ticking at the 2-second cadence within 1-2 seconds of LOAD. Before
fix: counter stuck at 0 for ~2 minutes (until the cached 120 s
interval naturally elapses).

## Problem 2 — test's hardcoded wrong credentials (pgsql-servers_ssl_params-t.cpp)

With the scheduler fix alone, the test was still failing because its
main() was doing:

  SET pgsql-monitor_username='postgres';
  SET pgsql-monitor_password='postgres';

on the assumption that the backend had a postgres user with password
"postgres". But the actual test infra (test/infra/docker-pgsql16-single)
RANDOMIZES `POSTGRES_PASSWORD` on every container startup — e.g.
`POSTGRES_PASSWORD=05e792e51d`. Hardcoded 'postgres' never matched,
so every monitor connect failed with:

  FATAL: password authentication failed for user "postgres"

which increments connect_check_ERR instead of ssl_connections_OK.
After the scheduler fix, these auth failures fired every 2 seconds
instead of every 120 seconds — but they were still failures, so the
counter never advanced.

Fix: remove the username/password switch entirely. The default
monitor/monitor user is already configured in the infra's pg_hba.conf
and authenticates successfully (I verified this manually via
`docker exec psql 'host=... user=monitor password=monitor sslmode=require'`
from both inside and outside the proxysql container). The
corresponding "restore original values" block is also removed since
there's nothing to restore.

## Local verification

3 consecutive runs of the full pgsql-servers_ssl_params-t test in
legacy-g4 infra with the patched proxysql + patched test binary:

  attempt 1: PASS
  attempt 2: PASS
  attempt 3: PASS
  === pgsql-servers_ssl_params-t: 3/3 pass ===

Subtest-level confirmation from the final attempt's TAP log:

  # Original monitor: user=monitor interval=120000 ms
  # Initial PgSQL_Monitor_ssl_connections_OK: 33
  # After PgSQL_Monitor_ssl_connections_OK: 36      <- +3 in 5 s
  ok 32 - Monitor SSL counter increased with use_ssl=1 and no per-server row
  # With TLSv1 per-server pin, ssl OK before wait: 39
  # With TLSv1 per-server pin, ssl OK after wait:  39 (delta=0)
  ok 33 - Monitor per-server: SSL OK counter does NOT advance when
          per-server row pins ssl_protocol_version_range to TLSv1
  # After cleanup, ssl OK recovered from 41 to 44    <- +3 in 5 s
  ok 34 - Monitor per-server: SSL OK counter resumes advancing after
          removing the per-server row

All three monitor-SSL subtests now exercise the real code path (SSL
handshake happening, counter incrementing, per-server pin blocking
SSL as designed) instead of observing a no-op.

## Side effect on pgsql-ssl_keylog-t subtest 7

Subtest 7 of pgsql-ssl_keylog-t was marked as SKIP in PR #5612 because
it tripped on the same "pgsql monitor isn't making SSL connections"
symptom. With this fix merged, the skip's runtime condition
(`lines_before_monitor == lines_after_monitor`) will evaluate to false
once the monitor is actually producing SSL handshakes, and the test
will fall into the `ok(...)` branch automatically. No separate change
to that test is needed — the skip was defensive and is dead code
after this fix.

## Why mysql monitor is not touched here

This fix is scoped to the pgsql monitor scheduler only. The mysql
monitor is a different file (lib/MySQL_Monitor.cpp) with a different
scheduling architecture (per-thread timers, not a centralized
scheduler). If the same class of bug exists there, it would need a
separate patch, out of scope for this PR.

coderabbitai bot commented Apr 11, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 30be016 and 4bd2f95.

📒 Files selected for processing (2)
  • lib/PgSQL_Monitor.cpp
  • test/tap/tests/pgsql-servers_ssl_params-t.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/tap/tests/pgsql-servers_ssl_params-t.cpp
  • lib/PgSQL_Monitor.cpp

📝 Walkthrough

Scheduler now clamps next_*_at timestamps when runtime PgSQL thread variables are refreshed so reduced intervals take effect immediately; test setup was simplified to stop changing monitor credentials and only restore the connect interval after tests.

Changes

Scheduler Interval Clamping (lib/PgSQL_Monitor.cpp): Added vars_refreshed tracking and clamping logic in PgSQL_monitor_scheduler_thread(). After fetch_updated_conf(...), for refreshed runtime thread vars the code bounds next_ping_at, next_connect_at, next_readonly_at, and next_repl_lag_at to cur_intv_start + new_interval when the new interval > 0 and the existing next_*_at is later; debug logging added.

Test Monitor Configuration (test/tap/tests/pgsql-servers_ssl_params-t.cpp): Removed logic that read/set pgsql-monitor_password and reset username/password after tests (CI randomizes POSTGRES_PASSWORD). Now only reads pgsql-monitor_username for logging and restores pgsql-monitor_connect_interval post-test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly identifies the main fix (clamping next_*_at values on runtime variable reload), which is the primary code change addressing a pgsql monitor scheduling bug.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces logic to ensure that reductions in PostgreSQL monitor intervals take effect immediately by clamping the next scheduled execution times. It also updates the pgsql-servers_ssl_params-t.cpp test to use default credentials, avoiding failures in CI environments where passwords are randomized. Feedback indicates that the refresh check should be moved earlier in the scheduler loop to ensure prompt detection of configuration changes. Furthermore, the clamping logic should explicitly handle disabled tasks (interval set to 0) to prevent them from continuing to run under old schedules or entering high-frequency execution loops due to existing bugs in the interval calculation logic.

Comment thread lib/PgSQL_Monitor.cpp
Comment on lines 2537 to 2541
if (PgSQL_Thread__variables_version < glover) {
    PgSQL_Thread__variables_version = glover;
    pgsql_thread->refresh_variables();
    vars_refreshed = true;
}

high

The variable refresh logic is currently placed inside the if (cur_intv_start >= closest_intv) block. This means that if all monitoring tasks have long intervals (e.g., several minutes), a change to a shorter interval via LOAD PGSQL VARIABLES TO RUNTIME will not be detected until the shortest of the old intervals expires.

To ensure that interval changes are picked up promptly (within the scheduler's 500ms sleep cycle), this refresh check and the subsequent clamping logic should be moved outside and before the closest_intv calculation (around line 2511).

Comment thread lib/PgSQL_Monitor.cpp
Comment on lines +2574 to +2599
if (vars_refreshed) {
    if (tasks_conf.ping.params.interval > 0) {
        const uint64_t clamped = cur_intv_start + tasks_conf.ping.params.interval;
        if (next_intvs.next_ping_at > clamped) {
            next_intvs.next_ping_at = clamped;
        }
    }
    if (tasks_conf.connect.params.interval > 0) {
        const uint64_t clamped = cur_intv_start + tasks_conf.connect.params.interval;
        if (next_intvs.next_connect_at > clamped) {
            next_intvs.next_connect_at = clamped;
        }
    }
    if (tasks_conf.readonly.params.interval > 0) {
        const uint64_t clamped = cur_intv_start + tasks_conf.readonly.params.interval;
        if (next_intvs.next_readonly_at > clamped) {
            next_intvs.next_readonly_at = clamped;
        }
    }
    if (tasks_conf.repl_lag.params.interval > 0) {
        const uint64_t clamped = cur_intv_start + tasks_conf.repl_lag.params.interval;
        if (next_intvs.next_repl_lag_at > clamped) {
            next_intvs.next_repl_lag_at = clamped;
        }
    }
}

high

The clamping logic currently only handles cases where the new interval is greater than zero. If an interval is set to 0 (disabled) at runtime, the task will continue to fire at its old interval until it expires. Furthermore, due to a bug in compute_next_intvs (where the else block at line 2449 is unreachable), it may even continue to fire every loop iteration once the old interval expires.

It is recommended to explicitly handle the interval == 0 case here by setting the next execution time to ULONG_MAX to stop the task immediately. This also makes the claim in the comment at lines 2571-2573 accurate.

References
  1. Ensure that runtime configuration changes, especially those disabling features, take effect immediately and do not leave the system in an inconsistent or high-load state.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
test/tap/tests/pgsql-servers_ssl_params-t.cpp (1)

663-668: Consider removing unused original_monitor_username.

The variable original_monitor_username is read and logged via diag() but never used for restoration since credentials are no longer modified. While it provides useful diagnostic context, it could be removed to simplify the code if desired.

This is a minor observation and the current code is perfectly acceptable for diagnostic purposes.

lib/PgSQL_Monitor.cpp (1)

2574-2599: Consider extracting the repetitive clamping pattern.

The clamping logic is correct and accomplishes the one-way min() semantics as described. The implementation is clear but repetitive across all four task types.

A helper lambda or small function could reduce duplication:

auto clamp_next = [&](uint64_t& next_at, int32_t interval) {
    if (interval > 0) {
        const uint64_t clamped = cur_intv_start + interval;
        if (next_at > clamped) {
            next_at = clamped;
        }
    }
};
clamp_next(next_intvs.next_ping_at, tasks_conf.ping.params.interval);
clamp_next(next_intvs.next_connect_at, tasks_conf.connect.params.interval);
clamp_next(next_intvs.next_readonly_at, tasks_conf.readonly.params.interval);
clamp_next(next_intvs.next_repl_lag_at, tasks_conf.repl_lag.params.interval);

That said, the current explicit form is easy to understand and maintain, so this is purely a stylistic suggestion.


ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 04466b3 and 30be016.

📒 Files selected for processing (2)
  • lib/PgSQL_Monitor.cpp
  • test/tap/tests/pgsql-servers_ssl_params-t.cpp
🔇 Additional comments (4)
lib/PgSQL_Monitor.cpp (2)

2536-2540: LGTM! Clean flag tracking for variable refresh.

The vars_refreshed flag correctly captures whether runtime variables were reloaded, enabling the subsequent clamping logic to fire only when needed.


2546-2573: Excellent documentation of the subtle scheduling behavior.

The comment thoroughly explains the root cause (cached next_<type>_at values not being recomputed until the old cycle fires) and the fix rationale. The worked example with timestamps makes the problem concrete and easy to understand for future maintainers.

test/tap/tests/pgsql-servers_ssl_params-t.cpp (2)

651-662: Good documentation of the CI credential issue.

The comment clearly explains why the test no longer manipulates monitor credentials - the CI infrastructure randomizes POSTGRES_PASSWORD, making hardcoded credentials unreliable. This prevents future maintainers from reintroducing the same bug.


677-680: LGTM! Restoration scope correctly narrowed.

The restoration logic now only restores pgsql-monitor_connect_interval since that's the only variable the test modifies. This aligns with the decision not to touch credentials.

- Add proxy_debug(PROXY_DEBUG_MONITOR, 5) calls when clamping fires,
  so the fix is observable in debug logs. Follows existing pattern
  from nearby scheduling debug output.
- Rename original_monitor_username to monitor_username and clarify it
  is for diagnostic logging only, since the variable is no longer used
  for restore after the credential change was removed.

@renecannao renecannao merged commit f7ee5cd into v3.0 Apr 12, 2026
4 checks passed
