feat: maintenance flag to pause queue worker during deploys#66
feat: maintenance flag to pause queue worker during deploys#66JohnRDOrazio merged 10 commits intomainfrom
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR adds a maintenance-mode system: a new ChangesMaintenance-Mode Feature
Sequence DiagramsequenceDiagram
participant GHA as GitHub Actions<br/>(Deploy Workflow)
participant WP as WordPress REST<br/>(maintenance endpoint)
participant Redis as Redis Server
participant Worker as Queue Worker<br/>(bash loop)
GHA->>WP: POST /cdcf/v1/maintenance<br/>{action:"begin", duration_seconds:300}
WP->>Redis: SETEX cdcf:maintenance:until TTL=300
Redis-->>WP: OK
WP-->>GHA: {ok, until, duration}
Note over GHA,Worker: Extract tarball, activate plugins
Worker->>Redis: EXISTS cdcf:maintenance:until (each cycle)
Redis-->>Worker: 1
Worker->>Worker: log "Entering maintenance mode" (once)\n sleep POLL_INTERVAL\n continue (skip processing)
GHA->>WP: POST /cdcf/v1/maintenance<br/>{action:"end"} (if: always)
WP->>Redis: DEL cdcf:maintenance:until
Redis-->>WP: OK
WP-->>GHA: {ok}
Worker->>Redis: EXISTS cdcf:maintenance:until (next cycle)
Redis-->>Worker: 0
Worker->>Worker: log "Exiting maintenance mode" (once)\n resume normal processing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 50 minutes and 26 seconds.Comment |
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 4 |
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
Sets/clears cdcf:maintenance:until in Redis with a TTL clamped to [60, 600] seconds. The cdcf-queue-worker will check this key at the top of each poll cycle to pause processing during deploys (#62). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an in_maintenance() helper that checks Redis at the top of each poll cycle. Logs a single transition line on entry and exit, not per cycle. Treats redis-cli failures as "not in maintenance" so a Redis outage doesn't stall the worker (#62). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Calls POST /cdcf/v1/maintenance{begin,300} before tarball extraction
and {end} after plugin activation (with if: always()). If begin fails,
abort the deploy — the whole point is to mitigate FPM stress, so
deploying without the pause defeats the purpose. End failures emit a
warning; the Redis TTL self-heals within 600s (#62).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps POST /cdcf/v1/maintenance for ergonomic admin use: scripts/.venv/bin/python scripts/cdcf_api.py maintenance --action begin --duration 300 scripts/.venv/bin/python scripts/cdcf_api.py maintenance --action end Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses code-review feedback: surface that argparse handles action validation at the CLI layer (so the ValueError is for programmatic callers), and add the missing # -- Maintenance Flag -- section comment that every other method group in CdcfClient has. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the new POST /cdcf/v1/maintenance endpoint, its redis-cli fallback, the Python CLI subcommand, and the expected worker log output during a deploy (#62). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses doc-review feedback on the new /maintenance row: - Note that this endpoint requires manage_options (administrator), not the table preamble's default edit_posts. An editor's Application Password would get 403 with no obvious explanation. - Clarify that the 60-600 range is server-side clamping, not a client-side validation requirement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
641482b to
432deff
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Line 88: The table row for the POST /maintenance entry is splitting into four
columns because of the unescaped pipe in begin|end—edit the row so the action
values are inside inline code (e.g., `begin`/`end`) or otherwise escape the pipe
so it renders as a single cell, and specify the capability as `manage_options`
(or similar admin wording) instead of implying it uses `edit_posts`; also update
the section intro that currently states all cdcf/v1 endpoints require
`edit_posts` to call out that `/maintenance` is an exception and requires the
admin `manage_options` capability.
In `@docs/redis-queue-worker.md`:
- Around line 150-186: Add a clear prerequisite that the worker requires
redis-cli available in PATH on the worker host (because the worker polls the
cdcf:maintenance:until key each POLL_INTERVAL cycle), and update the
install/prerequisites section to instruct operators to install redis-cli (e.g.,
package name for common distros) and verify it for the worker service; also
mention the alternative methods already shown (scripts/cdcf_api.py maintenance
and WP REST API) remain valid but will not work if redis-cli is missing on the
worker host.
In `@wordpress/plugins/cdcf-redis-translations/cdcf-redis-translations.php`:
- Around line 85-93: The current handler always returns HTTP 200 even if Redis
write fails; update the block that calls $redis->setex to check its boolean
return value (the call to $redis->setex) and if it returns false respond with a
5xx WP_REST_Response (e.g., 500) indicating the maintenance flag write failed,
otherwise continue returning the 200 response with 'until' and 'duration';
reference the $redis->setex call and the WP_REST_Response creation when making
the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 537b28ca-ffd9-4cad-a2ee-f5ba40dad8e5
📒 Files selected for processing (6)
.github/workflows/deploy.ymlAGENTS.mddocs/redis-queue-worker.mdscripts/cdcf_api.pyscripts/cdcf_queue_worker.shwordpress/plugins/cdcf-redis-translations/cdcf-redis-translations.php
Codacy flagged three style issues on the new endpoint: - Replace `!class_exists(...)` with `=== false` comparison - Replace `!$redis->connect(...)` with `=== false` comparison - Replace `intval(...)` with `(int)` cast The pre-existing `/process-queue` route still uses the older patterns; those weren't in this PR's diff so Codacy didn't flag them. Cleanup of that route is left for a separate PR to keep this diff minimal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings, fixed: - AGENTS.md: the cdcf/v1 preamble said all endpoints require edit_posts, but /process-queue and /maintenance both require manage_options. The /maintenance row already noted its capability inline, but the preamble was misleading. Update the preamble to flag the exception. - docs/redis-queue-worker.md: the worker now calls redis-cli from in_maintenance() at every poll cycle, so add it to the prerequisites list with a hint about the package name on Debian/Ubuntu and RHEL. - cdcf-redis-translations.php: $redis->setex() can return false on a Redis write failure even when connect() succeeded; without checking it, the deploy would proceed thinking the worker is paused when the flag wasn't actually set. Return 500 with redis_write_failed if the setex returns false; the deploy step's existing 3-attempt retry will surface the failure and abort the deploy. One finding skipped: CodeRabbit claimed the pipe inside backticks in the /maintenance row description splits the cell into four columns. GFM correctly preserves pipes inside inline code spans, so the row renders as a single cell. No change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (2)
scripts/cdcf_queue_worker.sh (1)
136-144:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
redis-cliabsence silently disables maintenance mode — startup guard still missing.If
redis-cliis not inPATH, the2>/dev/nullsuppresses the shell's "command not found" stderr,|| return 1fires, and everyin_maintenance()call returns "false." The deploy workflow sees a200 OKfrom the REST endpoint and believes the worker is paused — but the worker continues processing jobs throughout the deploy, defeating the feature entirely.A one-line startup guard (suggested in the previous review) catches this before the service starts:
🛡️ Proposed fix — add redis-cli check near the existing env-var validation
if [ -z "$WP_REST_URL" ] || [ -z "$WP_APP_USERNAME" ] || [ -z "$WP_APP_PASSWORD" ]; then echo "ERROR: WP_REST_URL, WP_APP_USERNAME, and WP_APP_PASSWORD must be set." exit 1 fi + +if ! command -v redis-cli >/dev/null 2>&1; then + echo "ERROR: redis-cli is required for maintenance-mode checks but was not found in PATH." + exit 1 +fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/cdcf_queue_worker.sh` around lines 136 - 144, The in_maintenance() function can silently return false if redis-cli is missing because the command-not-found stderr is redirected; add a one-line startup guard (near the existing env-var validation) that checks for redis-cli (e.g. use command -v redis-cli >/dev/null or which redis-cli) and, if not found, print a clear error to stderr and exit non-zero so the service fails fast at startup; keep the in_maintenance() logic unchanged but ensure the new check runs before any maintenance or worker loop begins so missing redis-cli cannot silently disable maintenance mode.AGENTS.md (1)
88-88:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUnescaped pipe in table cell still breaks the row into four columns.
The
"begin"|"end"inside the inline code span is treated as a table-column separator. The resulting fourth cell is silently dropped by Markdown renderers, so the description text for the/maintenanceroute is truncated in the rendered docs.🛠️ Proposed fix
-| `POST` | `/maintenance` | Pause or resume the cdcf-queue-worker by setting/clearing a Redis flag (`{action: "begin"|"end", duration_seconds?: int}` — server-clamps duration to 60–600). Requires administrator (`manage_options`) capability. | +| `POST` | `/maintenance` | Pause or resume the cdcf-queue-worker by setting/clearing a Redis flag (`{action: "begin"\|"end", duration_seconds?: int}` — server-clamps duration to 60–600). Requires administrator (`manage_options`) capability. |🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` at line 88, The table cell for the `POST /maintenance` route contains a pipe in the inline code span (`{action: "begin"|"end", duration_seconds?: int}`) which is being parsed as a Markdown column separator; update that code span to avoid an unescaped pipe (for example replace the `|` with an HTML entity `|` or escape it) so the cell stays one column and the description isn’t truncated—target the `/maintenance` table row and the inline-code snippet containing `"begin"|"end"` when making the fix.
🧹 Nitpick comments (1)
scripts/cdcf_queue_worker.sh (1)
142-142: ⚡ Quick win
in_maintenance()hardcodes Redis host/port — consider env-var overrides.The PR objectives noted adding
REDIS_HOST/REDIS_PORT/REDIS_PASSWORDto the worker env file; only the host/port are reachable from the shell script and both are currently hardcoded. The PHP plugin uses the same literals today, but making them configurable removes friction if Redis moves or auth is added later.♻️ Proposed refactor
in_maintenance() { local result - result=$(redis-cli -h 127.0.0.1 -p 6379 -n 0 EXISTS cdcf:maintenance:until 2>/dev/null) || return 1 + local _host="${REDIS_HOST:-127.0.0.1}" + local _port="${REDIS_PORT:-6379}" + local _auth_args=() + [ -n "${REDIS_PASSWORD:-}" ] && _auth_args=(-a "$REDIS_PASSWORD" --no-auth-warning) + result=$(redis-cli -h "$_host" -p "$_port" "${_auth_args[@]}" -n 0 EXISTS cdcf:maintenance:until 2>/dev/null) || return 1 [ "$result" = "1" ] }Add the optional vars to the env-file template and docs table as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/cdcf_queue_worker.sh` at line 142, The in_maintenance() check hardcodes Redis connection parameters; change it to use environment variables REDIS_HOST (default 127.0.0.1), REDIS_PORT (default 6379) and optional REDIS_PASSWORD when present, and build the redis-cli invocation accordingly instead of the literal "-h 127.0.0.1 -p 6379". Update the line that sets result=$(redis-cli ...) in the in_maintenance function to use these vars (and include "-a $REDIS_PASSWORD" only if REDIS_PASSWORD is non-empty), ensuring you preserve stderr redirection and the "|| return 1" behavior so the function still returns on failure. Also ensure any new env vars are documented/added to the worker env-file template as suggested.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@AGENTS.md`:
- Line 88: The table cell for the `POST /maintenance` route contains a pipe in
the inline code span (`{action: "begin"|"end", duration_seconds?: int}`) which
is being parsed as a Markdown column separator; update that code span to avoid
an unescaped pipe (for example replace the `|` with an HTML entity `|` or
escape it) so the cell stays one column and the description isn’t
truncated—target the `/maintenance` table row and the inline-code snippet
containing `"begin"|"end"` when making the fix.
In `@scripts/cdcf_queue_worker.sh`:
- Around line 136-144: The in_maintenance() function can silently return false
if redis-cli is missing because the command-not-found stderr is redirected; add
a one-line startup guard (near the existing env-var validation) that checks for
redis-cli (e.g. use command -v redis-cli >/dev/null or which redis-cli) and, if
not found, print a clear error to stderr and exit non-zero so the service fails
fast at startup; keep the in_maintenance() logic unchanged but ensure the new
check runs before any maintenance or worker loop begins so missing redis-cli
cannot silently disable maintenance mode.
---
Nitpick comments:
In `@scripts/cdcf_queue_worker.sh`:
- Line 142: The in_maintenance() check hardcodes Redis connection parameters;
change it to use environment variables REDIS_HOST (default 127.0.0.1),
REDIS_PORT (default 6379) and optional REDIS_PASSWORD when present, and build
the redis-cli invocation accordingly instead of the literal "-h 127.0.0.1 -p
6379". Update the line that sets result=$(redis-cli ...) in the in_maintenance
function to use these vars (and include "-a $REDIS_PASSWORD" only if
REDIS_PASSWORD is non-empty), ensuring you preserve stderr redirection and the
"|| return 1" behavior so the function still returns on failure. Also ensure any
new env vars are documented/added to the worker env-file template as suggested.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 71ff696e-8a20-463f-838e-43e0c296fea3
📒 Files selected for processing (6)
.github/workflows/deploy.ymlAGENTS.mddocs/redis-queue-worker.mdscripts/cdcf_api.pyscripts/cdcf_queue_worker.shwordpress/plugins/cdcf-redis-translations/cdcf-redis-translations.php
✅ Files skipped from review due to trivial changes (2)
- wordpress/plugins/cdcf-redis-translations/cdcf-redis-translations.php
- scripts/cdcf_api.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/deploy.yml
Two new actionable findings; the other two CodeRabbit re-flagged (redis-cli prereq in worker docs and setex return check in PHP) were already addressed by 4fc9a19 and are stale review state. - worker: fail fast at startup if redis-cli is missing from PATH. Without this guard, in_maintenance()'s 2>/dev/null swallows the shell's "command not found" stderr and || return 1 makes the worker always think it's not in maintenance — so the deploy pause is silently ignored and FPM stress proceeds as before. - AGENTS.md: rewrite the /maintenance row description to avoid the raw pipe in `"begin"|"end"`. GFM does preserve pipes inside backticks, but CodeRabbit's parser disagrees and the prose form ('"begin" or "end"') is unambiguous regardless of which renderer reads it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #62.
Summary
POST /cdcf/v1/maintenancetocdcf-redis-translationsplugin — sets/clearscdcf:maintenance:untilin Redis with TTL clamped to[60, 600].redis-cli EXISTSat the top of each poll cycle; logs one transition line per pause window.beginbefore tarball extraction andendafter plugin activation (withif: always()); production-only.maintenancesubcommand toscripts/cdcf_api.py.docs/redis-queue-worker.mdandCLAUDE.md.Mitigates the FPM saturation that produced the 504 wave during deploys (root cause investigated in #41). Complementary to #61's idempotent activation — neither alone fixes everything; together they remove two of the three concurrent stressors.
Verification done locally
php -lcleanbash -nclean--helpoutput verifiedVerification still needed (post-merge)
scripts/cdcf_queue_worker.shto/usr/local/bin/cdcf_queue_worker.shon the VPS andsystemctl restart cdcf-queue-worker.redis-cli SETEX cdcf:maintenance:until 60 1→ expect one "Entering maintenance mode" log line withinPOLL_INTERVALsredis-cli DEL cdcf:maintenance:until→ expect one "Exiting maintenance mode" log line, then resumed processingworkflow_dispatchdeploy withenvironment: productionand tailjournalctl -u cdcf-queue-worker -f. Expect exactly one "Entering" then one "Exiting" line bracketing the deploy.Test coverage
None — no test runner is configured in this project. A follow-up to add Vitest/PHPUnit/bats coverage is tracked as #63.
Rollback
Revert the PR. The Redis key auto-expires within ≤600s of the last
begin, so a worker on the old version simply resumes immediately. The endpoint going away is harmless once the workflow no longer calls it.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation