fix(startup): repair juniper_plant_all.bash for current service contracts (Pass 1) #235

Merged: pcalnon merged 1 commit into main from fix/startup-scripts-pass1 on May 7, 2026

@pcalnon pcalnon commented May 7, 2026

Summary

Pass 1 of the 2026-05-07 startup/shutdown scripts audit. Fixes the BROKEN and DEGRADED items in util/juniper_plant_all.bash so that the host-level orchestration script matches the current state of the four target services (juniper-data, juniper-cascor, juniper-canopy, juniper-cascor-worker).

Full audit, including the Pass 2 (NIT) roadmap, in notes/STARTUP_SHUTDOWN_SCRIPTS_AUDIT_2026-05-07.md.

What's fixed

🔴 BROKEN

  • #1 worker missing required CASCOR_SERVER_URL. Worker exited immediately on launch (config validation error in juniper_cascor_worker/config.py:153–156); the prior 2-second kill -0 check only emitted a WARNING and let plant_all report success. Now derives ws://${JUNIPER_CASCOR_HOST}:${JUNIPER_CASCOR_PORT}/ws/v1/workers (override-friendly) and passes through optional CASCOR_AUTH_TOKEN.
  • #2 canopy on wrong conda env. Default switched from JuniperCanopy to JuniperCanopy1. Only JuniperCanopy1 has 00_isolate_from_tch_rs.sh in its activate.d/, which prevents the rust_mudgeon LIBTORCH from preempting the env's torch and breaking ~770 canopy tests. JUNIPER_CANOPY_CONDA still respects caller overrides.
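The override-friendly derivation described in #1 and #2 can be sketched roughly as follows. This is an illustration, not the code from util/juniper_plant_all.bash: the function name derive_worker_url and the fallback host/port values (127.0.0.1, 8201) are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Sketch only: derive_worker_url and the fallback host/port are hypothetical.

# Print the worker's server URL. A caller-provided CASCOR_SERVER_URL wins;
# otherwise the URL is built from JUNIPER_CASCOR_HOST / JUNIPER_CASCOR_PORT.
derive_worker_url() {
    local host="${JUNIPER_CASCOR_HOST:-127.0.0.1}"   # assumed fallback
    local port="${JUNIPER_CASCOR_PORT:-8201}"        # assumed fallback
    printf '%s\n' "${CASCOR_SERVER_URL:-ws://${host}:${port}/ws/v1/workers}"
}

CASCOR_SERVER_URL="$(derive_worker_url)"
export CASCOR_SERVER_URL

# Pass the optional auth token through only when it is set and non-empty.
if [[ -n "${CASCOR_AUTH_TOKEN:-}" ]]; then
    export CASCOR_AUTH_TOKEN
fi

# Canopy conda env: default to JuniperCanopy1, still overridable by the caller.
JUNIPER_CANOPY_CONDA="${JUNIPER_CANOPY_CONDA:-JuniperCanopy1}"
```

The `${VAR:-default}` expansion is what keeps both settings override-friendly: an exported value from the caller's environment is used unchanged, and the default applies only when the variable is unset or empty.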

🟡 DEGRADED

Out of scope (Pass 2)

NIT items #4, #7-#11 (cascor-host export, data uvicorn host honoring, deferred uvicorn pre-flight, pid-file format hardening, chop worker grep tightening) are tracked in the audit document and will land in a follow-up PR. #12 (intentional duplicate echo placeholders) stays as-is per memory feedback_chop_all_echo_debug.

Files changed

  • util/juniper_plant_all.bash — +59 / −28
  • tests/test_juniper_plant_all.py — new file (20 tests)
  • notes/STARTUP_SHUTDOWN_SCRIPTS_AUDIT_2026-05-07.md — new file

Test plan

  • bash -n util/juniper_plant_all.bash passes
  • shellcheck clean on both plant and chop scripts
  • pre-commit run --files <changed> clean (black, isort, flake8, mypy, bandit, shellcheck)
  • python3 -m unittest tests.test_juniper_plant_all — 20/20 tests pass
  • Full repo suite python3 -m unittest discover tests — 112/112 tests pass
  • bash scripts/test_resume_file_safety.bash passes
  • Live-stack smoke test (stage 1 of pre-merge validation): run util/juniper_plant_all.bash end-to-end on a host with all four envs present and confirm /v1/health returns 200 on all four ports (8100, 8201, 8050, 8210)
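The health confirmation in the last item could be scripted along these lines. This is a minimal sketch: check_all_healthy is a hypothetical helper, and it assumes curl is installed and that every service answers on plain HTTP at the given host.

```shell
#!/usr/bin/env bash
# Sketch only: check_all_healthy is a hypothetical helper, not part of the repo.

# Return 0 only if every listed port answers GET /v1/health with HTTP 200.
check_all_healthy() {
    local host="$1"; shift
    local port code failed=0
    for port in "$@"; do
        # -w '%{http_code}' prints just the status code; curl prints 000
        # when the connection fails, so a dead port is reported as a failure.
        code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
            "http://${host}:${port}/v1/health" || true)"
        if [[ "${code}" != "200" ]]; then
            echo "FAIL ${host}:${port} -> ${code:-no response}" >&2
            failed=1
        fi
    done
    return "${failed}"
}

# Example invocation against a running stack:
#   check_all_healthy 127.0.0.1 8100 8201 8050 8210
```

The loop checks every port rather than stopping at the first failure, so a single run reports all unhealthy services at once.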

🤖 Generated with Claude Code

…acts

Pass 1 of the 2026-05-07 startup/shutdown scripts audit. Addresses two
service-blocking failures and three degraded-config issues uncovered while
validating util/juniper_plant_all.bash against the current state of each
target repo.

BROKEN fixes:
- #1 juniper-cascor-worker now receives the required CASCOR_SERVER_URL,
  derived from JUNIPER_CASCOR_HOST/PORT (override-friendly). Worker exited
  immediately on launch because the script set zero env vars; the prior
  2-second kill -0 check only emitted a WARNING and let plant_all report
  success.
- #2 juniper-canopy default conda env switched from JuniperCanopy to
  JuniperCanopy1, which carries the LIBTORCH-strip activate hook needed
  to prevent the rust_mudgeon LIBTORCH from preempting the env's torch.
  JUNIPER_CANOPY_CONDA still respects caller-provided overrides.

DEGRADED fixes:
- #3 juniper-canopy now receives canonical pydantic-prefixed env vars
  (JUNIPER_CANOPY_CASCOR_SERVICE_URL, JUNIPER_CANOPY_JUNIPER_DATA_URL)
  rather than the deprecated CASCOR_SERVICE_URL alias.
- #5 worker health is now probed via /v1/health/ready against the
  worker's HTTP health listener (default 127.0.0.1:8210) — same shape
  as the other three services. systemd code path also updated.
- #6 pre-flight block validates JuniperCascor conda env, the
  juniper-cascor-worker console-script binary, and the worker's
  health-listener port before any service is launched.
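A pre-flight check of the kind described in #6 might look like the sketch below. The function name preflight_worker is hypothetical, and the envs/<name>/bin layout under the conda directory is an assumption based on the standard conda layout; the port check is left out to keep the example short.

```shell
#!/usr/bin/env bash
# Sketch only: preflight_worker is hypothetical and the env layout is assumed.

# Fail (non-zero) if the conda env or the worker console script is missing.
# Arguments: <conda dir> <env name> <console-script name>
preflight_worker() {
    local conda_dir="$1" env_name="$2" bin_name="$3"
    local env_path="${conda_dir}/envs/${env_name}"
    if [[ ! -d "${env_path}" ]]; then
        echo "ERROR: conda env not found: ${env_path}" >&2
        return 1
    fi
    if [[ ! -x "${env_path}/bin/${bin_name}" ]]; then
        echo "ERROR: missing executable: ${env_path}/bin/${bin_name}" >&2
        return 1
    fi
    return 0
}

# Example: abort before launching anything if the worker cannot start.
#   preflight_worker "${JUNIPER_CONDA_DIR}" JuniperCascor juniper-cascor-worker || exit 1
```

Running such checks before the first service launch means a missing binary aborts the whole run instead of leaving a partially started stack.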

Adds tests/test_juniper_plant_all.py (20 tests, all passing) covering:
- script bash syntax (bash -n)
- canopy conda env default
- worker env-var wiring (CASCOR_SERVER_URL, health port, auth token)
- health URL composition for both nohup and systemd code paths
- canopy canonical env-var rename and legacy-alias removal
- pre-flight worker conda env / binary / port checks
- end-to-end smoke test: missing worker binary aborts pre-flight before
  any service is launched (synthetic JUNIPER_PROJECT_DIR /
  JUNIPER_CONDA_DIR fixture).

Audit document at notes/STARTUP_SHUTDOWN_SCRIPTS_AUDIT_2026-05-07.md
captures all 12 findings, severity grading, source-of-truth references,
and the Pass 2 roadmap (NIT-class items #4, #7-#11).

shellcheck and pre-commit clean. Full test suite (112 tests) passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@pcalnon pcalnon left a comment

approved

@pcalnon pcalnon merged commit f350133 into main May 7, 2026
30 checks passed
@pcalnon pcalnon deleted the fix/startup-scripts-pass1 branch May 7, 2026 22:29
pcalnon added a commit that referenced this pull request May 7, 2026
fix(startup): Pass 2 nit-class refinements (stacked on PR #235)