
fix(plant_all): default cascor conda env to JuniperCascor1 (unblock /v1/health) #237

Merged
pcalnon merged 3 commits into main from fix/cascor-conda-env-override
May 8, 2026

@pcalnon pcalnon commented May 8, 2026

Summary

Fixes the immediate ./util/juniper_plant_all.bash failure where the
juniper-cascor server never returned success on its /v1/health
probe — the cascor process was dying at import time inside the legacy
JuniperCascor conda env (Python 3.14.3 + torch 2.9.1) before it
could bind to port 8201.

| Commit | What | Why |
| --- | --- | --- |
| fix(plant_all): default cascor conda env to JuniperCascor1 | util/juniper_plant_all.bash | The JuniperCascor torch 2.9.1 wheel ships a stub-only torch/_C/ namespace package that shadows _C.cpython-314-x86_64-linux-gnu.so; import torch fails immediately. JuniperCascor1 (Py 3.13 + torch 2.11.0) has the correct layout. Mirrors the existing JUNIPER_CANOPY_CONDA pattern; JUNIPER_WORKER_CONDA made overridable too. |
| docs(notes): document cascor conda env fix | notes/CASCOR_CONDA_ENV_FIX_2026-05-07.md | Captures the wheel-layout root cause, the env-local activate/deactivate hooks added to both envs to strip rust_mudgeon LIBTORCH bleed-through, and the open follow-ups (install worker into JuniperCascor1, rebuild torch in JuniperCascor). |

Verification

$ source /opt/miniforge3/etc/profile.d/conda.sh
$ conda activate JuniperCascor1
$ cd juniper-cascor/src
$ JUNIPER_CASCOR_HOST=localhost JUNIPER_CASCOR_PORT=8201 python server.py &
$ curl -fsS http://localhost:8201/v1/health
{"status":"ok","version":"0.4.0"}

Server boots cleanly, uvicorn binds to localhost:8201, /v1/health returns 200 immediately.
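
The manual probe above can also be wrapped in a bounded retry loop, so a server that dies at import time produces a clear failure rather than an indefinite wait. The following is a minimal sketch, not the loop in `juniper_plant_all.bash` (which this PR does not show); `wait_for_health` and its arguments are hypothetical names, and the 30-second fallback in the usage comment is an assumed value.

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_health is a hypothetical helper, not code from the repo.
# Usage: wait_for_health <timeout_seconds> <interval_seconds> <probe command...>
# Re-runs the probe until it succeeds or the timeout budget is spent.
wait_for_health() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    while true; do
        if "$@" >/dev/null 2>&1; then
            return 0    # probe succeeded: the service answered
        fi
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1    # budget exhausted: report failure instead of hanging
        fi
        sleep "$interval"
    done
}

# Example call (assumed default of 30 seconds if HEALTH_CHECK_TIMEOUT is unset):
#   wait_for_health "${HEALTH_CHECK_TIMEOUT:-30}" 1 \
#       curl -fsS http://localhost:8201/v1/health
```

Passing the probe as a command keeps the loop independent of curl, which also makes it easy to exercise with stand-in commands.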

Out-of-repo changes (documented in the notes file)

  • /opt/miniforge3/envs/JuniperCascor/etc/conda/activate.d/00_isolate_from_tch_rs.sh — new
  • /opt/miniforge3/envs/JuniperCascor/etc/conda/deactivate.d/00_restore_tch_rs.sh — new

These mirror the existing JuniperCanopy1 hooks. They do not fix the wheel-layout bug, but defend against the secondary LIBTORCH bleed-through if torch is ever reinstalled.
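
The hook contents are not included in this PR. As a rough illustration of what an isolate/restore pair could look like, here is a sketch assuming bash, with hypothetical function names and `_SAVED_*` variable names; in real hooks the bodies would sit directly in the sourced files under `activate.d/` and `deactivate.d/`.

```bash
# Sketch only: hypothetical helpers, not the hooks installed in the envs.
# Conda sources hook files into the current shell, so they must not call exit.

isolate_from_libtorch() {
    # Remember the outer values so the deactivate side can restore them.
    export _SAVED_LIBTORCH="${LIBTORCH:-}"
    export _SAVED_LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}"

    if [ -n "${LIBTORCH:-}" ]; then
        # Rebuild LD_LIBRARY_PATH without entries at or under $LIBTORCH.
        local kept="" entry
        local IFS=':'
        for entry in ${LD_LIBRARY_PATH:-}; do
            case "$entry" in
                "$LIBTORCH"|"$LIBTORCH"/*) ;;              # drop this entry
                *) kept="${kept:+$kept:}$entry" ;;         # keep everything else
            esac
        done
        export LD_LIBRARY_PATH="$kept"
    fi
    unset LIBTORCH
}

restore_libtorch() {
    if [ -n "${_SAVED_LIBTORCH:-}" ]; then
        export LIBTORCH="$_SAVED_LIBTORCH"
    fi
    if [ -n "${_SAVED_LD_LIBRARY_PATH:-}" ]; then
        export LD_LIBRARY_PATH="$_SAVED_LD_LIBRARY_PATH"
    else
        unset LD_LIBRARY_PATH
    fi
    unset _SAVED_LIBTORCH _SAVED_LD_LIBRARY_PATH
}
```

Saving the original values rather than recomputing them keeps deactivation an exact inverse of activation.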

Test plan

  • Cascor server boots under JuniperCascor1 and /v1/health returns 200 (verified locally above)
  • CI: pre-commit hooks pass on the script change (shellcheck, yamllint, markdownlint)
  • CI: tests pass (this PR does not touch any importable Python — pure bash + docs)
  • After merge: re-run ./util/juniper_plant_all.bash and observe cascor health probe succeeding

🤖 Generated with Claude Code

pcalnon and others added 2 commits May 7, 2026 22:59

fix(plant_all): default cascor conda env to JuniperCascor1

The legacy ``JuniperCascor`` env (Python 3.14.3 + torch 2.9.1) cannot
import torch — the wheel ships a stub-only ``torch/_C/`` namespace
package that shadows ``_C.cpython-314-x86_64-linux-gnu.so``, so
``api/app.py`` dies on its first ``import torch`` and ``server.py``
never binds to port 8201. ``juniper_plant_all.bash`` then loops on
``/v1/health`` until ``HEALTH_CHECK_TIMEOUT`` expires.

Mirror the existing ``JUNIPER_CANOPY_CONDA`` pattern:

- ``JUNIPER_CASCOR_CONDA`` now defaults to ``JuniperCascor1``
  (Python 3.13.13 + torch 2.11.0+cu130, correct wheel layout) and is
  overridable for users with a known-good legacy env.
- ``JUNIPER_WORKER_CONDA`` is also made overridable but stays at
  ``JuniperCascor`` because that is where the
  ``juniper-cascor-worker`` pip wheel currently lives. Once the worker
  is also installed in ``JuniperCascor1`` the worker default can flip
  too and the legacy env can be retired.
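
The override pattern described in the two bullets can be sketched with ordinary shell parameter expansion. The variable names and default values come from this commit message; the wrapping function `resolve_conda_envs` is a hypothetical name used only to keep the example self-contained, and the actual lines in `juniper_plant_all.bash` may differ.

```bash
# Sketch only: resolve_conda_envs is a hypothetical wrapper.
# A value already set by the caller wins; otherwise the default applies.
resolve_conda_envs() {
    JUNIPER_CASCOR_CONDA="${JUNIPER_CASCOR_CONDA:-JuniperCascor1}"
    JUNIPER_WORKER_CONDA="${JUNIPER_WORKER_CONDA:-JuniperCascor}"
}

resolve_conda_envs
echo "cascor env: ${JUNIPER_CASCOR_CONDA}"
echo "worker env: ${JUNIPER_WORKER_CONDA}"
```

Under this pattern, a user with a known-good legacy env could opt back in for a single run with `JUNIPER_CASCOR_CONDA=JuniperCascor ./util/juniper_plant_all.bash`.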

Verified: ``server.py`` boots cleanly under ``JuniperCascor1`` and
``curl http://localhost:8201/v1/health`` returns
``{"status":"ok","version":"0.4.0"}``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(notes): document cascor conda env fix

Captures the diagnosis behind the previous commit: the torch 2.9.1
wheel under Python 3.14 ships a stub-only ``_C/`` directory that
shadows ``_C.cpython-314-...so`` via PEP 420 implicit namespace
packages, which is why ``api/app.py``'s ``import torch`` fails the
moment the cascor server starts.
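
Because the failure surfaces at `import torch`, a launcher could in principle check importability up front and fail fast, before any health polling begins. This is a sketch of that idea only; `can_import` is a hypothetical helper that is not part of this PR, and it makes no claim about why an import fails, only whether it does.

```bash
# Sketch only: can_import is a hypothetical preflight helper.
# Usage: can_import <module> <interpreter command...>
# Returns 0 if the interpreter can import the module, non-zero otherwise.
can_import() {
    local module="$1"
    shift
    "$@" -c "import ${module}" >/dev/null 2>&1
}

# Example call against a conda env without activating it:
#   can_import torch conda run -n JuniperCascor1 python \
#       || echo "torch is not importable in JuniperCascor1" >&2
```

Taking the interpreter as a command lets the same check run against any env, whether through `conda run -n <env> python` or an already-activated `python`.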

Also documents the matching activate.d/deactivate.d hooks added to
both ``JuniperCascor`` and ``JuniperCascor1`` to strip the
rust_mudgeon ``LIBTORCH`` / ``LD_LIBRARY_PATH`` bleed-through (mirrors
the ``JuniperCanopy1`` pattern). Those hooks live outside the repo
under ``/opt/miniforge3/envs/<env>/etc/conda/{activate,deactivate}.d/``.

Lists the remaining follow-ups (install worker into JuniperCascor1,
rebuild torch in JuniperCascor, etc.) so the env split does not become
permanent without a decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pcalnon pcalnon self-assigned this May 8, 2026
Signed-off-by: Overtoad <paul.calnon@gmail.com>

@pcalnon pcalnon left a comment

approved

@pcalnon pcalnon merged commit 58a6755 into main May 8, 2026
30 checks passed
@pcalnon pcalnon deleted the fix/cascor-conda-env-override branch May 8, 2026 04:15