docs(cascor-env-fix): libtorch-strip hook pipefail interaction (plant_all silent abort) #239

Merged
pcalnon merged 1 commit into main from
docs/libtorch-strip-hook-pipefail-fix
May 8, 2026

Conversation

@pcalnon
Owner

@pcalnon pcalnon commented May 8, 2026

Summary

Adds §4a to notes/CASCOR_CONDA_ENV_FIX_2026-05-07.md documenting a
secondary failure mode discovered during a fresh
./util/juniper_plant_all.bash run on 2026-05-08:

  • Symptom: script aborted silently on conda activate JuniperCanopy1
    with exit_code=1 and no error output (even the cleanup ERR-trap
    message was swallowed).
  • Root cause: the LIBTORCH-strip activate hooks in JuniperCanopy1
    and JuniperCascor used a grep -v /rust_mudgeon/ pipeline. When
    LD_LIBRARY_PATH holds only a single rust_mudgeon segment (the default
    state set by ~/.bashrc), grep -v selects no lines and exits 1 → pipefail
    fails the pipeline → set -e kills the activate hook mid-flight → conda's
    var-context bookkeeping is left inconsistent → bash emits the
    pop_var_context: head of shell_variables not a function context cascade.
  • Fix: pure-bash IFS=':' read -ra _segs split + filter loop in
    both hooks. No subprocess that can fail under pipefail. Confirmed all
    four services (data / cascor / canopy / worker) come up healthy after
    the fix.

The hook files live under /opt/miniforge3/envs/<env>/etc/conda/activate.d/
(out-of-repo); this PR captures the diagnosis and canonical replacement
inline so a future env rebuild reinstates the safe pattern.
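
The pure-bash split + filter described above can be sketched roughly as follows. This is a minimal illustration, not the hook as installed: the function name `strip_path_segments` and the sample paths are hypothetical, and the real hook (which conda sources into the calling shell) would presumably assign the result back to LD_LIBRARY_PATH rather than print it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the colon-separated list in $1 with every
# segment containing "/rust_mudgeon/" removed. Uses only bash builtins,
# so there is no external command whose exit status could trip pipefail.
strip_path_segments() {
    local input="${1:-}"
    local -a _segs=() _kept=()
    local _seg

    # Split on ':' into an array; the here-string keeps this in-process.
    IFS=':' read -ra _segs <<< "$input"

    # The ${arr[@]+...} guard keeps older bash happy under `set -u`
    # when the array is empty.
    for _seg in ${_segs[@]+"${_segs[@]}"}; do
        if [[ -z "$_seg" || "$_seg" == *"/rust_mudgeon/"* ]]; then
            continue
        fi
        _kept+=("$_seg")
    done

    # Re-join the surviving segments with ':' (empty string if none survive).
    local IFS=':'
    printf '%s\n' "${_kept[*]-}"
}

# Mixed path: the rust_mudgeon segment is dropped.
strip_path_segments "/usr/lib:/home/u/rust_mudgeon/libtorch/lib"   # prints /usr/lib

# Only a rust_mudgeon segment: prints an empty line and still returns 0.
strip_path_segments "/home/u/rust_mudgeon/libtorch/lib"
```

The point of the design is that "nothing left after filtering" is an ordinary result rather than a non-zero exit status, so the hook cannot abort its caller in that case.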

Test plan

  • Reproduced the silent abort under set -euo pipefail before the fix
  • Confirmed all four services come up healthy after the hook rewrite
  • CI: pre-commit (markdownlint) green
  • No runtime change in this PR — pure docs.
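
The failure reproduced in the first test-plan item can be isolated in a few lines. The sketch below assumes the original hook used a tr / grep -v / paste pipeline of roughly this shape; the exact original command is not shown in this PR, and `fragile_strip` and the sample paths are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash

# Assumed shape of the fragile pipeline. The function body is a subshell
# so that enabling pipefail here does not leak into the caller.
fragile_strip() (
    set -o pipefail
    printf '%s\n' "$1" | tr ':' '\n' | grep -v '/rust_mudgeon/' | paste -sd ':' -
)

# Mixed path: grep -v still selects one line, so the pipeline succeeds.
rc=0
fragile_strip "/usr/lib:/home/u/rust_mudgeon/libtorch/lib" >/dev/null || rc=$?
echo "mixed path: exit status $rc"

# Only a rust_mudgeon segment: grep -v selects nothing and exits 1,
# and pipefail turns that into the status of the whole pipeline.
rc=0
fragile_strip "/home/u/rust_mudgeon/libtorch/lib" >/dev/null || rc=$?
echo "single rust_mudgeon segment: exit status $rc"
```

Under set -e, that non-zero pipeline status is what terminates a sourced hook partway through, since activate.d scripts run inside the calling shell and inherit its options.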

🤖 Generated with Claude Code

Adds §4a to notes/CASCOR_CONDA_ENV_FIX_2026-05-07.md describing the
secondary failure mode hit during the 2026-05-08 plant_all run and the
out-of-repo fix applied to both env-local activate hooks.

Symptom: juniper_plant_all.bash aborted silently on
``conda activate JuniperCanopy1`` with exit code 1; the cleanup ERR
trap output was also swallowed.

Root cause: the LIBTORCH-strip activate hooks in JuniperCanopy1 and
JuniperCascor used a ``grep -v /rust_mudgeon/`` pipeline. When
LD_LIBRARY_PATH consisted of *only* a rust_mudgeon segment (which is
the default state set by ~/.bashrc), grep exited 1 → pipefail → set -e
killed the activate hook mid-flight → conda's var-context bookkeeping
was left inconsistent → bash emitted the ``pop_var_context: head of
shell_variables not a function context`` cascade.

Fix: replaced the grep+paste pipeline in both hooks with a pure-bash
``IFS=':' read -ra _segs`` split + filter loop — no subprocess that
can fail under pipefail. Both hooks now activate cleanly under
``set -euo pipefail`` and plant_all runs end-to-end with all four
services healthy.

The hook files themselves live under
/opt/miniforge3/envs/<env>/etc/conda/activate.d/ and are not part of
the repo; documenting the diagnosis and the canonical fix here so a
future env rebuild reinstates the pure-bash split instead of the
fragile grep pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pcalnon pcalnon self-assigned this May 8, 2026
Owner Author

@pcalnon pcalnon left a comment

approve

@pcalnon pcalnon merged commit 00eeca3 into main May 8, 2026
19 checks passed
@pcalnon pcalnon deleted the docs/libtorch-strip-hook-pipefail-fix branch May 8, 2026 06:30
1 participant