Promote synvya-staging to synvya: serialize build stages + 60m SSH timeout#47

Merged
alejandro-runner merged 3 commits into synvya from synvya-staging
Apr 29, 2026

@alejandro-runner

Summary

Promotes the build serialization fix to production. This is the follow-up to #45 — without these changes, the prod cold build locks the host because BuildKit runs cargo + bun in parallel.

What's in this merge

  • #46: `fix(docker): serialize build stages + bump SSH timeout`
    • `COPY --from=rust-builder` no-op in `web-builder` forces BuildKit to run rust then bun sequentially (no more concurrent OOM)
    • `appleboy/ssh-action` `command_timeout` raised from 30m to 60m on all four deploy/QA steps
  • `f756f50` — `fix(prep): write cargo config to invoking user's home, not /root`
    • `ec2-prepare-host.sh` now detects `SUDO_USER` and writes `~/.cargo/config.toml` to the right home, then chowns it back

Validated on staging

Staging redeploy completed successfully with the serialization fix.

Production state

The prod EC2 has been manually prepared (swap active, cargo jobs config in place, repo synced) so the next deploy starts from a healthy baseline.

Test plan

  • Merge → confirm prod workflow runs the new serialized build (rust-builder completes before web-builder starts)
  • Confirm cold build fits inside 60m timeout
  • SSH into prod from another terminal during the build to confirm host stays responsive
  • Confirm `docker ps` shows new keycast image healthy after deploy
  • Push a trivial follow-up commit → confirm warm build is fast (cache mounts populated)
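
The `docker ps` check in the plan above could be scripted roughly as follows. This is a sketch that assumes the container name contains `keycast` and that the image defines a `HEALTHCHECK`; `check_healthy` is a hypothetical helper, not something in the repo.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: succeeds only if a running container whose name
# matches "$1" reports a healthy status.
check_healthy() {
  local name="$1"
  local healthy
  # Both filters are standard `docker ps` filters; the name is an assumption.
  healthy="$(docker ps --filter "name=$name" --filter "health=healthy" --format '{{.Names}}')"
  [ -n "$healthy" ]
}
```

A usage such as `check_healthy keycast && echo "deploy looks healthy"` would fit at the end of a deploy step.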

Follow-up worth doing

The robust fix for low-mem prod is to stop building on the prod host: build once on GitHub Actions (16 GB RAM), push the image to a registry, and have prod just run `docker compose pull && docker compose up -d`. Worth scheduling.
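
A rough sketch of what the prod side of that follow-up might look like, assuming the compose file already references an image tag in a registry. `deploy_prebuilt` and the directory argument are hypothetical; nothing like this exists in the PR.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical pull-based deploy: the host never builds, it only pulls the
# prebuilt image and restarts the services defined in the compose file.
# The body runs in a subshell so the caller's working directory is unchanged.
deploy_prebuilt() (
  compose_dir="$1"
  cd "$compose_dir"
  docker compose pull
  docker compose up -d
)
```

With this shape the prod host needs only enough memory to run the containers, not to compile them.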

alejandro-runner and others added 3 commits April 29, 2026 14:45
fix(docker): serialize build stages + bump SSH timeout

The cold build on a t3.medium production host locked the box hard
enough that SSH and SSM both became unresponsive. Root cause:
BuildKit runs independent stages in parallel by default, so cargo
release build (-j 2, ~3 GB RSS) and bun/vite build (NODE_OPTIONS
2 GB) ran simultaneously on a 4 GB instance. Even with 4 GB swap
the system thrashed into a kernel lockup.

Two changes:

1. Add a no-op `COPY --from=rust-builder /artifacts/keycast
   /tmp/.rust-builder-done` as the first instruction of
   web-builder. BuildKit sees the cross-stage dependency and only
   starts web-builder once rust-builder finishes, so cargo and
   bun never run concurrently.

2. Bump appleboy/ssh-action `command_timeout` from 30m to 60m
   across all four deploy/QA steps. A cold cargo + bun build on
   t3.medium with -j 2 takes ~45-55 min; the previous 30m killed
   the SSH session mid-build.

Once cache mounts are populated by a successful cold build,
warm builds return to a few minutes and stay well under the
new timeout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

fix(prep): write cargo config to invoking user's home, not /root

When ec2-prepare-host.sh is invoked under sudo (e.g. while
debugging on the host), $HOME resolves to /root and the cargo
[build] jobs=2 config lands in /root/.cargo/config.toml. The
deploy/QA workflow runs the script over SSH as ec2-user, so it
never reads that config — defeating the limit.

Detect SUDO_USER and write to that user's home instead, then
chown the .cargo/ tree back to them so cargo can read/write it
when running under their UID.
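
A minimal sketch of the SUDO_USER handling described above, assuming
a Linux host where `getent` is available. The function names are
hypothetical and this is not the actual ec2-prepare-host.sh code.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the home directory of the invoking user,
# preferring SUDO_USER over $HOME when the script runs under sudo.
resolve_target_home() {
  local user="${SUDO_USER:-}"
  local home_dir
  if [ -n "$user" ] && [ "$user" != "root" ]; then
    # Field 6 of the passwd entry is the home directory.
    home_dir="$(getent passwd "$user" | cut -d: -f6)"
    [ -n "$home_dir" ] || { echo "cannot resolve home for $user" >&2; return 1; }
    printf '%s\n' "$home_dir"
  else
    printf '%s\n' "$HOME"
  fi
}

# Hypothetical helper: write the cargo jobs limit and hand the tree back
# to the invoking user when running as root under sudo.
write_cargo_config() {
  local home_dir="$1"
  mkdir -p "$home_dir/.cargo"
  printf '[build]\njobs = 2\n' > "$home_dir/.cargo/config.toml"
  if [ -n "${SUDO_USER:-}" ] && [ "$(id -u)" -eq 0 ]; then
    chown -R "$SUDO_USER" "$home_dir/.cargo"
  fi
}
```

Calling `write_cargo_config "$(resolve_target_home)"` would then put
the config in the same place whether or not sudo is used.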

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(docker): serialize build stages + bump SSH timeout
alejandro-runner merged commit 1de373a into synvya on Apr 29, 2026
5 checks passed