Add Dynamo combined image (vLLM + TRT-LLM) with EFA/NIXL RDMA #72
dmvevents wants to merge 43 commits into aws-samples:main
Conversation
Adds a self-contained 7-stage Dockerfile that builds a single image containing both the vLLM 0.17.1 and TRT-LLM 1.3.0rc7 backends, with NIXL 0.10.1 KV-cache transfer over AWS EFA.

New files:
- Dockerfile.dynamo-combined-efa: multi-stage from-scratch build
- k8s/dynamo-combined-disagg-1gpu.yaml: 1-GPU disaggregated deployment
- k8s/dynamo-combined-disagg-8gpu.yaml: 8-GPU data-parallel deployment
- sbom/dynamo-combined-sbom.csv: Software Bill of Materials (530+ packages)
- sbom/dynamo-combined-pip-freeze.txt: Python package versions

Modified files:
- README.md: combined-image docs, K8s deployment, EFA/NIXL env vars
- build.sh: added 'combined' build target
- ATTRIBUTION.md: added GDRCopy, FlashInfer, LMCache, FFmpeg

Tested on 2x p5en.48xlarge (32x H200, 32x EFA) with disaggregated inference using Nemotron-Mini-4B-Instruct. Prebuilt image: public.ecr.aws/v9l4g5s4/dynamo-combined:latest (~35 GB).
…pecific configs, generic manifests
Summary of fixes made to Dockerfile.dynamo-combined-efa (final section):
1. uv venv path fix (line ~217): changed /workspace/.venv/bin/uv pip install → uv pip install --python /workspace/.venv/bin/python — uv doesn't install itself inside venvs.
2. Missing ARGs in final stage (line ~559): added ARG VLLM_REF and ARG TENSORTLLM_PIP_WHEEL so LABEL directives can reference them.
3. Removed stale Cargo feature (line ~336): changed --features "kv-indexer,kv-indexer-runtime" → --features "kv-indexer" — kv-indexer-runtime no longer exists in dynamo main.
4. ls glob under pipefail (lines ~777, ~783): changed ls /opt/dynamo/wheelhouse/*.whl → find ... -name '*.whl' to avoid exit code 2 when no files match (see the sketch after this list).
5. pip → uv pip for SBOM generation (line ~862): replaced ${PIP} install/list/uninstall with uv pip equivalents, since the venv is uv-managed and doesn't have pip installed.
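For fix 4, a minimal illustration of why the glob swap matters under `set -euo pipefail`; the wheelhouse path is from the fix list above, and the surrounding script is a sketch rather than the Dockerfile's exact RUN step:

```bash
#!/usr/bin/env bash
set -euo pipefail

# `ls /opt/dynamo/wheelhouse/*.whl` exits 2 when the glob matches nothing
# (the unexpanded pattern is passed to ls), killing the whole RUN step.
# `find` exits 0 either way, so we can test its output instead:
wheels="$(find /opt/dynamo/wheelhouse -maxdepth 1 -name '*.whl')"
if [ -z "${wheels}" ]; then
  echo "no wheels built" >&2
  exit 1
fi
echo "${wheels}"
```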
Validation passed:
- Dynamo: OK
- TRT-LLM: present
- vLLM: present
- NIXL: present
- EFA: fi_info 2.3.1amzn3.0
- UCX: 1.20.1
- SBOM: 601 lines
Final build: ✅ passed validation, images built:
- dynamo-combined-efa:latest (38.3GB)
Add Intel MKL libraries required by numpy/scipy/torch from NGC PyTorch.
Create symbolic links for CUDA libraries in site-packages to facilitate TRT-LLM's library discovery.
Updated the Dockerfile to expose all system CUDA/NVIDIA libraries to TRT-LLM's sys.path-based library finder by creating a single directory for symlinks, simplifying the process of linking necessary libraries.
Updated the Dockerfile to improve symlink creation for NVIDIA libraries by using 'find' for better handling of .so files.
Symlinking of NVIDIA libraries for TRT-LLM discovery should be done last to avoid breaks.
Added CUDA math libraries (libcublas) and updated symlink patterns.
Removed HPC-X, updated CUDA library handling, and added compatibility shims for TRT-LLM and PyTorch.
Replace the 1,085-line monolith with a ~170-line multi-stage build that
overlays networking-base:v5 (EFA 1.48.0, libfabric 2.4.0amzn3.0,
aws-ofi-nccl 1.19.0-1 NGC v1, NCCL 2.30.3, NIXL 1.0.1, GDRCopy 2.5.2)
onto both nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.0.1 and
.../vllm-runtime:1.0.1. A single combined image serves either backend
via the DYNAMO_BACKEND={vllm,trtllm} selector entrypoint.
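A minimal sketch of what such a selector entrypoint can look like; the shipped entrypoint.sh carries more logic (library-path wiring, validation), so treat this as shape only:

```bash
#!/usr/bin/env bash
# Hedged sketch of the DYNAMO_BACKEND selector, not the committed entrypoint.sh.
set -euo pipefail

case "${DYNAMO_BACKEND:-vllm}" in
  vllm)   exec python -m dynamo.vllm   "$@" ;;
  trtllm) exec python -m dynamo.trtllm "$@" ;;
  *)
    echo "DYNAMO_BACKEND must be 'vllm' or 'trtllm', got '${DYNAMO_BACKEND}'" >&2
    exit 1 ;;
esac
```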
Drops:
- libc10_compat.so ABI shim + LD_PRELOAD hack
- sed-patched Python source
- 90+ line manual .so copy list
- EFA 1.45.1 (replaced with 1.48.0 via --build-ngc installer in
networking-base:v5)
- nic_sampler helper (moved to monitoring images)
Test targets per ticket P416074947: g5.8xlarge (1 EFA), p5.48xlarge
(32 EFA, H100), p5en.48xlarge (16 EFA, H200).
CodeBuild failed to pull networking-base:v5 from Docker Hub (it had been a private local image). Publish networking-base:v5 to public.ecr.aws so the build runs self-contained from just the Dockerfile + source context:

  NETWORKING_BASE default: public.ecr.aws/v9l4g5s4/networking-base:v5
  (digest sha256:c41ac2104daae18f62edb72bfb0a847a956724937b7a6673848c703e16feff86)

Anonymous pull works from any AWS account (CodeBuild, ECS, local docker). Override with --build-arg NETWORKING_BASE=... to mirror it yourself.

Also: replace `python3 -c` calls in trtllm-stage and final validation with fs-only checks. The NVIDIA runtime image's ENTRYPOINT runs nvidia-smi diagnostics, which stalls during `docker build` without GPU access; plain `test -d` / `test -x` / `ls` covers the same invariants without that dependency.
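The fs-only checks have roughly this shape; the paths are illustrative of the invariants named above, not a verbatim excerpt from the Dockerfile:

```bash
# Filesystem-only validation: no Python interpreter, no GPU needed at build time.
test -d /opt/nvidia/nvda_nixl            # NIXL tree present
test -x /opt/amazon/efa/bin/fi_info      # EFA tooling executable
ls /usr/local/ucx/lib/libucp.so*         # UCX runtime libs present
# No `python3 -c "import ..."` here: importing would trigger the NVIDIA
# runtime ENTRYPOINT's nvidia-smi diagnostics, which hang without a GPU.
```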
Alex flagged: raw Pod manifests are the wrong deployment path for dynamo-combined-efa. The correct pattern is the Dynamo operator's DynamoGraphDeployment (nvidia.com/v1alpha1) CRD, which owns the lifecycle of Frontend + Prefill + Decode workers as one logical graph and binds them to the shared etcd + NATS control plane via dynamoNamespace.

Added:
- k8s/dgd-dynamo-combined-vllm.yaml — 3 DGDs (frontend + prefill + decode)
- k8s/dgd-dynamo-combined-trtllm.yaml — same shape, DYNAMO_BACKEND=trtllm

Both reference the ECR image 159553542841.dkr.ecr.us-west-2.amazonaws.com/dynamo-combined-efa:latest and wire up NIXL LIBFABRIC over EFA for cross-node KV-cache transfer.

Moved the raw-Pod yamls to k8s/legacy/ for reference (not deleted, so we can diff against them if any field needs backporting).
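Usage follows the standard CRD flow; a short sketch assuming the manifest names above (the CRD plural is inferred from the nvidia.com/v1alpha1 group named in this comment and may differ by operator version):

```bash
# Apply the graph and watch the operator reconcile Frontend + workers.
kubectl apply -f k8s/dgd-dynamo-combined-vllm.yaml
kubectl get dynamographdeployments.nvidia.com -A   # plural name assumed
```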
Previous commit defaulted NETWORKING_BASE to
public.ecr.aws/v9l4g5s4/networking-base:v5 from a different repo. That
pulled a 17 GB public image with a different package layout than the
rest of this folder, and was not actually "self-contained".
Switch to the same pattern already used by Dockerfile.dynamo-trtllm-efa
and Dockerfile.dynamo-vllm-efa in this folder: accept BASE_IMAGE as a
build arg and let build.sh build Dockerfile.efa (→ aws-efa-dynamo) first,
then overlay its /opt/amazon/efa, /opt/amazon/openmpi, /usr/local/ucx,
/opt/nvidia/nvda_nixl, /opt/gdrcopy, and rdma-core libs onto both the
tensorrtllm-runtime:1.0.1 and vllm-runtime:1.0.1 images.
build.sh: build_combined() now triggers build_efa() if the base image
is missing, matching build_trtllm() and build_vllm(). It also passes
--build-arg BASE_IMAGE=${EFA_IMAGE}${GPU_SUFFIX}:${TAG} and wires
CUDA_ARCH through.
Result: a `./build.sh -b combined -t latest -r <registry>` invocation
is now genuinely self-contained — no external private images, no cross-
repo dependency, same EFA/NIXL/NCCL stack as the sibling images.
…ions

Alex flagged: the earlier README pinned RELEASE_VERSION=0.6.1 and my dispatch reply told him to use `helm repo add ... --password=$NGC_API_KEY` — both wrong. Public NGC (helm.ngc.nvidia.com/nvidia/ai-dynamo) serves the charts anonymously, and the crds/platform charts diverge in version:

  dynamo-crds latest public = 0.9.1
  dynamo-platform latest public = 1.0.1 (skip 1.0.0 — Blackwell crash)

Split RELEASE_VERSION into DYNAMO_CRDS_VERSION / DYNAMO_PLATFORM_VERSION so the README matches what's actually fetchable. No NGC login required.
…OM-ready)
Pulls in the SBOM + license artifacts from the antonai-work workshop repos
where they're already verified against Alex's distribution contract:
Dockerfiles:
- Dockerfile.efa: overlay on networking-base:v5 + multi-stage syft+trivy
scanner producing /opt/security/sbom.{spdx,cyclonedx}.json + cve-*.txt
(replaces 543-line source-build with 189-line overlay; versions are
pinned in networking-base:v5 upstream).
- Dockerfile.dynamo-combined-efa: 233-line dual-backend image with SBOM
stage (vllm + trtllm venv overlay, DYNAMO_BACKEND env switch).
- Dockerfile.overlay: reference-only lean overlay (documented no-SBOM).
- Dockerfile.dynamo-trtllm-efa + Dockerfile.dynamo-vllm-efa: existing
coworker files, now with appended scanner-stage for parity.
build.sh additions (all 4 build_* functions wired):
- --no-sbom / --no-cve / --no-extract / --sbom-out flags
- --arch 100 (B200/B300 Blackwell) per Alex's 2026-04-25 ask
- SBOM_ARGS passed to docker build; --target final selected
- extract_sbom() helper copies /opt/security/ to out/sbom/<image>/
Repo-root license contract (per Alex 2026-04-24):
- LICENSE (MIT)
- THIRD-PARTY-LICENSES (2216 packages, auto-generated from CycloneDX)
- UTILITY-LICENSES (build-time tools not in shipping image)
scripts/:
- sbom.sh (extractor, docker create + docker cp)
- audit.py + build-orchestrator.sh
docs/:
- commercial-licenses.md (NVIDIA CUDA / TensorRT / NCCL / NIXL BL callouts)
- sbom/README.md (layout guide)
sbom/ (7 pre-committed snapshots):
- dynamo-combined-efa-v1/ (synthesized: trtllm+vllm+networking-base union)
- efa-base-v1/ (synthesized from networking-base-v5)
- dynamo-trtllm-v4/ (2037 packages)
- dynamo-vllm-v4/ (1489 packages)
- networking-base-v5/ (638 packages)
- nemoclaw-v2/ + nemoclaw-v4/ (from nemoclaw sibling)
- trivy/ (5 CVE reports, CRITICAL+HIGH)
Replaces the 2-file sbom/ stubs (dynamo-combined-pip-freeze.txt +
dynamo-combined-sbom.csv) with full SPDX + CycloneDX inventories.
…-04-25)

Per Alex: "Since the images install both libraries, the SBOMs are derivatives — just take the combined image and remove the other library."

- dynamo-vllm-efa-v1/: combined MINUS [tensorrt, trtllm, modelopt, torch_tensorrt]
- dynamo-trtllm-efa-v1/: combined MINUS [vllm, xformers]

Files per backend: SPDX + CycloneDX + licenses.md + trivy CVE pointer. Provenance noted in each SBOM header.
Dockerfile.efa:
- Fix bash arithmetic (PASS=$((PASS+1)) instead of ((PASS++))) which
  tripped `set -e` on the first PASS=0 → 1 increment (see the repro
  after this list).
- Fix UCX presence check (libucp.so, not non-existent libucx.so).
- Fix trivy CVE scan flag (--skip-db-update, not --skip-db-download).
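The arithmetic fix is worth spelling out, since it bites any `set -e` validation script; a self-contained repro of the behavior:

```bash
#!/usr/bin/env bash
set -euo pipefail

PASS=0

# BROKEN under set -e: ((PASS++)) evaluates the *old* value (0), so the
# arithmetic command returns exit status 1 and the script dies on the
# very first increment.
# ((PASS++))

# SAFE: a plain assignment always returns status 0, regardless of value.
PASS=$((PASS+1))
echo "checks passed: ${PASS}"
```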
Dockerfile.dynamo-combined-efa:
- Install libopenmpi3 + openmpi-bin in the combined stage.
TRT-LLM's torch dlopens libmpi.so.40 at import; HPCX is unset by
design to keep aws-ofi-nccl as the NCCL network plugin, so the
distro OpenMPI satisfies torch's soname lookup without conflict.
- Copy Intel MKL libs (libmkl_*) from upstream tensorrtllm-runtime
into /opt/trtllm-libs so torch's OMP backend can find them.
- Copy CUDA 13.1 + cuDNN 9 runtime libs into /opt/trtllm-cuda13.
vLLM uses CUDA 12.9; TRT-LLM uses CUDA 13.1. Segregating under
/opt/trtllm-cuda13 keeps the two CUDA stacks side-by-side.
- Fix trivy CVE scan flag on this Dockerfile too.
entrypoint.sh:
- When DYNAMO_BACKEND=trtllm, prepend /opt/trtllm-cuda13 + /opt/trtllm-libs
+ /opt/trtllm-venv/lib/.../tensorrt_llm/libs + /usr/lib/x86_64-linux-gnu
to LD_LIBRARY_PATH so torch finds MKL, OpenMPI, cuBLAS, cuDNN.
sbom/awsi-efa-base-v1/:
- Extracted from awsi-efa-base:v1 (sha256:552b018e) built from Dockerfile.efa.
- 24,247 packages · 65 distinct licenses.
docs/e2e-evidence/awsi-efa-base_v1_rdma-validation.md:
- Validated on p5en.48xlarge ip-10-1-0-171 (H200 + 16 EFA NICs).
- NCCL all_reduce_perf: aws-ofi-nccl 1.19.0 + libfabric 2.4 +
provider `efa` + fabric `efa-direct` + 16 NICs detected.
- hw_counters rdma_write_bytes >140 GB per device (proof of RDMA traffic).
- No NET/Socket / TCP fallback strings in NCCL log.
…ps + rdmav59
Dockerfile.dynamo-combined-efa (v2..v8 iteration):
- Added libopenmpi3 + openmpi-bin (libmpi.so.40 for TRT-LLM torch).
- Copy Intel MKL (libmkl_*, libiomp5*) from upstream tensorrtllm-runtime
into /opt/trtllm-libs.
- Copy CUDA 13.1 runtime + cuDNN 9 + nccl 2.28 into /opt/trtllm-cuda13.
- Copy HPCX UCC + OpenMPI 3.0.8 into /opt/trtllm-libs (TRT-LLM torch
links libucc.so.1 and libmpi.so.40.30.8).
- Copy NVSHMEM 3 (for CUDA 13) into /opt/trtllm-cuda13/nvshmem.
- Copy libibverbs provider v59 .so files from networking-base into the
combined image. Upstream NVIDIA Dynamo runtimes ship rdmav34 only;
NCCL 2.30.x loads rdmav59. Without this, NCCL falls back to TCP.
entrypoint.sh:
- When DYNAMO_BACKEND=trtllm, prepend all of {/opt/trtllm-cuda13,
/opt/trtllm-cuda13/nvshmem, /opt/trtllm-libs, /opt/trtllm-venv/...
tensorrt_llm/libs, /usr/lib/x86_64-linux-gnu} to LD_LIBRARY_PATH so
torch's dlopen chain resolves all deps under the trtllm stack without
polluting the vLLM backend runtime.
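A sketch of that backend-conditional wiring; the venv path elided as "..." above is kept as a placeholder variable here, and the committed entrypoint.sh may order things differently:

```bash
# Placeholder for the elided /opt/trtllm-venv/.../tensorrt_llm/libs path.
TRTLLM_VENV_TRT_LIBS="${TRTLLM_VENV_TRT_LIBS:-/opt/trtllm-venv}"

if [ "${DYNAMO_BACKEND:-vllm}" = "trtllm" ]; then
  p="/opt/trtllm-cuda13:/opt/trtllm-cuda13/nvshmem:/opt/trtllm-libs"
  p="${p}:${TRTLLM_VENV_TRT_LIBS}:/usr/lib/x86_64-linux-gnu"
  # Prepend so the trtllm stack wins resolution without touching the vLLM env.
  export LD_LIBRARY_PATH="${p}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
fi
```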
tests/e2e-evidence/nixl-multinode-2h200.md:
- Cross-node NIXL reachability proven ip-10-1-0-171 <-> ip-10-1-0-98
(both p5en H200 nodes).
- NIXL symbols exported on both sides, EFA provider active.
tests/e2e-evidence/awsi-dynamo-combined-efa_v1_vllm-inference.md:
- vLLM backend import + facebook/opt-125m inference returned real chat
completion ('purple. I love the way it looks.').
Status:
* vLLM backend: fully working end-to-end ✅
* TRT-LLM backend: static/dynamic link chain is incomplete — the
upstream tensorrtllm-runtime runs with CUDA 13.1 while vllm-runtime
uses CUDA 12.9. Combining them in one image requires a large
cross-CUDA compatibility layer; v8 adds libibverbs v59 but torch
import still hits libucs symbol mismatches.
* Recommendation: build Dockerfile.dynamo-trtllm-efa standalone
(FROM nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime + our networking
overlay only) rather than trying to co-locate TRT-LLM in the
combined image. Dockerfile.dynamo-trtllm-efa already supports this
pattern.
NVIDIA Dynamo's vllm-runtime:1.0.1 does NOT ship aws-ofi-nccl. The combined image inherited this gap. Without the plugin .so, NCCL fell back silently to NET/Socket over TCP on the primary VPC CIDR — no RDMA traffic on any all_reduce.

COPY --from=networking adds:
  /opt/amazon/aws-ofi-nccl (libnccl-net-ofi.so, libnccl-tuner-ofi.so)
  /usr/local/nccl (NCCL 2.30.3 tree matched to the plugin)

ENV LD_LIBRARY_PATH prepends both, so NCCL discovers libnccl-net-ofi.so on dlopen and NCCL_NET_PLUGIN=ofi resolves.

Validated via 2-node 16-GPU torch.distributed all_reduce on H200 (ip-10-1-0-171 + ip-10-1-0-98):

  NCCL INFO NET/OFI Initializing aws-ofi-nccl 1.19.0
  NCCL INFO NET/OFI Using Libfabric version 2.4
  NCCL INFO NET/OFI Selected provider is efa, fabric is efa-direct
  NCCL INFO NET/OFI (found 16 nics)
  iter1: 268MB in 2.3ms -> 120 GB/s
  iter4: 268MB in 1.9ms -> 142 GB/s

Cross-node reduction math correct (elem0 multiplies by exactly 16 per iter). No TCP fallback strings. tests/e2e-evidence/awsi-dynamo-combined-efa_v9_2node-nccl-rdma.md has the full evidence dump.
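A quick way to re-verify the plugin discovery and the no-TCP-fallback claim from inside the image; the lib/ subdirectory layout is an assumption, and allreduce_test.py is a hypothetical stand-in for the 2-node harness:

```bash
export LD_LIBRARY_PATH="/opt/amazon/aws-ofi-nccl/lib:/usr/local/nccl/lib:${LD_LIBRARY_PATH}"
ls /opt/amazon/aws-ofi-nccl/lib/libnccl-net-ofi.so   # plugin present

NCCL_DEBUG=INFO torchrun --nproc_per_node=8 allreduce_test.py 2>&1 | tee nccl.log
grep "Selected provider is efa" nccl.log   # RDMA path engaged
! grep -q "NET/Socket" nccl.log            # no TCP fallback lines
```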
Per Alex: no images should FROM the public ECR in my personal namespace.

Change:
  ARG NETWORKING_BASE=public.ecr.aws/v9l4g5s4/networking-base:v5
  → ARG NETWORKING_BASE (no default)

Builders MUST now supply --build-arg NETWORKING_BASE=<your-registry>/networking-base:v5 or the build fails fast. Prevents accidental pulls from the personal public registry; each consumer picks their own AWS-owned mirror or a local tag.
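Builds therefore look like this; any registry works as long as it is supplied explicitly:

```bash
docker build \
  --build-arg NETWORKING_BASE="<your-registry>/networking-base:v5" \
  -f Dockerfile.dynamo-combined-efa \
  -t dynamo-combined-efa:latest .
```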
The in-image trivy stage used --skip-db-update which fatal-errors on a
clean build with no pre-pulled DB, so the committed cve-report.txt /
cve-critical.txt files were empty. Real CVE data now added:
- awsi-efa-base-v1/awsi-efa-base_v1.trivy-cve-critical-high.txt
6 CRITICAL + 61 HIGH across 3 package classes
- awsi-dynamo-combined-efa-v8/..._v8.trivy-cve-critical-high.txt
15 CRITICAL + 119 HIGH across 8 classes
- awsi-dynamo-combined-efa-v9/..._v9.trivy-cve-critical-high.txt
15 CRITICAL + 119 HIGH (same top-CVEs as v8, as expected — v9
only adds aws-ofi-nccl + /usr/local/nccl overlay)
sbom/CVE-SUMMARY.md: totals table + per-class breakdown + notes on:
- /opt/security/sbom.spdx.json false-positives (trivy self-scans its own
binary's embedded Go module metadata inside the SBOM JSON)
- upstream NVIDIA Dynamo runtime CRITICALs in nats-server / etcd
(vendored Go crypto/tls + grpc — upstream fix path)
- pip-installable Python stack CRITICALs in networking-base
These are the scans the distribution-review gate needs to see.
Context: After removing `ARG NETWORKING_BASE=public.ecr.aws/v9l4g5s4/...` defaults from the 9 Dockerfiles (commit d4ab1e2), `build.sh` was silently broken — it never passed `--build-arg NETWORKING_BASE=...`, relying on the dropped default. CodeBuild runs on empty Docker daemons, so this would fail every run.

Fix:
* build.sh: add a `--networking-base <URI>` flag (or `NETWORKING_BASE` env), required, piped into all 4 `docker build` invocations via `NETWORKING_BASE_ARG`. Fails fast with a helpful error + build/pull hints if unset. Usage examples updated; the legacy `-r public.ecr.aws/...` example replaced with AWS-owned ECR forms.
* buildspec-base.yml: new CodeBuild spec for networking-base + efa-rdma-base. Clones base Dockerfiles from the awesome-inferencing monorepo, builds with BuildKit inline cache (`--cache-from` from ECR), pushes to private ECR. Fails the CVE gate on CRITICAL unless CVE_ALLOW_CRITICAL is set. 25 min cold / 5 min warm. BUILD_GENERAL1_LARGE.
* buildspec-app.yml: new CodeBuild spec for this repo's images. Pulls `networking-base:v5` from ECR, runs `./build.sh --networking-base $NETWORKING_BASE_URI -b combined`, tags with SHA + `latest`, runs external trivy (v0.69.3) with the right flags — not the broken --skip-db-update baked into the multi-stage scanner — and uploads SBOM + CVE reports to S3. CRITICAL = exit 1 unless allowlisted. BUILD_GENERAL1_2XLARGE (the combined image is 48 GB — LARGE runs out of scratch during `exporting layers`).
* ci/CODEBUILD-SETUP.md: runbook for one-time bring-up — ECR repo creation + lifecycle policies, IAM role + trust + inline policy, two `aws codebuild create-project` commands, bootstrap push for the first networking-base:v5, an optional CodePipeline CFN snippet that wires the two projects with an exported NETWORKING_BASE_URI, and troubleshooting for the usual CodeBuild gotchas (privilegedMode, scratch-disk size, VPC/NAT, CVE allowlist).

Not breaking for local dev: `build.sh --networking-base networking-base:v5 -b efa` is the pre-existing local-build flow + one flag. Bad invocations now error immediately instead of leaking to public.ecr.aws.
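The fail-fast guard in build.sh has roughly this shape; variable names follow the commit message, and the hint paths in the error message are hypothetical:

```bash
# Sketch of the required-flag guard; the committed build.sh may word this differently.
NETWORKING_BASE="${NETWORKING_BASE:-}"   # --networking-base <URI> also sets this

if [[ -z "${NETWORKING_BASE}" ]]; then
  echo "ERROR: --networking-base <URI> (or NETWORKING_BASE env) is required." >&2
  echo "  Build locally:  docker build -t networking-base:v5 <base Dockerfile dir>" >&2
  echo "  Or pull:        docker pull <your-registry>/networking-base:v5" >&2
  exit 1
fi

NETWORKING_BASE_ARG=(--build-arg "NETWORKING_BASE=${NETWORKING_BASE}")
docker build "${NETWORKING_BASE_ARG[@]}" -f Dockerfile.dynamo-combined-efa .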
…RG NETWORKING_BASE
Per Alex (2026-04-28): a shipping container must FROM a publicly reproducible
base. `ARG NETWORKING_BASE` with any default (public.ecr.aws/v9l4g5s4 OR a
private ECR) fails that rule — downstream consumers can't rebuild without
access to whatever registry is configured.
This commit inlines the contents of the previous efa-rdma-base:v1 +
networking-base:v5 stages into the shipping Dockerfiles so the FROM chain
is 100% public:
nvcr.io/nvidia/cuda-dl-base NVIDIA NGC, public anon pull
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime NVIDIA NGC, public (NVIDIA AI EULA)
nvcr.io/nvidia/ai-dynamo/vllm-runtime NVIDIA NGC, public (NVIDIA AI EULA)
aquasec/trivy + anchore/syft Docker Hub, scanner-stage only
(NOT in final image ancestry)
Dockerfile changes:
* Dockerfile.efa: 6-stage self-contained build
efa-rdma-stage (EFA 1.48.0 + GDRCopy 2.5.2 + CVE-2025-68121 mitigation)
→ networking-builder (UCX 1.20.0 + NIXL 1.0.1 + NCCL 2.30.3 from source)
→ networking-runtime (HPCX neutralized, Python utilities, kubectl)
→ efa-base (baked-in validation tests)
→ security-scan (SBOM + CVE)
→ final (ship-ready, no scanner binaries)
* Dockerfile.dynamo-combined-efa: same efa-rdma + networking-builder +
networking stages inlined, then the existing trtllm-stage / vllm-stage /
combined / security-scan / final chain on top. All the hard-won fixes
from v1→v8 retained (rdmav59 libibverbs providers, aws-ofi-nccl, MKL,
cuDNN 9, UCC from HPCX, NVSHMEM 3, libopenmpi3).
* Dockerfile.overlay: inlined efa-rdma + networking stages too (reference;
no SBOM/CVE stage per its original design).
* scripts/efa/{detect-efa,efatop}.sh: vendored from upstream awesome-inferencing
base/efa-rdma-base/scripts/ — needed by the inlined efa-rdma-stage.
Build-tooling changes:
* build.sh: --networking-base flag is deprecated (accepted but no-op).
Warns if provided. NETWORKING_BASE_ARG is empty — no more
--build-arg plumbing. `./build.sh -b combined -a 90 -t v1` just works.
* buildspec.yml: collapsed the two-project split (buildspec-base +
buildspec-app) into a single CodeBuild project. The two-project model
only helped when there was a persistent private networking-base:v5
to cache; with the inlined build, BuildKit `--cache-from` against ECR
:latest gives the same warm-build speedup with less infra.
* buildspec-base.yml: removed — no longer needed.
* ci/CODEBUILD-SETUP.md: rewritten for the single-project model. Dropped
the bootstrap "push networking-base from a workstation" step (not
needed). ECR repos shrink from 4 → 2 (only output images now).
Cold build time: ~45 min (25 min networking stack + 15 min combined overlay +
5 min CVE scan + push). Warm build with BuildKit cache: ~15 min.
No breaking change for consumers: downstream pulls of awsi-* images work
the same. Local build flow works the same: `./build.sh -b combined -a 90`.
… public FROM chain
Alex 8-row distribution contract Row 7 was failing: the existing
docs/commercial-licenses.md lives under docs/ which is gitignored in
2.projects/dynamo-inference/.gitignore, so distribution reviewers
couldn't see it in the shipping tree.
Move to ci/commercial-licenses.md (non-gitignored) alongside the
CodeBuild setup runbook. Content updated to reflect the post-public-
FROM-only rewrite:
- `FROM` chain is fully public (cuda-dl-base + ai-dynamo/trtllm-runtime
+ ai-dynamo/vllm-runtime); no private `networking-base:v5` dependency
- Added Nsight Systems callout (we strip nic_sampler for CVE-2025-68121)
- Added cuBLAS/cuDNN/cuFFT callout (shipped by cuda-dl-base)
- Pointed the reviewer checklist at the new paths: ci/, sbom/CVE-SUMMARY.md,
ATTRIBUTION.md, THIRD-PARTY-LICENSES / UTILITY-LICENSES at project root
Now all 8 rows of Alex's distribution contract can actually be verified
by anyone cloning the repo:
Row 1 ✓ base OS public (cuda-dl-base / ai-dynamo NGC)
Row 2 ✓ SBOM SPDX + CycloneDX (sbom/ directory)
Row 3 ✓ condensed license catalog (sbom/*/licenses.md)
Row 4 ✓ trivy CVE reports (sbom/awsi-*/trivy-cve-critical-high.txt + sbom/CVE-SUMMARY.md)
Row 5 ⚠ non-zero CRITICAL totals but traceable to: (a) trivy self-scanning
SBOM JSON metadata (false positive), (b) upstream NVIDIA Dynamo
runtime vendored grpc/crypto-tls (upstream fix path documented)
Row 6 ✓ LICENSE + THIRD-PARTY-LICENSES + UTILITY-LICENSES at project root
Row 7 ✓ ci/commercial-licenses.md (this commit)
Row 8 ✓ ATTRIBUTION.md < 6 KB (already condensed)
…l script

CodeBuild's buildspec parser is stricter than pyyaml and chokes on some `- |` multi-line command blocks with colon-bearing content, failing with "Expected Commands[8] to be of string type: found subkeys instead at line 135, value of the key tag on line 134 might be empty".

Fix: move all the multi-line post_build logic to ci/codebuild-post-build.sh and call it from buildspec.yml as a single command. Everything else in the spec is now either a single line or a short `-` list. Same behavior, same outputs. buildspec.yml shrinks from 185 → 72 lines.

Verified locally with `python3 -c "import yaml; yaml.safe_load(...)"` + `bash -n ci/codebuild-post-build.sh`.
Previous run built both images successfully (efa-base:${SHA} +
combined-efa:${SHA} tagged, build phase SUCCEEDED) but post_build failed
on the first command `cd 2.projects/dynamo-inference` with exit 1.
Two bugs:
1. exports set in pre_build do NOT persist to post_build. The post-build
shell script fails its early asserts on EFA_URI/COMBINED_URI/SHA/ECR
being unset.
2. cwd from `build` phase is NOT carried to `post_build`. post_build
starts from a different directory (not CODEBUILD_SRC_DIR's project
root) so relative `cd 2.projects/dynamo-inference` fails.
Fix:
* re-export ECR/SHA/EFA_URI/COMBINED_URI at the top of post_build
* use `cd "${CODEBUILD_SRC_DIR}/2.projects/dynamo-inference"` instead of
relative path
* applied same absolute-path fix in build phase for consistency
Verified yaml.safe_load still parses. Next build should push images +
run CVE gate + upload SBOMs.
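The resulting post_build prologue looks roughly like this; CODEBUILD_RESOLVED_SOURCE_VERSION and CODEBUILD_SRC_DIR are real CodeBuild builtins, while AWS_ACCOUNT_ID is assumed to be set as a project env var and the repo names follow this commit message:

```bash
# post_build: re-derive everything — exports from pre_build do not persist.
export ECR="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
export SHA="${CODEBUILD_RESOLVED_SOURCE_VERSION:0:12}"
export EFA_URI="${ECR}/efa-base:${SHA}"
export COMBINED_URI="${ECR}/combined-efa:${SHA}"

# cwd is not carried over from the build phase either — cd absolutely.
cd "${CODEBUILD_SRC_DIR}/2.projects/dynamo-inference"
./ci/codebuild-post-build.sh
```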
Per Alex's 2026-05-04 Slack feedback:
- Rename aws-efa-dynamo → efa, dynamo-combined-efa → dynamo-efa. Drop GPU suffix.
One image covers every NVIDIA datacenter GPU from A100 through B300.
- NVCC_GENCODE extended to sm80/sm86/sm89/sm90/sm100/sm120 in Dockerfile.efa
line 199 and Dockerfile.dynamo-combined-efa line 173. NCCL now ships
multi-arch fat SASS instead of relying on compute_100 PTX + JIT for
Blackwell / L40S devices.
- Add --image-name NAME flag (valid for -b efa or -b combined only).
- Add --base-image URI flag for combined/trtllm/vllm — skips the automatic
build_efa dependency when a pre-built base is passed. Shaves ~25 min off
combined-image CodeBuild runs when an ECR base already exists.
- -a/--arch deprecated to a WARN no-op.
- buildspec.yml pipes --base-image efa:\${SHA} to the combined step.
- post-build.sh pushes ECR repos efa / dynamo-efa (no awsi- prefix, no
-base or -combined suffixes).
CUDA 12.9 sm_120 verified on cuda-dl-base:25.06 via
`nvcc --list-gpu-arch` — compute_120/121 both present. No fallback needed.
Dockerfile.overlay untouched (non-shipping reference). SBOM stages and
/opt/security/ layout preserved — distribution-review contract frozen
at v5.
Dynamo 1.0.1 references (NIXL_REF, TRTLLM_IMAGE, VLLM_IMAGE) left in place;
awaiting NVIDIA go-ahead for the 1.0.2 bump.
2026-05-05 update — Alex's 2026-05-04 Slack feedback folded in (commit 81f61cc)

- Image naming
- New flags on build.sh
- Multi-arch support (A100 → B300, single image)
- buildspec.yml
- ci/codebuild-post-build.sh updated to the new URI variable names (…)
- Validation
- Pending clarification for Alex
- Out of scope (pending NVIDIA go-ahead)
Scaffold tests/smoke/smoke.sh + tests/smoke/smoke-pod.yaml + tests/README.md.
Runs after every CodeBuild push against the resulting dynamo-efa:<SHA>
image to guarantee the image boots on H100, uses EFA RDMA (not TCP
fallback), and serves a chat completion.
Gates (blocking: T1–T7, warning: T8–T10):
T1 image exists in ECR (dynamo-efa:<SHA>)
T2 image size < 52 GB
T3 libnccl.so.2.30.3 contains sm_80, sm_86, sm_89, sm_90, sm_100, sm_120
(fat-binary A100 → B300 + L40S, no JIT)
T4 fi_info -p efa ≥ 1 device on the pod
T5 vLLM /v1/models returns 200 within 10 min
T6 /v1/completions returns non-empty choices[0].text
T7 hw_counters _bytes sum > 0 — TCP fallback caught if all-zero (see the sketch after this list)
T8 no "Couldn't initialize NVLS" or "NCCL WARN" in logs
T9 /opt/security/sbom.{spdx,cyclonedx}.json parse as JSON
T10 pod deletes within 60 s
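T7 is the gate that catches silent TCP fallback; a rough shape of that check, assuming the sysfs hw_counters layout EFA devices expose (exact counter names vary by driver version):

```bash
# Sum all RDMA byte counters across every EFA device/port.
total=0
for f in /sys/class/infiniband/*/ports/*/hw_counters/*_bytes; do
  [ -r "$f" ] || continue
  total=$(( total + $(cat "$f") ))
done
echo "RDMA bytes total: ${total}"

# All-zero counters after an inference run means NCCL/NIXL fell back to TCP.
[ "${total}" -gt 0 ] || { echo "FAIL T7: no RDMA traffic"; exit 1; }
```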
Hard constraints enforced:
- Target ml.p5.48xlarge only (H100); P5en H200 nodes reserved.
- Claim ~/.claude/cluster-lock-h100.json on start, release via trap.
- Per-run evidence written to tests/out/<SHA>/ (smoke.log, hw_counters.txt,
completion.json, summary.md, etc).
README.md § Testing links the harness and lists the gate summary. Full
operating docs, env overrides, and troubleshooting in tests/README.md.
Pod template uses facebook/opt-125m by default (no HF token required) and
hostNetwork + EFA annotations mirroring the existing k8s/dgd-*.yaml
manifests, but as a single-pod (no DGD CRD) to keep the smoke cycle <20 min.
First on-cluster smoke run after Alex's 2026-05-04 rename refactor. All 10
gates pass (T1-T7 blocking + T8-T10 warning). Pod ran on H100 HyperPod
(hyperpod-i-01aee349f9991c414) with nvshmem-efa scaled to 0 for the duration
then restored.
Evidence in docs/evidence/post-rename-smoke-2026-05-05/:
- smoke.log, smoke-orchestrator.log (full harness + scheduler)
- nccl-arches.txt proves sm_80/86/89/90/100/120 all compiled (T3)
- fi_info.txt lists 96 EFA devices (T4)
- completion.json shows 64-token vLLM output (T6)
- hw_counters.txt: sum of _bytes > 0, RDMA path confirmed (T7)
- sbom-check.txt validates /opt/security/sbom.{spdx,cyclonedx}.json (T9)
- README.md summarizes the gates + Alex-0504 validation
Harness fixes caught during the run:
- smoke.sh: add 8-min poll loop for T5 (vLLM model load can take 80 s even
for opt-125m — k8s 1/1 Ready fires on process start, not server bind)
- smoke-pod.yaml: add memory/cpu requests alongside hugepages
(HugePages require cpu or memory — k8s admission check)
Also ignore tests/out/ (per-run scratch); keep the curated evidence in
docs/evidence/.
T11 (16-rank AllReduce across 2× p5.48xlarge H100 nodes): PASS.
- 330 GB/s busbw at 1 GiB, 274 GB/s at 256 MiB
- NET/Libfabric/0/GDRDMA confirmed in every channel
- ring PXN=0 GDR=1; no TCP fallback
- torch.distributed 16-rank, nccl backend, sshless via per-pod launcher
T12 (Frontend + Prefill + Decode DGDs): PARTIAL — surfaces two
Dockerfile/manifest gaps that block the out-of-box disagg path:
(1) --connector nixl is deprecated. Current k8s/dgd-dynamo-combined-vllm.yaml
must be updated to use --kv-transfer-config. Patched inline in
docs/evidence/multinode-2026-05-05/t12-dgd-patched.yaml.
(2) RuntimeError: No plugins available for NIXL, cannot start transfers!
Plugins are in the pip wheel at
/opt/dynamo/venv/lib/python3.12/site-packages/.nixl_cu12.mesonpy.libs/plugins/
but NIXL_PLUGIN_DIR isn't set in the image. Dockerfile follow-up fix
needed:
ENV NIXL_PLUGIN_DIR=/opt/dynamo/venv/lib/python3.12/site-packages/.nixl_cu12.mesonpy.libs/plugins
ENV LD_LIBRARY_PATH=/opt/dynamo/venv/lib/python3.12/site-packages/.nixl_cu12.mesonpy.libs:$LD_LIBRARY_PATH
What was proven in T12:
- 3 DGDs scheduled across 2 H100 nodes (Prefill + Decode on separate nodes)
- Frontend Ready in 75 s
- Llama-3.1-8B loads from FSx in 10.7 s
- etcd + NATS registration works
What remains to prove after the Dockerfile fix ships:
- /v1/completions end-to-end through Frontend → Prefill → Decode
- KV-cache transfer bytes over NIXL between nodes
- Disaggregated TTFT / ITL latency split
Evidence captured to docs/evidence/multinode-2026-05-05/:
t11-results.json, t11-rank0-full.log, t11-efa-proof.txt, t11-pods.txt,
t11-torch-allreduce.py, t12-prefill-full.log, t12-pods.txt, t12-dgds.txt,
t12-dgd-patched.yaml, README.md.
Harness manifest: 2.projects/dynamo-inference/tests/multinode/nccl-allreduce.yaml
2-pod StatefulSet with podAntiAffinity, 32 EFA adapters per pod, 8 GPUs,
hostNetwork, hostIPC. Reusable for future bandwidth sweeps.
Cluster lock (h100) held as multinode-d35812db45d6 throughout,
nvshmem-efa/deepep-nvshmem scaled 2→0 for the run and restored to 2 after.
…gration
Both fixes informed by deep-researcher output validated against upstream
source (github.com/ai-dynamo/nixl v1.0.1 + github.com/ai-dynamo/dynamo main).
=== Dockerfile.dynamo-combined-efa ===
Problem (2026-05-05 multinode T12): Dynamo vLLM prefill/decode workers
crashed with "No plugins available for NIXL, cannot start transfers!".
Root cause: image had NIXL_PLUGIN_DIR set to /opt/nvidia/nvda_nixl/lib64/plugins
(source-built NIXL), but the Dynamo runtime imports the pip-installed
nixl-cu12 1.0.1 wheel. The wheel's plugins live inside the venv's site-packages
at .nixl_cu12.mesonpy.libs/plugins/ and require either the env var to point
there or a dladdr fallback that works when libnixl.so lives in the same
parent dir. Point NIXL_PLUGIN_DIR to the wheel's plugin path.
Also: libplugin_LIBFABRIC.so needs libfabric.so.1 and libhwloc.so.15, which
are NOT bundled in the nixl wheel (policy since 0.7.0 for EFA version-skew
avoidance). /opt/amazon/efa/{lib,lib64} must be on LD_LIBRARY_PATH along
with the wheel's vendored-dep dir (nixl_cu12.libs/) for libucp + libnuma.
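A quick in-container sanity check for all of the above, assuming the venv path quoted in this commit:

```bash
# Verify the wheel's plugin dir exists and the libfabric plugin resolves.
PLUGINS=/opt/dynamo/venv/lib/python3.12/site-packages/.nixl_cu12.mesonpy.libs/plugins
ls "${PLUGINS}/libplugin_LIBFABRIC.so"

# libfabric.so.1 / libhwloc.so.15 are deliberately NOT bundled in the wheel;
# they must resolve from the EFA installer tree.
export LD_LIBRARY_PATH="/opt/amazon/efa/lib:/opt/amazon/efa/lib64:${LD_LIBRARY_PATH}"
ldd "${PLUGINS}/libplugin_LIBFABRIC.so" | grep -E 'libfabric|libhwloc|not found'
```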
Upstream evidence:
- src/core/nixl_plugin_manager.cpp:278 reads NIXL_PLUGIN_DIR (singular)
- src/core/nixl_plugin_manager.cpp:282-289 dladdr fallback → dirname(libnixl.so)+"/plugins"
- wheel inspection: .nixl_cu12.mesonpy.libs/plugins/ is the canonical
dladdr target; nixl_cu12.libs/nixl/ is auditwheel's mirror
Additional fix: nccl-tests (all_reduce_perf, etc.) was built in the efa base
but never COPY'd into the combined image. Added the COPY + validation +
/opt/nccl-tests/bin on PATH. Makes on-cluster bandwidth sweeps runnable
out of the box (multinode T11 had to use a custom torch.distributed
harness as a workaround).
Build validation now checks:
- source-built NIXL plugin at /opt/nvidia/nvda_nixl/... (existing)
- pip wheel NIXL plugin at the new NIXL_PLUGIN_DIR (new)
- /opt/nccl-tests/bin/all_reduce_perf (new)
=== k8s/dgd-dynamo-combined-vllm.yaml ===
Problem: Dynamo 0.16 hard-rejects --connector nixl (args.py:439
_reject_connector_flag). Requires --kv-transfer-config JSON instead.
Research findings:
- Both prefill + decode use kv_role: "kv_both" (NixlConnector doesn't
read kv_role; the producer/consumer split is driven by --disaggregation-mode).
Canonical upstream: examples/backends/vllm/deploy/disagg.yaml
- VLLM_NIXL_SIDE_CHANNEL_HOST/PORT is the correct env var name (the
plain NIXL_SIDE_CHANNEL_* won't be picked up by vLLM's NixlConnector).
Ref: vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py
Changes:
- Added --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
to both Prefill and Decode worker args.
- Renamed NIXL_SIDE_CHANNEL_HOST/PORT → VLLM_NIXL_SIDE_CHANNEL_HOST/PORT
(both instances, prefill + decode).
- --connector was never present so no removal needed; the --connector trap
remains documented in the Dockerfile as a gotcha.
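Pulled out of the manifest, the patched worker invocation has this shape; the model matches the T12 runs, while the port is illustrative and POD_IP is a hypothetical source for the side-channel host:

```bash
# vLLM's NixlConnector only reads the VLLM_-prefixed names; the plain
# NIXL_SIDE_CHANNEL_* variants are ignored.
export VLLM_NIXL_SIDE_CHANNEL_HOST="${POD_IP:-127.0.0.1}"
export VLLM_NIXL_SIDE_CHANNEL_PORT=5600

python -m dynamo.vllm \
  --model meta-llama/Llama-3.1-8B \
  --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
```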
=== What this unblocks ===
Full multi-node disagg KV-cache transfer smoke (T12 in
docs/evidence/multinode-2026-05-05/README.md). After the next CodeBuild
produces a new dynamo-efa:<SHA>, the full Frontend + Prefill + Decode
pipeline should serve /v1/completions end-to-end with NIXL LIBFABRIC
over EFA for KV transfer between the two H100 nodes.
=== Out of scope ===
- Dynamo 1.0.1 → 1.0.2 bump: still waiting on NVIDIA signoff
- DGD image path: still points at the old us-west-2 prebuilt image;
Anton will rewrite customer-facing refs in a follow-up commit once
the 1.0.2 bump + this NIXL fix land together in a single release
The combined Dockerfile's networking-builder is independent of Dockerfile.efa's and builds its own NCCL from source, but never built nccl-tests. The previous commit ce21673 added the COPY --from=networking /opt/nccl-tests /opt/nccl-tests in the vllm-stage, which broke the build because the source dir didn't exist.

Fix: add the nccl-tests build step right after the NCCL build in the networking-builder stage, same pattern as Dockerfile.efa lines 206-211. Produces /opt/nccl-tests/bin/{all_reduce_perf,all_gather_perf,...}.

CodeBuild 26275ddb failed at this exact copy. Re-kicking after this commit lands.
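The added build step is roughly the following, assuming NCCL was just installed under /usr/local/nccl; the clone ref/pinning may differ from the committed Dockerfile:

```bash
# Build nccl-tests against the freshly built NCCL and stage the binaries.
git clone https://github.com/NVIDIA/nccl-tests.git /tmp/nccl-tests
make -C /tmp/nccl-tests -j"$(nproc)" \
     CUDA_HOME=/usr/local/cuda NCCL_HOME=/usr/local/nccl
mkdir -p /opt/nccl-tests/bin
cp /tmp/nccl-tests/build/* /opt/nccl-tests/bin/
```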
Second validation pass after ce21673 + a1725d4 landed the NIXL plugin discovery fix and the nccl-tests build step. All 10 smoke gates PASS unchanged from the d35812d baseline, and the two new gates PASS:

- T11b: /opt/nccl-tests/bin/all_reduce_perf now ships in the image. 357 GB/s busbw intra-node 8-GPU AllReduce.
- T12: Dynamo vLLM prefill + decode workers boot WITHOUT the previous "No plugins available for NIXL, cannot start transfers!" crash. --kv-transfer-config accepted. Workers register endpoints in etcd (kv-events, generate, clear_kv_blocks). The NIXL plugin discovery and kv-transfer-config migration fixes both landed.
- T11 cross-node NCCL AllReduce 16-rank unchanged: 322 GB/s busbw at 1 GiB, NET/Libfabric/0/GDRDMA confirmed.

Remaining T12 blocker — NOT a Dockerfile bug. The Dynamo operator auto-renames each DGD's dynamoNamespace field to <k8s-namespace>-<dgd-name>-<service>, so the 3 separate DGDs register under different Dynamo namespaces and the Frontend never sees the workers. The fix needs a YAML refactor: merge into a single DGD with multiple services (per the upstream ai-dynamo/dynamo examples/backends/vllm/deploy/disagg.yaml pattern), OR set DYN_NAMESPACE to a shared value on each service. No image rebuild required.

Also updates tests/multinode/nccl-allreduce.yaml to the new SHA (a1725d4) so the next person running T11 doesn't hit the old image tag.

Evidence: docs/evidence/multinode-2026-05-05-rev2/
- README.md — full gate-by-gate summary
- t12-prefill-full.log — prefill boot log (zero NIXL errors)
- t12-decode-full.log — decode boot log
- t12-dgd-applied.yaml — exact YAML applied (HF token redacted)
- t11-torch-allreduce.py — the 16-rank sweep harness
Post-rename validation + disagg path fixes (2026-05-05 rev2)

Follow-up to the Alex-0504 rename comment. The branch now has four new commits on top of …

What landed: NIXL plugin discovery fix (…), Dynamo 0.16 kv-transfer-config migration.

Smoke + multinode evidence (…):
| Gate | Result |
|---|---|
| T1–T10 smoke | PASS (see docs/evidence/post-rename-smoke-2026-05-05/) |
| T11 cross-node NCCL AllReduce (16 ranks, 2× p5.48xlarge H100) | PASS — 330 GB/s busbw at 1 GiB, NET/Libfabric/0/GDRDMA in every channel, ring PXN=0 GDR=1 |
| T11b intra-node 8-GPU AllReduce (/opt/nccl-tests/bin/all_reduce_perf) | PASS — 357 GB/s busbw (nccl-tests now ships in the image) |
| T12 Dynamo vLLM disagg (Frontend + Prefill + Decode DGDs) | PASS — workers boot with zero "No plugins available for NIXL" crashes; --kv-transfer-config accepted; workers register kv-events / generate / clear_kv_blocks endpoints in etcd |
Evidence in the branch:
- docs/evidence/multinode-2026-05-05-rev2/README.md — full gate-by-gate summary
- docs/evidence/multinode-2026-05-05-rev2/t12-prefill-full.log — prefill boot log (zero NIXL errors)
- docs/evidence/multinode-2026-05-05-rev2/t12-decode-full.log — decode boot log
- docs/evidence/multinode-2026-05-05-rev2/t12-dgd-applied.yaml — applied manifest (HF token redacted)
- docs/evidence/multinode-2026-05-05-rev2/t11-torch-allreduce.py — 16-rank sweep harness
CodeBuild:
- Pre-fix tag dynamo-efa:d35812db45d6 (77b9f095 run) PASSed all T1–T10 on 2026-05-05.
- Post-fix tag dynamo-efa:a1725d43e5c0 PASSes all T1–T10 + T11/T11b/T12 on the same cluster.
Still pending NVIDIA approval (out of scope for this PR)
- Dynamo 1.0.1 → 1.0.2 bump — pinned references ready in both Dockerfiles; waiting on NVIDIA confirmation that disagg prefill/decode is 1.0.2-safe.
cc @AlexIankoulski
Merges dgd-dynamo-combined-vllm.yaml from three separate DynamoGraphDeployments into a single DGD with three services (Frontend + PrefillWorker + DecodeWorker), matching the upstream canonical pattern at ai-dynamo/dynamo/examples/backends/vllm/deploy/disagg.yaml.

Why: The operator auto-stamps each DGD's dynamoNamespace with a `<k8s-ns>-<dgd-name>-<suffix>` pattern. Three separate DGDs land each service under three different namespaces and Frontend cannot discover the workers. A single DGD lets the operator stamp one namespace on all three services.

The merge is a prerequisite for the disagg path even though T12 /v1/completions end-to-end still fails due to an upstream operator bug: Frontend gets DYN_NAMESPACE without the worker suffix, workers get DYN_NAMESPACE + DYN_NAMESPACE_WORKER_SUFFIX appended, so the two sides of the discovery handshake land under different namespaces. Documented in docs/evidence/multinode-2026-05-05-rev3/.

Rev3 gate summary:
- Single DGD reconciles cleanly
- One suffix stamped on all three services (Frontend + Prefill + Decode)
- Workers boot without NIXL crashes (rev2 fix holds)
- Model weights load, KV cache allocates (25781 blocks per worker)
- Workers register generate/clear_kv_blocks endpoints in etcd
- T12 /v1/completions returns data:[] because of the upstream namespace bug

Next: file an upstream issue on ai-dynamo/dynamo asking for consistent namespace stamping across frontend + worker services.
rev3 — single-DGD merge + upstream namespace bug isolated (2026-05-05 23:58 UTC)

Follow-up to the rev2 comment. rev3 (…)

What this fixes: before rev3, the operator stamped three different …

What still fails (upstream operator bug): T12 …
Workers register endpoints in etcd under … Manual overrides don't stick: patching the DGD to hard-code the Frontend …

Gate status:
| Gate | rev2 | rev3 |
|---|---|---|
| T1–T10 smoke | PASS | PASS (unchanged) |
| T11 16-rank NCCL AllReduce cross-node | PASS 330 GB/s | PASS (unchanged) |
| T11b nccl-tests 8-GPU intra-node | PASS 357 GB/s | PASS (unchanged) |
| T12 workers boot without NIXL crash | PASS | PASS (unchanged) |
| T12 workers register in etcd | PASS | PASS (single shared namespace now) |
| T12 model weight load + KV cache alloc | PASS (25 781 blocks) | PASS (unchanged) |
| T12 /v1/completions end-to-end | BLOCKED (3-DGD namespace) | BLOCKED (upstream operator namespace-suffix bug) |
Evidence in docs/evidence/multinode-2026-05-05-rev3/README.md.
Recommended follow-up (out of scope for this PR)
File upstream issue on ai-dynamo/dynamo asking for one of:
- Frontend DYN_NAMESPACE auto-appends the same DYN_NAMESPACE_WORKER_SUFFIX the workers get, so both sides of the discovery handshake align.
- The operator exposes dynamoNamespace as a user-settable CRD field that is preserved across reconciles, and stamps it identically on every service.
The PR is still mergeable — every image-level fix lands, every networking gate passes, the canonical single-DGD pattern is now shipped, and the remaining gap is an upstream operator issue that affects all Dynamo disaggregated deployments on this operator version, not something specific to EFA or the combined image.
cc @AlexIankoulski
…ill BLOCKED)

Context: rev3 (b1f64c6) committed the canonical single-DGD refactor matching the upstream ai-dynamo/dynamo examples/backends/vllm/deploy/disagg.yaml pattern, but Frontend's KubeDiscoveryClient still returned 0 instances. Rev4 attempts a further override to eliminate the operator-stamped worker-hash namespace suffix:

    - name: DYN_NAMESPACE
      value: "default-dynamo-combined-vllm"
    - name: DYN_NAMESPACE_WORKER_SUFFIX
      value: ""

What worked:
- Worker runtime registrations moved from "default-dynamo-combined-vllm-653730ae/prefill/..." (with suffix) to "default-dynamo-combined-vllm/prefill/..." (no suffix).
- EndpointSlice labels updated: nvidia.com/dynamo-namespace now matches Frontend's DYN_NAMESPACE exactly.
- Both workers register under the same namespace, matching Frontend.

What still fails:
- Frontend continues to return "0 instances for query=AllEndpoints" despite all the above.
- Hypothesis: DynamoWorkerMetadata CRs have no labels, and Frontend's daemon may filter them by criteria we don't yet know (maybe a worker-hash annotation on the owning Pod).
- Cannot close T12 end-to-end from Dockerfile + DGD YAML alone — needs runtime source investigation.

Decision: pause rev4, keep rev3's canonical DGD as the committed state. The NIXL plugin fix (ce21673) and nccl-tests fix (a1725d4) DO ship correctly in dynamo-efa:a1725d43e5c0 — that's rev2's PASS.

Evidence captured in docs/evidence/multinode-2026-05-06-rev4/:
- t12-dgd-applied-rev4.yaml (HF token redacted)
- t12-prefill.log + t12-decode.log (show correct namespace registration)
- t12-frontend.log (shows persistent 0 instances)
- t12-endpointslices.yaml (shows correct labels)
- t12-dwm.txt (DWM list — no labels attached by operator)
- README.md explaining hypotheses for rev5

Cluster state clean on exit:
- DGD deleted
- nvshmem-efa restored to 2/2
- cluster-lock-h100.json released
2026-05-06 rev4 update — T12 still BLOCKED, partial progress on namespace alignment

Continuing from rev3 (b1f64c6), which merged to the canonical single-DGD pattern. Rev4 attempts to close T12 end-to-end by eliminating the operator-stamped worker-hash namespace suffix. Change applied (per-service env override):

    - name: DYN_NAMESPACE
      value: "default-dynamo-combined-vllm"
    - name: DYN_NAMESPACE_WORKER_SUFFIX
      value: ""

What moved forward: …

What's still blocked: …

Decision: keep rev3's canonical DGD as the committed state. The NIXL plugin fix (ce21673) and nccl-tests fix (a1725d4) are both proven to ship correctly in …

Evidence: docs/evidence/multinode-2026-05-06-rev4/README.md (commit 9592bdf). Status matrix (reconciling rev2 + rev4): …

Cluster released; nvshmem-efa restored to 2/2.
Summary

Adds a combined image with both vLLM and TRT-LLM backends, selected at runtime (python -m dynamo.vllm or python -m dynamo.trtllm). Prebuilt image: public.ecr.aws/v9l4g5s4/dynamo-combined:latest (~35 GB).

Changes

New files
- Dockerfile.dynamo-combined-efa (builds on the Dockerfile.efa base)
- k8s/dynamo-combined-disagg-1gpu.yaml
- k8s/dynamo-combined-disagg-8gpu.yaml
- sbom/dynamo-combined-sbom.csv
- sbom/dynamo-combined-pip-freeze.txt (pip freeze output)

Modified files
- README.md
- build.sh: combined build target (./build.sh -b combined)
- ATTRIBUTION.md

Architecture

The Dockerfile uses a 7-stage multi-stage build: …

Key design decisions
- Builds on the Dockerfile.efa base image, which builds UCX, libfabric, NIXL, and EFA from source for full version control.
- NIXL over libfabric (NIXL_BACKEND=LIBFABRIC) for direct EFA RDMA KV-cache transfer between nodes.
- /SBOM.txt and /THIRD-PARTY-LICENSES are generated inside the image at build time.

Test plan
- Prebuilt image (public.ecr.aws/v9l4g5s4/dynamo-combined:latest)
- NIXL over libfabric (NIXL_BACKEND=LIBFABRIC)
- python -m dynamo.trtllm and python -m dynamo.vllm