Skip to content

Conversation

@Carlos-Marques
Copy link

@Carlos-Marques Carlos-Marques commented Jan 20, 2026

Why are the changes needed?

When building Nix-based container images, we need support for multi-architecture builds (amd64 and arm64) to run on different hardware. Currently, Nix builds only support single-platform builds. This PR adds the ability to build for multiple architectures and create a unified multi-arch manifest.

What changes were proposed in this pull request?

Added multi-arch Nix build support to DefaultImageBuilder in flytekit/image_spec/default_builder.py:

  1. Multi-arch detection: When platform contains multiple architectures (e.g., "linux/amd64,linux/arm64"), the build is treated as multi-arch
  2. Parallel architecture-specific builds: For each platform, triggers a separate Depot build with an architecture-specific tag postfix (e.g., myimage:tag-amd64, myimage:tag-arm64). Builds run in parallel using ThreadPoolExecutor
  3. Manifest creation: After all architecture builds complete, creates a multi-arch manifest using docker buildx imagetools create that points to both ECR images
  4. Architecture verification in exist(): Added check_ecr_image_architectures() function and updated ImageSpec.exist() to verify that cached images have ALL required architectures before skipping builds

The implementation follows the pattern from the monorepo's nix-build.yaml workflow.

Updates since last revision

  • Changed from sequential to parallel builds using ThreadPoolExecutor for faster multi-arch builds
  • Fixed missing --push flag in Depot build command - images were being built but not pushed to ECR
  • Added architecture verification in exist() method to detect when a single-arch image is cached instead of a proper multi-arch manifest

Key review points

  • Verify the --push flag is correctly placed in the Depot build command (line ~952)
  • Check architecture parsing in check_ecr_image_architectures() - relies on parsing docker buildx imagetools inspect output
  • Verify the multi-arch detection logic in exist() correctly identifies missing architectures
  • Check if the retry logic (6 attempts, 10s wait) for image verification is appropriate
  • Review thread safety of click.secho calls during parallel builds (output may interleave)
  • Note: Depot project ID bf5bv9t2mj is hardcoded - this is Exa-specific

How was this patch tested?

  • Code passes ruff linting and formatting checks
  • No unit tests were added - the changes involve subprocess calls to depot and docker buildx which are difficult to unit test without mocking
  • Integration testing via monorepo PR #13304 which runs the nix-multiarch-test workflow

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Link to Devin run

https://app.devin.ai/sessions/98b686346e954acf9a44d621ec151cfd

Requested by: @Carlos-Marques

…tion

When building Nix images with multiple platforms (e.g., linux/amd64,linux/arm64),
this change:
1. Triggers separate Depot builds for each architecture with postfixed tags
2. Creates a multi-arch manifest that points to both ECR images

This follows the pattern from the monorepo's nix-build.yaml workflow.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
@devin-ai-integration
Copy link

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Comment on lines 924 to 960
def _build_multi_arch_nix(self, image_spec: ImageSpec, tmp_dir: str, platforms: List[str], push: bool) -> str:
"""Build multi-arch Nix images with architecture-specific tags, then create a manifest.
For each platform (e.g., linux/amd64, linux/arm64), this method:
1. Triggers a Depot build with an architecture-specific tag postfix
2. After all builds complete, creates a multi-arch manifest pointing to both images
"""
base_image_name = image_spec.image_name()
arch_images = []

for platform in platforms:
arch = platform.split("/")[-1] # Extract arch from "linux/amd64" -> "amd64"
arch_tag = f"{base_image_name}-{arch}"
arch_images.append(arch_tag)

click.secho(f"Building Nix image for {platform} with tag {arch_tag}", fg="blue")

command = [
"depot",
"build",
"--tag",
arch_tag,
"--platform",
platform,
"--project",
"bf5bv9t2mj",
tmp_dir,
]

concat_command = " ".join(command)
click.secho(f"Run command: {concat_command}", fg="blue")
run(command, check=True)

if push and image_spec.registry:
self._create_multi_arch_manifest(base_image_name, arch_images)

return base_image_name
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make the builds run in parallel?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Updated to use ThreadPoolExecutor to run all Depot builds in parallel. Each platform build now runs simultaneously, and the manifest is created after all builds complete successfully.

devin-ai-integration bot and others added 14 commits January 20, 2026 02:22
Use ThreadPoolExecutor to run Depot builds for all platforms simultaneously
instead of sequentially. This significantly speeds up multi-arch builds.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…ing build

When an ImageSpec has multiple platforms (e.g., linux/amd64,linux/arm64),
the exist() method now verifies that the image is a proper multi-arch
manifest with ALL required architectures. If the image exists but is
missing some architectures (e.g., only has arm64 but needs both amd64
and arm64), it returns False to trigger a rebuild.

This fixes an issue where a single-arch image with the same tag would
cause the multi-arch build to be skipped, resulting in exec format
errors when running on the wrong architecture.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…ages

The Dockerfile already handles pushing the nix2container image via
`nix run .#docker.copyTo`. Using --push on the Depot build would push
the Ubuntu-based build image instead of the nix2container image,
overwriting the correct image with the wrong one.

This was causing the 'pyflyte-fast-execute not found' error because
the Ubuntu image doesn't have the Nix store or the flytekit scripts.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…builds

The Dockerfile template now uses ARG variables for IMAGE_NAME and ECR_TOKEN
instead of substituting them at generation time. This allows each
architecture-specific Depot build to pass the correct tag via --build-arg,
ensuring the nix2container image is pushed to the correct ECR tag.

Previously, the copyTo command was pushing to the base tag (without -amd64
suffix) because IMAGE_NAME was substituted at Dockerfile generation time,
before the multi-arch build split into architecture-specific builds.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
The Nix installer with --init none doesn't set up the nixbld group,
which causes 'the group nixbld specified in build-users-group does not exist'
error when running nix commands.

Fix by:
- Setting sandbox = false (sandboxing requires nixbld group)
- Setting build-users-group to empty string (run without daemon/nixbld)

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…dbox=false error

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…thon Template

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…sing

The comment '# Note: $$ is used to escape $ for Python Template substitution'
contained unescaped dollar signs that caused Template to fail with:
ValueError: Invalid placeholder in string: line 14, col 30

Changed to use words instead of symbols in the comment.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
The Depot build cache may contain a pre-existing Nix installation with
different planner settings. This causes the Nix installer to fail with:
'Found existing plan in /nix/receipt.json which used different planner settings'

Fix by checking for and uninstalling any existing Nix installation before
running the installer.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…ched builds

The nix-installer uninstall command fails when the cached /nix directory is in
an inconsistent state, leaving it corrupted and causing the subsequent install
to fail with 'trying to unpack outside of destination path' error.

Using rm -rf /nix/* is more reliable for Docker build caches since we want a
fresh install each time anyway.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
… builds

The rm -rf /nix/* command was not fully cleaning up the cached /nix directory,
leaving /nix/temp-install-dir from previous failed builds. The Nix installer
would then fail with 'Directory not empty (os error 39)' when trying to clean
up this directory.

Using 'find /nix -mindepth 1 -delete' is more thorough and ensures all files
including hidden ones are removed before the Nix installer runs.

Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants