-
Notifications
You must be signed in to change notification settings - Fork 1
[flytekit]: Add multi-arch Nix build support with manifest creation #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…tion When building Nix images with multiple platforms (e.g., linux/amd64,linux/arm64), this change: 1. Triggers separate Depot builds for each architecture with postfixed tags 2. Creates a multi-arch manifest that points to both ECR images This follows the pattern from the monorepo's nix-build.yaml workflow. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
| def _build_multi_arch_nix(self, image_spec: ImageSpec, tmp_dir: str, platforms: List[str], push: bool) -> str: | ||
| """Build multi-arch Nix images with architecture-specific tags, then create a manifest. | ||
| For each platform (e.g., linux/amd64, linux/arm64), this method: | ||
| 1. Triggers a Depot build with an architecture-specific tag postfix | ||
| 2. After all builds complete, creates a multi-arch manifest pointing to both images | ||
| """ | ||
| base_image_name = image_spec.image_name() | ||
| arch_images = [] | ||
|
|
||
| for platform in platforms: | ||
| arch = platform.split("/")[-1] # Extract arch from "linux/amd64" -> "amd64" | ||
| arch_tag = f"{base_image_name}-{arch}" | ||
| arch_images.append(arch_tag) | ||
|
|
||
| click.secho(f"Building Nix image for {platform} with tag {arch_tag}", fg="blue") | ||
|
|
||
| command = [ | ||
| "depot", | ||
| "build", | ||
| "--tag", | ||
| arch_tag, | ||
| "--platform", | ||
| platform, | ||
| "--project", | ||
| "bf5bv9t2mj", | ||
| tmp_dir, | ||
| ] | ||
|
|
||
| concat_command = " ".join(command) | ||
| click.secho(f"Run command: {concat_command}", fg="blue") | ||
| run(command, check=True) | ||
|
|
||
| if push and image_spec.registry: | ||
| self._create_multi_arch_manifest(base_image_name, arch_images) | ||
|
|
||
| return base_image_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make the builds run in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Updated to use ThreadPoolExecutor to run all Depot builds in parallel. Each platform build now runs simultaneously, and the manifest is created after all builds complete successfully.
Use ThreadPoolExecutor to run Depot builds for all platforms simultaneously instead of sequentially. This significantly speeds up multi-arch builds. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…ing build When an ImageSpec has multiple platforms (e.g., linux/amd64,linux/arm64), the exist() method now verifies that the image is a proper multi-arch manifest with ALL required architectures. If the image exists but is missing some architectures (e.g., only has arm64 but needs both amd64 and arm64), it returns False to trigger a rebuild. This fixes an issue where a single-arch image with the same tag would cause the multi-arch build to be skipped, resulting in exec format errors when running on the wrong architecture. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…ages The Dockerfile already handles pushing the nix2container image via `nix run .#docker.copyTo`. Using --push on the Depot build would push the Ubuntu-based build image instead of the nix2container image, overwriting the correct image with the wrong one. This was causing the 'pyflyte-fast-execute not found' error because the Ubuntu image doesn't have the Nix store or the flytekit scripts. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…builds The Dockerfile template now uses ARG variables for IMAGE_NAME and ECR_TOKEN instead of substituting them at generation time. This allows each architecture-specific Depot build to pass the correct tag via --build-arg, ensuring the nix2container image is pushed to the correct ECR tag. Previously, the copyTo command was pushing to the base tag (without -amd64 suffix) because IMAGE_NAME was substituted at Dockerfile generation time, before the multi-arch build split into architecture-specific builds. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
The Nix installer with --init none doesn't set up the nixbld group, which causes 'the group nixbld specified in build-users-group does not exist' error when running nix commands. Fix by: - Setting sandbox = false (sandboxing requires nixbld group) - Setting build-users-group to empty string (run without daemon/nixbld) Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…dbox=false error Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…thon Template Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…sing The comment '# Note: $$ is used to escape $ for Python Template substitution' contained unescaped dollar signs that caused Template to fail with: ValueError: Invalid placeholder in string: line 14, col 30 Changed to use words instead of symbols in the comment. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
The Depot build cache may contain a pre-existing Nix installation with different planner settings. This causes the Nix installer to fail with: 'Found existing plan in /nix/receipt.json which used different planner settings' Fix by checking for and uninstalling any existing Nix installation before running the installer. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
…ched builds The nix-installer uninstall command fails when the cached /nix directory is in an inconsistent state, leaving it corrupted and causing the subsequent install to fail with 'trying to unpack outside of destination path' error. Using rm -rf /nix/* is more reliable for Docker build caches since we want a fresh install each time anyway. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
… builds The rm -rf /nix/* command was not fully cleaning up the cached /nix directory, leaving /nix/temp-install-dir from previous failed builds. The Nix installer would then fail with 'Directory not empty (os error 39)' when trying to clean up this directory. Using 'find /nix -mindepth 1 -delete' is more thorough and ensures all files including hidden ones are removed before the Nix installer runs. Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Co-Authored-By: carlos@exa.ai <carlosmarques.personal@gmail.com>
Why are the changes needed?
When building Nix-based container images, we need support for multi-architecture builds (amd64 and arm64) to run on different hardware. Currently, Nix builds only support single-platform builds. This PR adds the ability to build for multiple architectures and create a unified multi-arch manifest.
What changes were proposed in this pull request?
Added multi-arch Nix build support to
DefaultImageBuilderinflytekit/image_spec/default_builder.py:platformcontains multiple architectures (e.g.,"linux/amd64,linux/arm64"), the build is treated as multi-archmyimage:tag-amd64,myimage:tag-arm64). Builds run in parallel usingThreadPoolExecutordocker buildx imagetools createthat points to both ECR imagesexist(): Addedcheck_ecr_image_architectures()function and updatedImageSpec.exist()to verify that cached images have ALL required architectures before skipping buildsThe implementation follows the pattern from the monorepo's
nix-build.yamlworkflow.Updates since last revision
ThreadPoolExecutorfor faster multi-arch builds--pushflag in Depot build command - images were being built but not pushed to ECRexist()method to detect when a single-arch image is cached instead of a proper multi-arch manifestKey review points
--pushflag is correctly placed in the Depot build command (line ~952)check_ecr_image_architectures()- relies on parsingdocker buildx imagetools inspectoutputexist()correctly identifies missing architecturesclick.sechocalls during parallel builds (output may interleave)bf5bv9t2mjis hardcoded - this is Exa-specificHow was this patch tested?
depotanddocker buildxwhich are difficult to unit test without mockingnix-multiarch-testworkflowCheck all the applicable boxes
Related PRs
Link to Devin run
https://app.devin.ai/sessions/98b686346e954acf9a44d621ec151cfd
Requested by: @Carlos-Marques