
fix: fail build when Metal compiler header resolution fails #3332

Open
dogukanveziroglu wants to merge 1 commit into ml-explore:main from dogukanveziroglu:fix/metal-jit-build-validation

@dogukanveziroglu
Summary

make_compiled_preamble.sh silently produces broken JIT Metal kernel sources when xcrun -sdk macosx metal fails during the build (e.g. due to misconfigured Xcode CLI tools or missing SDK path).

The script uses xcrun metal -H to resolve header dependencies but never checks whether the command succeeded. When it fails, the words of the error message (e.g. "error:", "to", "not", "utility", "or", "developer", "PATH") are parsed as header file names. As a result, critical headers like bf16.h (which defines typedef bfloat bfloat16_t;) are silently omitted from the embedded Metal kernel sources in libmlx.dylib.
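To illustrate the failure mode, here is a minimal sketch of how whitespace splitting turns an error message into a list of bogus "header names". The error text and the helper split_words are illustrative assumptions, not code or output taken from make_compiled_preamble.sh.

```shell
#!/usr/bin/env bash
# Illustrative stderr text for a failed toolchain lookup
# (an assumption for this sketch, not captured from a real build).
failed_output='xcrun: error: unable to find utility "metal", not a developer tool or in PATH'

# split_words is a hypothetical helper that mimics what happens when
# command output is expanded unquoted into an array of header names.
split_words() {
  local words=($1)   # intentionally unquoted: split on whitespace
  printf '%s\n' "${words[@]}"
}

# Prints one token per line; none of them is a real header path.
split_words "$failed_output"
```

Because every token looks like just another list entry, nothing downstream notices that real headers such as bf16.h never appeared in the list.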

The build completes successfully, but at runtime any operation requiring JIT Metal compilation (e.g. gather_front on bfloat16 tensors) fails with:

RuntimeError: [metal::Device] Unable to build metal library from source
mlx/backend/metal/kernels/utils.h: error: unknown type name 'bfloat16_t'

This is extremely difficult to diagnose because:

  • The build reports no errors
  • Simple operations (using precompiled metallib kernels) work fine
  • The error only appears at runtime when a JIT-compiled kernel is needed
  • The root cause (missing bf16.h in the embedded source) is not visible from the error message

Fix

Add validation after the xcrun metal -H call to check that every line of its output is a valid header dependency line, i.e. one that starts with one or more "." characters. If invalid lines are detected, the build fails immediately with a clear error message pointing to the Metal toolchain configuration.
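A guard along these lines could look like the sketch below. It is not the code from this PR: the function name validate_header_deps, the error wording, and the assumed line shape (leading dots, whitespace, then a path) are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# validate_header_deps is a hypothetical name; the PR's actual check may differ.
# Reads candidate -H output on stdin and succeeds only if every non-empty line
# starts with one or more dots, then whitespace, then a path, and at least
# one such line was seen.
validate_header_deps() {
  local line seen=0
  local re='^\.+[[:space:]]+[^[:space:]]'   # assumed shape of a dependency line
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    if ! [[ $line =~ $re ]]; then
      echo "error: unexpected Metal compiler output: $line" >&2
      echo "check the Metal toolchain configuration (Xcode CLI tools, SDK path)" >&2
      return 1
    fi
    seen=1
  done
  # Empty output is treated as a failure too, since no headers were resolved.
  [ "$seen" -eq 1 ]
}
```

In a build script, the captured output would be piped through the function and the script would exit non-zero on failure, for example `printf '%s\n' "$deps" | validate_header_deps || exit 1`, where deps is a hypothetical variable holding the -H output.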

Checklist

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

No test added: there is no existing test infrastructure for build shell scripts. The fix is a straightforward validation guard that fails early instead of silently continuing.

When xcrun metal fails during JIT source generation (e.g. due to
misconfigured Xcode CLI tools or missing SDK), the error messages
are silently parsed as header file names. This causes critical
headers like bf16.h to be omitted from the embedded Metal kernel
sources, leading to runtime bfloat16_t type errors that are
difficult to diagnose.

Add validation after the Metal compiler header resolution step to
detect malformed output and fail the build early with a clear error
message instead of producing a broken binary.
dogukanveziroglu force-pushed the fix/metal-jit-build-validation branch from e8800b1 to bdd12fc on March 29, 2026 21:28
zcbenz (Collaborator) left a comment

This is a very useful improvement, thanks!
