SIGSEGV in Compiled::eval_gpu when auto-fused kernel receives lazy/mmap'd array with null MTLBuffer #3329

@FHult

Description

MLX's automatic element-wise kernel fusion (Compiled kernels) crashes with SIGSEGV (KERN_INVALID_ADDRESS at 0x0) when any input array has a null MTLBuffer. This happens when weights are loaded via mmap (lazy loading) and MLX auto-fuses an element-wise operation that includes those weights before they have been promoted to a Metal buffer.

The crash is inside CommandEncoder::set_input_array, which dereferences the Metal buffer pointer without checking for null.

Environment

  • MLX version: 0.31.1
  • macOS: 26.4 (Darwin 25.4.0)
  • Hardware: Apple M1 Max (MacBookPro18,2)
  • Python: 3.14.3

Stack trace (from macOS crash reporter)

Thread 0 Crashed:
0   libmlx.dylib    mlx::core::metal::CommandEncoder::set_input_array(mlx::core::array const&, int, long long) + 48
1   libmlx.dylib    mlx::core::Compiled::eval_gpu(std::vector<mlx::core::array> const&, std::vector<mlx::core::array>&) + 1032
2   libmlx.dylib    mlx::core::gpu::eval(mlx::core::array&) + 204
3   libmlx.dylib    mlx::core::eval_impl(std::vector<mlx::core::array>, bool) + 5020
4   libmlx.dylib    mlx::core::async_eval(std::vector<mlx::core::array>) + 108
5   core.cpython-314-darwin.so    [mlx Python binding for async_eval]

Exception: EXC_BAD_ACCESS (SIGSEGV), KERN_INVALID_ADDRESS at 0x0000000000000000

Background threads at crash time are in ParallelFileReader::read and Load::eval_cpu (actively reading mmap'd weights from disk via pread).

Reproduction scenario

The crash happens when:

  1. Weights are loaded from a .safetensors file using MLX's mmap/lazy loading (ParallelFileReader)
  2. mx.eval(weights) is called to pre-materialize them (this appears to complete without error)
  3. A forward pass is run through the model — the automatic element-wise fusion creates Compiled kernels
  4. When those kernels execute, set_input_array receives an array whose MTLBuffer is still null
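
The four steps above could be scripted roughly as follows. This is an unverified sketch, not a confirmed reproducer: the file path is a placeholder, the element-wise chain only stands in for a real model's forward pass, and wrapping it in mx.compile is an assumption made to force a Compiled kernel (the report describes fusion happening without an explicit mx.compile call).

```python
def repro_sketch(weights_path):
    """Follow the reported steps; `weights_path` is a placeholder .safetensors file."""
    # Imported inside the function so the sketch can be loaded without MLX installed.
    import mlx.core as mx

    # Step 1: load the weights; mx.load returns a dict of lazily loaded arrays.
    weights = mx.load(weights_path)

    # Step 2: pre-materialize the weights (reported to complete without error).
    mx.eval(weights)

    # Step 3: stand-in for the forward pass. The explicit mx.compile is an
    # assumption used here to make sure a Compiled kernel is created.
    @mx.compile
    def fused(w):
        return mx.tanh(w * 2.0 + 1.0)

    w = next(iter(weights.values()))
    y = fused(w)

    # Step 4: the crashing stack goes through async_eval.
    mx.async_eval(y)
    mx.eval(y)  # block until the result is ready
    return y
```

Calling repro_sketch with the path to the affected model's weights would exercise the same sequence; whether it actually crashes likely depends on timing and on the specific model.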

Root cause

MLX's auto-compilation fuses chains of element-wise operations into Compiled Metal kernels. Compiled::eval_gpu calls CommandEncoder::set_input_array for each input, which reads array.buffer().ptr() directly without checking for null. If any input array has a null MTLBuffer (e.g., because it was lazy/mmap'd and its Metal buffer was not yet allocated), this dereferences a null pointer.

The surprising part is that calling mx.eval(weights) beforehand does not reliably prevent the crash — the MTLBuffer appears to be null in the context of the compiled kernel's execution even after eval.

Workaround

mx.disable_compile()

This prevents auto-fusion and the creation of Compiled kernels entirely, so Compiled::eval_gpu never runs and never passes a null-buffer array to set_input_array.
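
If disabling compilation for the whole process is too broad, the workaround could be scoped to the affected call. A minimal sketch, assuming mx.enable_compile() is an acceptable way to restore the default afterwards (it re-enables compilation unconditionally, even if it was already disabled on entry). The compile_disabled helper is hypothetical and not part of MLX; the import is guarded so the helper is a no-op where MLX is not installed.

```python
import contextlib

try:
    import mlx.core as mx
except ImportError:  # MLX not installed; the helper becomes a no-op
    mx = None


@contextlib.contextmanager
def compile_disabled():
    """Hypothetical helper: turn MLX compilation off for the duration of a block."""
    if mx is not None:
        mx.disable_compile()
    try:
        yield
    finally:
        if mx is not None:
            # Assumption: compilation was enabled before entering the block.
            mx.enable_compile()
```

Because MLX is lazy, the model call (and, to be safe, the mx.eval of its output) should sit inside the `with compile_disabled():` block. The MLX documentation also describes the MLX_DISABLE_COMPILE environment variable as another way to disable compilation without touching the code.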

Expected behavior

Compiled::eval_gpu should either:

  1. Check for null MTLBuffer and trigger a synchronous eval/promotion of the input array before proceeding, or
  2. Fall back to a non-compiled evaluation path

Or, mx.eval(array) should guarantee that all subsequent GPU operations on that array (including as inputs to Compiled kernels) have a valid MTLBuffer.
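
The first two options can be illustrated with a toy model in plain Python. Nothing here is MLX code: ToyArray, materialize, encode_inputs, and NullBufferError are hypothetical names, and `buffer is None` stands in for a null MTLBuffer. The point is only the control flow: check every input before binding it, then either promote it or signal the caller to take a fallback path.

```python
from dataclasses import dataclass
from typing import List, Optional


class NullBufferError(RuntimeError):
    """Raised instead of dereferencing a missing buffer (hypothetical)."""


@dataclass
class ToyArray:
    name: str
    buffer: Optional[bytes] = None  # None models a null MTLBuffer

    def materialize(self) -> None:
        # Stand-in for a synchronous eval/promotion to a device buffer.
        if self.buffer is None:
            self.buffer = b"\x00"


def encode_inputs(inputs: List[ToyArray], promote: bool = True) -> List[str]:
    """Bind inputs for a fused kernel, guarding against null buffers."""
    bound = []
    for arr in inputs:
        if arr.buffer is None:
            if not promote:
                # Option 2: let the caller fall back to a non-compiled path.
                raise NullBufferError(f"input {arr.name!r} has no buffer")
            # Option 1: promote the input before proceeding.
            arr.materialize()
        bound.append(arr.name)
    return bound
```

In this model a lazy input is either promoted in place (promote=True) or reported through an exception the caller can catch (promote=False), so a null buffer never reaches the binding step.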

Notes

  • mflux already works around a related issue in its flux2_klein.py via AppleSiliconUtil.is_m1_or_m2(): it avoids mx.compile(predict) on M1/M2. However, the check inadvertently excludes M1 Max/Ultra (whose chip names contain "max" or "ultra"), so mx.compile(predict) is still used on those chips and they still hit this crash.
  • The auto-compilation version of this bug (not explicit mx.compile) was observed with MLX 0.31.1, which was believed to have fixed an earlier explicit-compile version of the same crash.
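
The first note describes a chip check that skips the Max/Ultra variants. The actual mflux implementation is not reproduced here; the two functions below are hypothetical illustrations of how a substring-based exclusion could produce that behavior, and of a family-level check that would not.

```python
import re


def is_m1_or_m2_excluding_variants(chip_name: str) -> bool:
    """Hypothetical check that mirrors the behavior described in the note."""
    name = chip_name.lower()
    if "max" in name or "ultra" in name:
        return False  # M1 Max / M1 Ultra end up excluded
    return re.search(r"\bm[12]\b", name) is not None


def is_m1_or_m2_family(chip_name: str) -> bool:
    """Match any M1/M2 chip, including the Pro, Max, and Ultra variants."""
    return re.search(r"\bM[12]\b", chip_name, flags=re.IGNORECASE) is not None
```

For the chip in this report, "Apple M1 Max", the first function returns False and the second returns True, which is the difference between still compiling predict and skipping compilation.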
