Description
MLX's automatic element-wise kernel fusion (Compiled kernels) crashes with SIGSEGV (KERN_INVALID_ADDRESS at 0x0) when any input array has a null MTLBuffer. This happens when weights are loaded via mmap (lazy loading) and MLX auto-fuses an element-wise operation that includes those weights before they have been promoted to a Metal buffer.
The crash is inside CommandEncoder::set_input_array, which dereferences the Metal buffer pointer without checking for null.
Environment
- MLX version: 0.31.1
- macOS: 26.4 (Darwin 25.4.0)
- Hardware: Apple M1 Max (MacBookPro18,2)
- Python: 3.14.3
Stack trace (from macOS crash reporter)
Thread 0 Crashed:
0 libmlx.dylib mlx::core::metal::CommandEncoder::set_input_array(mlx::core::array const&, int, long long) + 48
1 libmlx.dylib mlx::core::Compiled::eval_gpu(std::vector<mlx::core::array> const&, std::vector<mlx::core::array>&) + 1032
2 libmlx.dylib mlx::core::gpu::eval(mlx::core::array&) + 204
3 libmlx.dylib mlx::core::eval_impl(std::vector<mlx::core::array>, bool) + 5020
4 libmlx.dylib mlx::core::async_eval(std::vector<mlx::core::array>) + 108
5 core.cpython-314-darwin.so [mlx Python binding for async_eval]
Exception: EXC_BAD_ACCESS (SIGSEGV) — KERN_INVALID_ADDRESS at 0x0000000000000000
Background threads at crash time are in ParallelFileReader::read → Load::eval_cpu (actively reading mmap'd weights from disk via pread).
Reproduction scenario
The crash happens when:
- Weights are loaded from a .safetensors file using MLX's mmap/lazy loading (ParallelFileReader)
- mx.eval(weights) is called to pre-materialize them (this appears to complete without error)
- A forward pass is run through the model, and the automatic element-wise fusion creates Compiled kernels
- When those kernels execute, set_input_array receives an array whose MTLBuffer is still null
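The four steps above can be sketched in Python as follows. This is an illustration of the scenario, not a verified reproducer: the function name `run_scenario` and the weights path are hypothetical, and whether such a small element-wise chain is actually fused and hits the null buffer is an assumption.

```python
def run_scenario(weights_path: str) -> None:
    """Walk through the steps from the report using one loaded weight."""
    # Imported inside the function so this module loads without MLX installed.
    import mlx.core as mx

    # 1. Load weights lazily from a .safetensors file (returns a dict of arrays).
    weights = mx.load(weights_path)

    # 2. Ask MLX to materialize them up front.
    mx.eval(weights)

    # 3. Build a chain of element-wise ops that consumes a loaded weight.
    #    Assumption: this chain is eligible for the fusion described above.
    w = next(iter(weights.values()))
    x = mx.ones_like(w)
    y = mx.tanh(x * w + 1.0)

    # 4. Evaluate; the stack trace shows the crash underneath async_eval.
    mx.async_eval(y)
    mx.eval(y)
```

It would be called as `run_scenario("model.safetensors")`, where the path is a placeholder for a real checkpoint.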
Root cause
MLX's auto-compilation fuses chains of element-wise operations into Compiled Metal kernels. Compiled::eval_gpu calls CommandEncoder::set_input_array for each input, which reads array.buffer().ptr() directly without checking for null. If any input array has a null MTLBuffer (e.g., because it was lazy/mmap'd and its Metal buffer was not yet allocated), this dereferences a null pointer.
The surprising part is that calling mx.eval(weights) beforehand does not reliably prevent the crash — the MTLBuffer appears to be null in the context of the compiled kernel's execution even after eval.
Workaround
mx.disable_compile()
This prevents auto-fusion and the creation of Compiled kernels entirely, so set_input_array is never called with lazy arrays.
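If disabling compilation for the whole process is too broad, one option is to scope it to the affected forward pass. The sketch below wraps `mx.disable_compile()` and `mx.enable_compile()` in a context manager. The helper name `compile_disabled` is hypothetical, and it assumes compilation was enabled before the block was entered; the guarded import only lets the snippet load on machines without MLX.

```python
from contextlib import contextmanager

try:
    import mlx.core as mx
except ImportError:  # MLX is not installed on this machine
    mx = None


@contextmanager
def compile_disabled():
    """Disable MLX compilation inside the block, then re-enable it."""
    if mx is not None:
        mx.disable_compile()
    try:
        yield
    finally:
        # Assumption: compilation was enabled before entering the block.
        if mx is not None:
            mx.enable_compile()
```

Usage would look like `with compile_disabled(): out = model(x); mx.eval(out)`, where `model` and `x` stand in for the caller's own objects. The evaluation has to happen inside the block, since MLX is lazy and the kernels are only created at eval time.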
Expected behavior
Compiled::eval_gpu should either:
- Check for null MTLBuffer and trigger a synchronous eval/promotion of the input array before proceeding, or
- Fall back to a non-compiled evaluation path
Or, mx.eval(array) should guarantee that all subsequent GPU operations on that array (including as inputs to Compiled kernels) have a valid MTLBuffer.
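The first option can be illustrated with a toy model in Python. This is not MLX code: `ToyArray`, its `loader` field, and this `set_input_array` are hypothetical stand-ins that only show the shape of a promote-before-bind guard, with `None` playing the role of a null MTLBuffer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyArray:
    """Stand-in for an array; `buffer` is None until materialized."""
    name: str
    loader: Callable[[], bytes]
    buffer: Optional[bytes] = None

    def materialize(self) -> None:
        # Analogue of a synchronous eval/promotion to a device buffer.
        if self.buffer is None:
            self.buffer = self.loader()


def set_input_array(arr: ToyArray, bound: list) -> None:
    """Bind an input, promoting it first instead of dereferencing None."""
    if arr.buffer is None:
        arr.materialize()
    if arr.buffer is None:
        # Fail with a clear error rather than a null dereference.
        raise RuntimeError(f"input {arr.name!r} has no buffer after promotion")
    bound.append(arr.buffer)


bound = []
lazy = ToyArray("w", loader=lambda: b"\x00" * 4)
set_input_array(lazy, bound)
```

Even if promotion cannot always succeed, raising a descriptive error at the bind site would turn the SIGSEGV into a diagnosable failure.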
Notes
- mflux already works around a related issue in its flux2_klein.py via AppleSiliconUtil.is_m1_or_m2(): it avoids mx.compile(predict) on M1/M2, but inadvertently excludes M1 Max/Ultra (which contain "max" or "ultra" in the chip name) and thus still hits this crash on those chips.
- The auto-compilation version of this bug (not explicit mx.compile) was observed with MLX 0.31.1, which was believed to have fixed an earlier explicit-compile version of the same crash.
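To make the first note concrete, the sketch below shows how a chip-name check can miss the Max/Ultra variants. It is not mflux's actual implementation: both function names and the matching logic are hypothetical, and the chip strings are assumed to look like "Apple M1 Max".

```python
import re


def is_m1_or_m2_narrow(chip: str) -> bool:
    """Hypothetical check that rejects names containing 'max' or 'ultra'."""
    name = chip.lower()
    if "max" in name or "ultra" in name:
        return False
    return "m1" in name or "m2" in name


def is_m1_or_m2_family(chip: str) -> bool:
    """Hypothetical check that matches the M1/M2 generation in any variant."""
    return re.search(r"\bM[12]\b", chip, flags=re.IGNORECASE) is not None
```

With the narrow check, "Apple M1 Max" is not treated as M1, so the compile-avoidance path is skipped on that chip. The family check matches on the generation token alone and covers base, Pro, Max, and Ultra variants alike.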