Description
I ran into this while trying to write an end-to-end regression for a Metal conv_general output-offset bug.
Reproducer
import mlx.core as mx
n = (2**31) // (64 * 64) + 2
x = mx.ones((n, 8, 8, 1), dtype=mx.float16)
w = mx.ones((1, 1, 1, 1), dtype=mx.float16)
y = mx.conv_general(x, w, input_dilation=(9, 9))
mx.eval(y)
Output shape: (524290, 64, 64, 1), total elements: 2147491840.
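For reference, the output shape above can be re-derived without touching MLX. This is a minimal sketch using the standard convolution output-size arithmetic, which I am assuming matches what conv_general does for these arguments (1x1 kernel, stride 1, no padding). The helper name conv_out_dim is hypothetical and not part of MLX.

```python
def conv_out_dim(in_dim, kernel=1, stride=1, padding=0,
                 kernel_dilation=1, input_dilation=1):
    """Standard conv output-size formula (assumed to match conv_general here)."""
    # Input extent after inserting (input_dilation - 1) zeros between samples.
    dilated_in = (in_dim - 1) * input_dilation + 1
    # Kernel extent after kernel dilation.
    dilated_k = (kernel - 1) * kernel_dilation + 1
    return (dilated_in + 2 * padding - dilated_k) // stride + 1


n = (2**31) // (64 * 64) + 2
side = conv_out_dim(8, kernel=1, input_dilation=9)
out_shape = (n, side, side, 1)
total = n * side * side * 1

print(out_shape, total)  # (524290, 64, 64, 1) 2147491840
```

Python integers are unbounded, so the total computed here is the true requested size rather than a wrapped one.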
Expected
Either the array evaluates successfully, or MLX raises a clean size-limit error with the true requested size.
Actual
mx.eval(y) fails with a wrapped allocation size:
RuntimeError: [metal::malloc] Attempting to allocate 18446744069414600704 bytes which is greater than the maximum allowed buffer size of 86586540032 bytes.
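18446744069414600704 is 2^64 - 4294950912, which is consistent with signed-to-unsigned overflow in size bookkeeping. Note that 4294950912 = 2 × 2147475456, i.e. a negative wrapped element count multiplied by the 2-byte float16 item size.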
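The arithmetic can be checked in plain Python. This is only a sketch of one overflow path that reproduces the reported number (truncate the element count to int32, multiply by the item size, reinterpret as an unsigned 64-bit size). I have not confirmed that MLX performs the steps in exactly this order.

```python
ELEMENTS = 524290 * 64 * 64 * 1   # true output size
ITEMSIZE = 2                      # bytes per float16 element


def as_int32(value):
    """Reinterpret an integer as a two's-complement signed 32-bit value."""
    return (value + 2**31) % 2**32 - 2**31


# Step 1: the element count wraps when stored in a signed 32-bit integer.
wrapped_elements = as_int32(ELEMENTS)

# Step 2: the (negative) byte count is reinterpreted as unsigned 64-bit.
wrapped_bytes = (wrapped_elements * ITEMSIZE) % 2**64

print(wrapped_elements)  # -2147475456
print(wrapped_bytes)     # 18446744069414600704
```

The second value matches the byte count in the metal::malloc error message exactly.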
Related failures from the same y
mx.reshape(y, (-1,)) # ValueError: [reshape] Cannot reshape array of size 2147491840 into shape (-2147475456).
mx.take(y, mx.array([0], dtype=mx.uint32)) # ValueError: [gather] Slice sizes must be in [0, a.shape(i)]. Got (1) for array with shape (-2147475456).
The lazy slice y[-1] reports the correct shape (64, 64, 1), but mx.eval(y[-1]) triggers the same wrapped metal::malloc error.
The inferred shape -2147475456 is 2147491840 interpreted as a signed int32 (2147491840 - 2^32), confirming that the overflow is in shape/size accounting, not just in the allocator.
Boundary
Verified on mlx 0.31.1 and local 0.31.2.dev20260325+84099a14b:
| output elements | reshape | eval | slice materialization |
|---|---|---|---|
| 2147479552 (2³¹ - 4096, n=524287) | ✓ | ✓ | ✓ |
| 2147483648 (exactly 2³¹, n=524288) | ✗ negative shape | ✗ wrapped alloc | ✗ wrapped alloc |
| 2147491840 (reproducer, n=524290) | ✗ negative shape | ✗ wrapped alloc | ✗ wrapped alloc |
The overflow triggers at exactly 2^31 elements. The last passing case, 2^31 - 4096, is the largest size below 2^31 that the (n, 64, 64, 1) shape grid can reach, since each step in n adds 64 × 64 = 4096 elements.
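The element counts in the table follow directly from the shape grid. A small sketch, with the n values taken from the table:

```python
STEP = 64 * 64 * 1   # elements added per unit of n on the (n, 64, 64, 1) grid
LIMIT = 2**31        # first element count that no longer fits in int32

rows = []
for n in (524287, 524288, 524290):
    total = n * STEP
    # Negative offset: still below the boundary; zero or positive: at or past it.
    rows.append((n, total, total - LIMIT))

for n, total, offset in rows:
    print(f"n={n}  elements={total}  offset from 2^31={offset:+d}")
```

Because the grid moves in steps of 4096, there is no reachable size between 2^31 - 4096 and 2^31, so the table brackets the boundary as tightly as this reproducer allows.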
Root cause
The exact 2^31 boundary is consistent with ShapeElem = int32_t in mlx/array.h. The wrapped allocation and negative shapes appear to be downstream symptoms of shape/size overflow at the host level before Metal is involved. Additional affected paths likely exist.
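Until this is handled inside MLX, a regression test can guard itself on the caller side. The sketch below is not MLX code: checked_num_elements is a hypothetical helper, and the 2^31 - 1 limit is an assumption taken from the boundary table above rather than a documented MLX constant.

```python
import math

INT32_MAX = 2**31 - 1  # assumed limit, inferred from the observed boundary


def checked_num_elements(shape):
    """Hypothetical helper: count elements with unbounded ints, fail cleanly."""
    total = math.prod(shape)
    if total > INT32_MAX:
        raise OverflowError(
            f"shape {tuple(shape)} has {total} elements, "
            f"which exceeds the assumed 32-bit limit of {INT32_MAX}"
        )
    return total


print(checked_num_elements((524287, 64, 64, 1)))  # 2147479552

try:
    checked_num_elements((524290, 64, 64, 1))
except OverflowError as exc:
    print(exc)
```

The error message carries the true requested size (2147491840), which is the behavior described under Expected.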
Environment
- Device: Apple M3 Max
- Memory: 137438953472 bytes
- Max buffer length: 86586540032 bytes
- Reproduced three times in independent fresh Python processes with identical results
- Reproduced on both the installed wheel (mlx 0.31.1) and a local main build