Handle Metal OOM gracefully in mlx_lm.server with structured errors by Aristide021 · Pull Request #1034 · ml-explore/mlx-lm

Aristide021 · 2026-03-21T15:18:36Z

Classify generation failures in mlx_lm.server and return structured errors instead of crashing or misreporting as 404.

- Detect Metal/MLX OOM errors and map them to HTTP 503
- Map other generation exceptions to HTTP 500
- Return structured JSON error payloads for non-stream responses
- Emit terminal SSE error event + [DONE] for stream responses
- Keep server alive after OOM
- Defer non-stream 200 headers until success response is ready
- Add OOM regression tests (stream + non-stream) in tests/test_server.py
- Document OOM behavior and mitigation knobs in mlx_lm/SERVER.md
- Add extra OOM marker coverage (insufficient memory for buffer)
- Log classified Metal OOM events for operator visibility
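
For readers who want a concrete picture of the stream behavior listed above, here is a minimal sketch of a structured error body followed by the terminal SSE frames. The function names (`build_error_payload`, `sse_error_events`) and the payload field names are assumptions made for illustration; the diff itself is not shown here, so the actual shape in mlx_lm.server may differ.

```python
import json
from typing import Iterator, Optional


def build_error_payload(
    message: str, status: int, retry_after: Optional[int] = None
) -> dict:
    """Build a structured error body. Field names are illustrative only."""
    error = {
        "message": message,
        # Hypothetical type labels: 503 for OOM, anything else is internal.
        "type": "server_overloaded" if status == 503 else "internal_error",
        "code": status,
    }
    if retry_after is not None:
        error["retry_after"] = retry_after
    return {"error": error}


def sse_error_events(payload: dict) -> Iterator[str]:
    """Yield the terminal SSE frames: one error event, then [DONE]."""
    yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"
```

Ending with `[DONE]` after the error frame lets a streaming client that only watches for the sentinel terminate cleanly instead of waiting on a connection that will never produce more tokens.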

Closes #854
Refs #1015

Aware of #948 (broader memory controls); this PR is intentionally scoped to crash-to-response handling and can merge independently.


Thump604 · 2026-03-25T01:00:32Z

The OOM detection markers look correct for Apple Silicon. The main paths MLX raises on unified memory exhaustion are:

- "failed to allocate" from allocator.cpp when MTLDevice allocateBuffer returns nil -- this is the most common path and you've got it covered.
- "Metal error: command buffer execution failed due to out of memory" from command buffer submission failure -- also covered.

One gap: when mx.metal.set_memory_limit() is active, MLX can throw "Attempting to allocate X bytes which is greater than the maximum allowed buffer size" (from the metal::malloc limit check). The "failed to allocate" marker wouldn't match that. Worth adding "attempting to allocate" or "maximum allowed buffer size" to the marker list.
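
To make the suggestion concrete, here is a rough sketch of substring-based classification with the two extra markers added. The tuple `OOM_MARKERS` and the helpers `is_oom_error` and `status_for_exception` are hypothetical names, and the marker list is assembled from the strings quoted in this thread rather than copied from the PR's code.

```python
# Lowercase substrings assumed to indicate Metal/MLX memory exhaustion.
# Assembled from the messages discussed in this thread; not the PR's list.
OOM_MARKERS = (
    "failed to allocate",
    "out of memory",
    "insufficient memory for buffer",
    "attempting to allocate",
    "maximum allowed buffer size",
)


def is_oom_error(exc: BaseException) -> bool:
    """Return True if the exception message contains any OOM marker."""
    message = str(exc).lower()
    return any(marker in message for marker in OOM_MARKERS)


def status_for_exception(exc: BaseException) -> int:
    """Map OOM-like failures to 503 and everything else to 500."""
    return 503 if is_oom_error(exc) else 500
```

Matching on the lowercased message keeps the check insensitive to capitalization differences such as "Attempting" versus "attempting".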

The deferred-200 pattern for non-streaming is a good fix. The streaming error path handling (pre-stream vs mid-stream headers) is also correct.
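
As an illustration of the deferred-200 idea for non-stream responses, the sketch below decides the status code only after generation has either finished or failed. `run_non_stream` and the `generate` callable are hypothetical stand-ins, the body shapes are assumed, and the inline OOM check is deliberately simplified.

```python
import json
from typing import Callable, Tuple


def run_non_stream(generate: Callable[[], str]) -> Tuple[int, bytes]:
    """Run generation first, then choose the status and body to send.

    Nothing is written to the client until this returns, so a failure
    can still be reported with a 5xx status instead of a committed 200.
    """
    try:
        text = generate()
    except Exception as exc:
        # Simplified check for illustration; a real server would use
        # its full marker list here.
        status = 503 if "out of memory" in str(exc).lower() else 500
        body = {"error": {"message": str(exc), "code": status}}
        return status, json.dumps(body).encode("utf-8")
    return 200, json.dumps({"text": text}).encode("utf-8")
```

The caller would write the status line and headers only after this function returns, which is what keeps a mid-generation failure from arriving under an already-sent 200.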

Minor: the error response includes retry_after: 30 which is a reasonable default, but memory recovery on Apple Silicon (unified memory, no separate GPU eviction) really depends on whether other processes release memory. A shorter default (5-10s) might give a better user experience for transient spikes.

- Add marker coverage for 'attempting to allocate' and 'maximum allowed buffer size'
- Add regression test to ensure these variants map to HTTP 503

Aristide021 added 3 commits March 21, 2026 11:07

Format server OOM handling with pre-commit black

06ce874

Broaden OOM markers and log classified Metal OOMs

4713cfb

Catch additional Metal OOM error strings

53dc99e


Aristide021 wants to merge 4 commits into ml-explore:main from Aristide021:server-oom-hardening
