Hey MLX community,
I'm building ANF (Autonomous Native Forge), a cloud-free, four-agent autonomous software production pipeline.
It currently runs on an NVIDIA Blackwell GB10 with vLLM + DeepSeek-R1-32B, and I'm now porting it to Apple Silicon.
The agents talk to the inference backend over an OpenAI-compatible API (`/v1/chat/completions`): four agents hit the endpoint concurrently with long-context requests (up to 32K tokens).
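For concreteness, the fan-out looks roughly like the sketch below. The endpoint URL, port, model id, agent roles, and prompt are illustrative assumptions on my part, not taken from the ANF repo; the point is just that four clients post standard chat-completion payloads to one local server at the same time.

```python
# Rough sketch of the ANF fan-out: four agents posting concurrent
# chat-completion requests to one OpenAI-compatible endpoint.
# ENDPOINT, MODEL, and the agent role names are illustrative assumptions.
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed local server
MODEL = "deepseek-r1-32b"                               # placeholder model id


def build_request(agent: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Chat-completions payload shared by all agents."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": f"You are the {agent} agent."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }


def complete(payload: dict) -> str:
    """POST one payload and return the first choice's message text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]


def fan_out(agents: list[str]) -> list[str]:
    """Issue one request per agent concurrently, one worker per agent."""
    payloads = [build_request(a, "Work on the current task.") for a in agents]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(complete, payloads))


# Example (requires a running server):
#   fan_out(["planner", "coder", "tester", "reviewer"])
```

So the stability question below is really: can the MLX-side server sustain this kind of 4-way concurrent, long-context load the way vLLM's continuous batching does?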
Three questions:
1. mlx-lm ships an OpenAI-compatible server: how stable is it under sustained concurrent requests on an M4 Ultra?
2. Are there known limitations with 32K context on MLX compared to vLLM?
3. What's the recommended model format for DeepSeek-R1 on Apple Silicon: MLX-converted weights, or GGUF via the llama.cpp Metal backend?
Repo for context:
github.com/trgysvc/AutonomousNativeForge
Thanks