Production Case Study: Qwen 3.5 VLM on MLX for Healthcare AI #3195
asq-sheriff started this conversation in Show and tell · Replies: 0 comments
We deployed Qwen 3.5-4B on Apple Silicon for elderly care AI using mlx-vlm.
Key findings:
• mlx-vlm is required (not mlx-lm) because of the vision-language model architecture
• ~3x latency improvement over llama.cpp on the DeltaNet architecture (20.7 s → 6.9 s)
• Serial queuing outperformed continuous batching for our conversational workload
• Patched chat template to disable thinking mode by default
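The serial-queuing point above can be sketched as a small wrapper that forces one generation request at a time. This is a minimal illustration, not the actual deployment code: `run_inference` is a hypothetical stand-in for the real mlx-vlm generation call.

```python
import threading


class SerialInferenceQueue:
    """Serialize generation requests so only one runs at a time.

    For short, latency-sensitive conversational turns, waiting for the
    current request to finish can beat the overhead of continuous
    batching, which is tuned for sustained throughput.
    """

    def __init__(self, run_inference):
        # run_inference: callable taking a prompt and returning a reply.
        # Hypothetical stand-in for the mlx-vlm generate call.
        self._run = run_inference
        self._lock = threading.Lock()

    def submit(self, prompt: str) -> str:
        # Only one thread holds the lock, so requests execute strictly
        # one after another in arrival order at the lock.
        with self._lock:
            return self._run(prompt)


# Usage: wrap any single-threaded inference function.
q = SerialInferenceQueue(lambda p: p.upper())
print(q.submit("hello"))  # each call completes before the next starts
```

Whether this wins over batching depends on request arrival rate and turn length; for a one-user-at-a-time conversational workload, the simpler serial path avoided batching overhead entirely.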
This feedback may help with future MLX optimizations for DeltaNet architectures.
Happy to share more details about our deployment.
[Medium Link]