Popular repositories
- turboquant-mlx (Public): TurboQuant KV cache compression for MLX with fused Metal kernels. 4.6x compression at 98% of FP16 speed.
- vllm-mlx (Public, forked from waybarrios/vllm-mlx; Python): OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX …


