Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
A lightweight chat terminal interface for the llama.cpp server, written in C++, with many features and Windows/Linux support.
Local LLM proxy, DevOps-friendly.
A Bash script that automatically launches llama-server, detects available .gguf models, and selects the number of GPU layers to offload based on your free VRAM.
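For illustration, a minimal sketch of how such a launcher might work. It assumes an NVIDIA GPU queried via nvidia-smi; the model directory, the flat ~250 MiB-per-layer estimate, and the host/port are assumptions for the sketch, not details taken from the script above.

```bash
#!/usr/bin/env bash
# Hypothetical launcher sketch: find a .gguf model, estimate how many layers
# fit in free VRAM, then start llama-server with that offload count.

MODEL_DIR="${MODEL_DIR:-$HOME/models}"

# Pick the first .gguf file found under MODEL_DIR.
model=$(find "$MODEL_DIR" -name '*.gguf' | head -n 1)
[ -z "$model" ] && { echo "no .gguf models found in $MODEL_DIR" >&2; exit 1; }

# Free VRAM in MiB via nvidia-smi (NVIDIA GPUs only).
free_vram=$(nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits | head -n 1)

# Crude assumed heuristic: ~250 MiB per offloaded layer, capped at 99 (all layers).
layers=$(( free_vram / 250 ))
[ "$layers" -gt 99 ] && layers=99

exec llama-server -m "$model" -ngl "$layers" --host 127.0.0.1 --port 8080
```

A real script would likely read the layer count and tensor sizes from the model's metadata rather than using a flat per-layer estimate.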
A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.