Production inference for encoder models (ColBERT, GLiNER, ColPali, embedding models, etc.) as vLLM plugins, for online and in-process deployment
retrieval encoder inference embeddings ner serving rag colbert ml-infra-deployments vllm gliner multimodal-rag ai-infrastructure colpali triton-kernels vllm-plugins
Updated Apr 27, 2026 - Python