From 19fd13a82c76ba1d1215739d5a14b33cc3702749 Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 09:06:56 +0800 Subject: [PATCH 1/9] Fix healthcheck curl missing & Implement testing profile Signed-off-by: JaredforReal --- .github/copilot-instructions.md | 88 ++++++++++++++++++++++++++++++++ Dockerfile.extproc | 10 +++- config/config.testing.yaml | 84 ++++++++++++++++++++++++++++++ docker-compose.yml | 19 +++++++ scripts/entrypoint.sh | 12 +++++ tools/mock-vllm/Dockerfile | 16 ++++++ tools/mock-vllm/README.md | 9 ++++ tools/mock-vllm/app.py | 45 ++++++++++++++++ tools/mock-vllm/requirements.txt | 3 ++ 9 files changed, 285 insertions(+), 1 deletion(-) create mode 100644 .github/copilot-instructions.md create mode 100644 config/config.testing.yaml create mode 100644 scripts/entrypoint.sh create mode 100644 tools/mock-vllm/Dockerfile create mode 100644 tools/mock-vllm/README.md create mode 100644 tools/mock-vllm/app.py create mode 100644 tools/mock-vllm/requirements.txt diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 00000000..931ecbde --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,88 @@ +# Copilot Agent Instructions — vLLM Semantic Router + +Purpose: help AI coding agents work effectively in this repo by knowing the architecture, conventions, and non-obvious workflows. + +## Big picture + +- This is a Mixture-of-Models router for LLM requests with: Envoy External Processing (gRPC) for request routing, classification (intent/PII/security), semantic similarity caching, and tool auto-selection. +- Primary implementation is Go with a Rust ML binding (HuggingFace Candle) via CGO for embeddings/similarity. A small HTTP Classification API is exposed alongside the gRPC extproc server. + +## Core components (key files) + +- Entry point: `src/semantic-router/cmd/main.go` (starts gRPC extproc, Classification API, and Prometheus metrics) +- Envoy ExtProc server: `src/semantic-router/pkg/extproc/` (stream handlers, routing logic, request/response transforms) +- Configuration: `config/config.yaml` (routing categories, model_config, reasoning families, semantic cache backend, vLLM endpoints, tools DB, classifiers) +- Classification API: `src/semantic-router/pkg/api/server.go` (e.g., POST `/api/v1/classify/intent|pii|security|batch`) +- Config loader/utilities: `src/semantic-router/pkg/config/` (hot-reload support, endpoint selection, policy helpers) +- Cache backends: `src/semantic-router/pkg/cache/` (in-memory or Milvus; compile-time tag `milvus`) +- Tools database: `src/semantic-router/pkg/tools/` (semantic tool selection) +- Candle Rust binding (CGO): `candle-binding/` (builds native lib used for similarity) +- Tests: Go unit/integration under `src/semantic-router/pkg/**`, e2e in `e2e-tests/`, research/bench suite in `bench/` + +## How things talk to each other + +1. Client → Envoy → gRPC ExtProc (`extproc.Server`) → Router selects model/tools/reasoning and edits OpenAI-compatible request → forwards to chosen vLLM endpoint. +2. Router uses Candle embeddings for similarity cache and tool selection. +3. Classification uses either legacy ModernBERT models or auto-discovered LoRA unified classifiers (services initialize a global ClassificationService). +4. Config changes are hot-reloaded (fsnotify) without restarting the gRPC server. 
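+
+Step 4 above can be sketched as a minimal watch-and-reload loop with `fsnotify` (illustrative only; `watchConfig` and `reload` are placeholder names, and the repo's real pattern is `Server.watchConfigAndReload`, noted under Conventions below):
+
+```go
+package main
+
+import (
+	"log"
+
+	"github.com/fsnotify/fsnotify"
+)
+
+// watchConfig re-runs reload whenever the config file changes; the gRPC
+// server keeps serving throughout.
+func watchConfig(path string, reload func(string) error) error {
+	w, err := fsnotify.NewWatcher()
+	if err != nil {
+		return err
+	}
+	defer w.Close()
+	if err := w.Add(path); err != nil {
+		return err
+	}
+	for {
+		select {
+		case ev := <-w.Events:
+			// Editors often replace the file, so treat create/rename like write.
+			if ev.Op&(fsnotify.Write|fsnotify.Create|fsnotify.Rename) != 0 {
+				if err := reload(path); err != nil {
+					// On a bad config, keep the previous router instead of crashing.
+					log.Printf("config reload failed: %v", err)
+				}
+			}
+		case err := <-w.Errors:
+			log.Printf("watch error: %v", err)
+		}
+	}
+}
+
+func main() {
+	_ = watchConfig("config/config.yaml", func(p string) error {
+		log.Printf("would rebuild the router from %s", p)
+		return nil
+	})
+}
+```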
+ +## Build / run workflows (non-obvious bits) + +- Makefile orchestrates sub-makefiles under `tools/make/` + - Build router (also builds Rust lib): `make build-router` + - Run router with config: `CONFIG_FILE=config/config.yaml make run-router` + - Run Envoy (installs func-e if missing): `make run-envoy` + - Download local models from HF Hub: `make download-models` (uses `hf download` CLI) +- Dynamic library path on macOS: prefer `DYLD_LIBRARY_PATH` to point to `candle-binding/target/release`; Linux uses `LD_LIBRARY_PATH`. The Makefile sets `LD_LIBRARY_PATH`—on macOS set `DYLD_LIBRARY_PATH` in zsh if needed. +- Ports: gRPC extproc `:50051` (flag `-port`), Classification API `:8080` (`-api-port`), Prometheus `:9190` (`-metrics-port`). +- Docker: `docker-compose.yml` spins up router + Envoy (+ optional testing profile). + +Example (zsh): + +```sh +# Build native lib + router +make build-router + +# If macOS, ensure Candle dylib is discoverable for CGO +export DYLD_LIBRARY_PATH="$PWD/candle-binding/target/release:$DYLD_LIBRARY_PATH" + +# Run router with the default config and metrics +CONFIG_FILE=config/config.yaml make run-router + +# Run Envoy (separate terminal) +make run-envoy +``` + +## Configuration patterns (edit `config/config.yaml`) + +- `categories[]` with per-category `model_scores` and reasoning flags drive model selection; `default_model` is the fallback. +- `model_config` + `reasoning_families` normalize “reasoning mode” syntax across model families (e.g., deepseek, qwen3, gpt-oss). Use `GetModelReasoningFamily()` helpers, don’t hardcode. +- `semantic_cache`: `backend_type: memory|milvus`, `similarity_threshold`, `ttl_seconds`. For Milvus, run `make start-milvus` and test with `-tags=milvus`. +- `tools`: enable semantic tool selection via `tools_db_path` (JSON), `top_k`, and threshold (defaults to BERT threshold if unset). +- `classifier`: paths to ModernBERT/LoRA models and mapping jsons; batch endpoint requires unified classifier to be available. +- `vllm_endpoints[]`: list models per endpoint; selection respects per-model `preferred_endpoints` and weights. + +## Testing + +- Go vet and tidy: `make vet` and `make check-go-mod-tidy` +- Unit tests (Go): `make test-semantic-router` (set `SKIP_MILVUS_TESTS=false` to include Milvus) or `go test -v ./...` under `src/semantic-router` +- Milvus-specific: `make test-milvus-cache` or `make test-semantic-router-milvus` (uses `-tags=milvus`) +- E2E Python tests: see `e2e-tests/README.md` (requires router+envoy running) +- Quick cURL demos: `make test-auto-prompt-reasoning`, `test-pii`, `test-tools` (hits Envoy at `http://localhost:8801/v1/chat/completions` with `model: "auto"`) + +## Conventions & tips for contributors (agents) + +- Use config accessors from `pkg/config` (e.g., endpoint selection, PII policies). Avoid duplicating selection logic. +- Prefer `services.*ClassificationService` APIs for classification; a global service may be set by auto-discovery. +- Respect streaming in ExtProc handlers and record metrics via `pkg/metrics`. +- Keep hot-reload safe: re-create `OpenAIRouter` on config changes using `Server.watchConfigAndReload` pattern. +- When adding cache/tool logic, use existing interfaces: `cache.CacheBackend`, `tools.ToolsDatabase`. 
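+
+A quick way to exercise the Classification API on the ports listed above (a sketch; the request body shape here is an assumption, check `src/semantic-router/pkg/api/server.go` for the actual schema):
+
+```sh
+# Intent classification against the HTTP API (default :8080)
+curl -s http://localhost:8080/api/v1/classify/intent \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Write a function that reverses a linked list"}'
+```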
+ +References + +- Router main: `src/semantic-router/cmd/main.go` +- ExtProc: `src/semantic-router/pkg/extproc/` +- Config: `config/config.yaml`, helpers in `src/semantic-router/pkg/config/` +- Candle binding: `candle-binding/` +- Bench: `bench/` (CLI and plots) +- Docs site: `website/` (Docusaurus) diff --git a/Dockerfile.extproc b/Dockerfile.extproc index 1ba8b45e..5925d00c 100644 --- a/Dockerfile.extproc +++ b/Dockerfile.extproc @@ -54,5 +54,13 @@ COPY config/config.yaml /app/config/ ENV LD_LIBRARY_PATH=/app/lib EXPOSE 50051 +# Install curl for healthchecks and basic diagnostics +RUN dnf -y update && \ + dnf -y install curl && \ + dnf clean all -CMD ["/app/extproc-server", "--config", "/app/config/config.yaml"] +# Copy entrypoint to allow switching config via env var CONFIG_FILE +COPY scripts/entrypoint.sh /app/entrypoint.sh +RUN chmod +x /app/entrypoint.sh + +ENTRYPOINT ["/app/entrypoint.sh"] diff --git a/config/config.testing.yaml b/config/config.testing.yaml new file mode 100644 index 00000000..0b84e0ff --- /dev/null +++ b/config/config.testing.yaml @@ -0,0 +1,84 @@ +bert_model: + model_id: sentence-transformers/all-MiniLM-L12-v2 + threshold: 0.6 + use_cpu: true + +semantic_cache: + enabled: true + backend_type: "memory" + similarity_threshold: 0.8 + max_entries: 1000 + ttl_seconds: 3600 + eviction_policy: "fifo" + +tools: + enabled: true + top_k: 3 + similarity_threshold: 0.2 + tools_db_path: "config/tools_db.json" + fallback_to_empty: true + +prompt_guard: + enabled: true + use_modernbert: true + model_id: "models/jailbreak_classifier_modernbert-base_model" + threshold: 0.7 + use_cpu: true + jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json" + +vllm_endpoints: + - name: "mock" + address: "mock-vllm" + port: 8000 + models: + - "openai/gpt-oss-20b" + weight: 1 + health_check_path: "/health" + +model_config: + "openai/gpt-oss-20b": + reasoning_family: "gpt-oss" + preferred_endpoints: ["mock"] + pii_policy: + allow_by_default: true + +categories: + - name: other + model_scores: + - model: openai/gpt-oss-20b + score: 0.7 + use_reasoning: false + +default_model: openai/gpt-oss-20b + +reasoning_families: + deepseek: + type: "chat_template_kwargs" + parameter: "thinking" + + qwen3: + type: "chat_template_kwargs" + parameter: "enable_thinking" + + gpt-oss: + type: "reasoning_effort" + parameter: "reasoning_effort" + gpt: + type: "reasoning_effort" + parameter: "reasoning_effort" + +default_reasoning_effort: high + +api: + batch_classification: + max_batch_size: 100 + concurrency_threshold: 5 + max_concurrency: 8 + metrics: + enabled: true + detailed_goroutine_tracking: true + high_resolution_timing: false + sample_rate: 1.0 + duration_buckets: + [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30] + size_buckets: [1, 2, 5, 10, 20, 50, 100, 200] diff --git a/docker-compose.yml b/docker-compose.yml index 09f7b9ad..afc7e7e1 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -13,6 +13,7 @@ services: - ./models:/app/models:ro environment: - LD_LIBRARY_PATH=/app/lib + - CONFIG_FILE=${CONFIG_FILE:-/app/config/config.yaml} networks: - semantic-network healthcheck: @@ -44,6 +45,24 @@ services: retries: 5 start_period: 10s + # Mock vLLM service for testing profile + mock-vllm: + build: + context: ./tools/mock-vllm + dockerfile: Dockerfile + container_name: mock-vllm + profiles: ["testing"] + ports: + - "8000:8000" + networks: + - semantic-network + healthcheck: + test: ["CMD", "curl", "-fsS", "http://localhost:8000/health"] 
+ interval: 10s + timeout: 5s + retries: 5 + start_period: 5s + networks: semantic-network: driver: bridge diff --git a/scripts/entrypoint.sh b/scripts/entrypoint.sh new file mode 100644 index 00000000..c0b4093a --- /dev/null +++ b/scripts/entrypoint.sh @@ -0,0 +1,12 @@ +#!/usr/bin/env bash +set -euo pipefail + +CONFIG_FILE_PATH=${CONFIG_FILE:-/app/config/config.yaml} + +if [[ ! -f "$CONFIG_FILE_PATH" ]]; then + echo "[entrypoint] Config file not found at $CONFIG_FILE_PATH" >&2 + exit 1 +fi + +echo "[entrypoint] Starting semantic-router with config: $CONFIG_FILE_PATH" +exec /app/extproc-server --config "$CONFIG_FILE_PATH" diff --git a/tools/mock-vllm/Dockerfile b/tools/mock-vllm/Dockerfile new file mode 100644 index 00000000..a3287059 --- /dev/null +++ b/tools/mock-vllm/Dockerfile @@ -0,0 +1,16 @@ +FROM python:3.11-slim + +WORKDIR /app + +RUN apt-get update && apt-get install -y --no-install-recommends \ + curl \ + && rm -rf /var/lib/apt/lists/* + +COPY requirements.txt /app/requirements.txt +RUN pip install --no-cache-dir -r /app/requirements.txt + +COPY app.py /app/app.py + +EXPOSE 8000 + +CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"] diff --git a/tools/mock-vllm/README.md b/tools/mock-vllm/README.md new file mode 100644 index 00000000..1ac7a9b8 --- /dev/null +++ b/tools/mock-vllm/README.md @@ -0,0 +1,9 @@ +# Mock vLLM (OpenAI-compatible) service + +A tiny FastAPI server that emulates minimal endpoints used by the router: + +- GET /health +- GET /v1/models +- POST /v1/chat/completions + +Intended for local testing with Docker Compose profile `testing`. diff --git a/tools/mock-vllm/app.py b/tools/mock-vllm/app.py new file mode 100644 index 00000000..c991c76f --- /dev/null +++ b/tools/mock-vllm/app.py @@ -0,0 +1,45 @@ +from fastapi import FastAPI +from pydantic import BaseModel +from typing import List, Optional + +app = FastAPI() + + +class ChatMessage(BaseModel): + role: str + content: str + + +class ChatRequest(BaseModel): + model: str + messages: List[ChatMessage] + temperature: Optional[float] = 0.2 + + +@app.get("/health") +async def health(): + return {"status": "ok"} + + +@app.get("/v1/models") +async def models(): + return {"data": [{"id": "openai/gpt-oss-20b", "object": "model"}]} + + +@app.post("/v1/chat/completions") +async def chat_completions(req: ChatRequest): + # Very simple echo-like behavior + last_user = next((m.content for m in reversed(req.messages) if m.role == "user"), "") + content = f"[mock-{req.model}] You said: {last_user}" + return { + "id": "cmpl-mock-123", + "object": "chat.completion", + "model": req.model, + "choices": [ + { + "index": 0, + "message": {"role": "assistant", "content": content}, + "finish_reason": "stop", + } + ], + } diff --git a/tools/mock-vllm/requirements.txt b/tools/mock-vllm/requirements.txt new file mode 100644 index 00000000..3971515d --- /dev/null +++ b/tools/mock-vllm/requirements.txt @@ -0,0 +1,3 @@ +fastapi==0.115.0 +uvicorn==0.30.6 +pydantic==2.9.2 From 61fb53a3a499229e86cc508ae8f16117ebf02bdd Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 10:47:38 +0800 Subject: [PATCH 2/9] fix pre-commit error Signed-off-by: JaredforReal --- .github/copilot-instructions.md | 88 --------------------------------- tools/mock-vllm/app.py | 7 ++- 2 files changed, 5 insertions(+), 90 deletions(-) delete mode 100644 .github/copilot-instructions.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md deleted file mode 100644 index 931ecbde..00000000 --- a/.github/copilot-instructions.md 
+++ /dev/null @@ -1,88 +0,0 @@ -# Copilot Agent Instructions — vLLM Semantic Router - -Purpose: help AI coding agents work effectively in this repo by knowing the architecture, conventions, and non-obvious workflows. - -## Big picture - -- This is a Mixture-of-Models router for LLM requests with: Envoy External Processing (gRPC) for request routing, classification (intent/PII/security), semantic similarity caching, and tool auto-selection. -- Primary implementation is Go with a Rust ML binding (HuggingFace Candle) via CGO for embeddings/similarity. A small HTTP Classification API is exposed alongside the gRPC extproc server. - -## Core components (key files) - -- Entry point: `src/semantic-router/cmd/main.go` (starts gRPC extproc, Classification API, and Prometheus metrics) -- Envoy ExtProc server: `src/semantic-router/pkg/extproc/` (stream handlers, routing logic, request/response transforms) -- Configuration: `config/config.yaml` (routing categories, model_config, reasoning families, semantic cache backend, vLLM endpoints, tools DB, classifiers) -- Classification API: `src/semantic-router/pkg/api/server.go` (e.g., POST `/api/v1/classify/intent|pii|security|batch`) -- Config loader/utilities: `src/semantic-router/pkg/config/` (hot-reload support, endpoint selection, policy helpers) -- Cache backends: `src/semantic-router/pkg/cache/` (in-memory or Milvus; compile-time tag `milvus`) -- Tools database: `src/semantic-router/pkg/tools/` (semantic tool selection) -- Candle Rust binding (CGO): `candle-binding/` (builds native lib used for similarity) -- Tests: Go unit/integration under `src/semantic-router/pkg/**`, e2e in `e2e-tests/`, research/bench suite in `bench/` - -## How things talk to each other - -1. Client → Envoy → gRPC ExtProc (`extproc.Server`) → Router selects model/tools/reasoning and edits OpenAI-compatible request → forwards to chosen vLLM endpoint. -2. Router uses Candle embeddings for similarity cache and tool selection. -3. Classification uses either legacy ModernBERT models or auto-discovered LoRA unified classifiers (services initialize a global ClassificationService). -4. Config changes are hot-reloaded (fsnotify) without restarting the gRPC server. - -## Build / run workflows (non-obvious bits) - -- Makefile orchestrates sub-makefiles under `tools/make/` - - Build router (also builds Rust lib): `make build-router` - - Run router with config: `CONFIG_FILE=config/config.yaml make run-router` - - Run Envoy (installs func-e if missing): `make run-envoy` - - Download local models from HF Hub: `make download-models` (uses `hf download` CLI) -- Dynamic library path on macOS: prefer `DYLD_LIBRARY_PATH` to point to `candle-binding/target/release`; Linux uses `LD_LIBRARY_PATH`. The Makefile sets `LD_LIBRARY_PATH`—on macOS set `DYLD_LIBRARY_PATH` in zsh if needed. -- Ports: gRPC extproc `:50051` (flag `-port`), Classification API `:8080` (`-api-port`), Prometheus `:9190` (`-metrics-port`). -- Docker: `docker-compose.yml` spins up router + Envoy (+ optional testing profile). 
- -Example (zsh): - -```sh -# Build native lib + router -make build-router - -# If macOS, ensure Candle dylib is discoverable for CGO -export DYLD_LIBRARY_PATH="$PWD/candle-binding/target/release:$DYLD_LIBRARY_PATH" - -# Run router with the default config and metrics -CONFIG_FILE=config/config.yaml make run-router - -# Run Envoy (separate terminal) -make run-envoy -``` - -## Configuration patterns (edit `config/config.yaml`) - -- `categories[]` with per-category `model_scores` and reasoning flags drive model selection; `default_model` is the fallback. -- `model_config` + `reasoning_families` normalize “reasoning mode” syntax across model families (e.g., deepseek, qwen3, gpt-oss). Use `GetModelReasoningFamily()` helpers, don’t hardcode. -- `semantic_cache`: `backend_type: memory|milvus`, `similarity_threshold`, `ttl_seconds`. For Milvus, run `make start-milvus` and test with `-tags=milvus`. -- `tools`: enable semantic tool selection via `tools_db_path` (JSON), `top_k`, and threshold (defaults to BERT threshold if unset). -- `classifier`: paths to ModernBERT/LoRA models and mapping jsons; batch endpoint requires unified classifier to be available. -- `vllm_endpoints[]`: list models per endpoint; selection respects per-model `preferred_endpoints` and weights. - -## Testing - -- Go vet and tidy: `make vet` and `make check-go-mod-tidy` -- Unit tests (Go): `make test-semantic-router` (set `SKIP_MILVUS_TESTS=false` to include Milvus) or `go test -v ./...` under `src/semantic-router` -- Milvus-specific: `make test-milvus-cache` or `make test-semantic-router-milvus` (uses `-tags=milvus`) -- E2E Python tests: see `e2e-tests/README.md` (requires router+envoy running) -- Quick cURL demos: `make test-auto-prompt-reasoning`, `test-pii`, `test-tools` (hits Envoy at `http://localhost:8801/v1/chat/completions` with `model: "auto"`) - -## Conventions & tips for contributors (agents) - -- Use config accessors from `pkg/config` (e.g., endpoint selection, PII policies). Avoid duplicating selection logic. -- Prefer `services.*ClassificationService` APIs for classification; a global service may be set by auto-discovery. -- Respect streaming in ExtProc handlers and record metrics via `pkg/metrics`. -- Keep hot-reload safe: re-create `OpenAIRouter` on config changes using `Server.watchConfigAndReload` pattern. -- When adding cache/tool logic, use existing interfaces: `cache.CacheBackend`, `tools.ToolsDatabase`. 
- -References - -- Router main: `src/semantic-router/cmd/main.go` -- ExtProc: `src/semantic-router/pkg/extproc/` -- Config: `config/config.yaml`, helpers in `src/semantic-router/pkg/config/` -- Candle binding: `candle-binding/` -- Bench: `bench/` (CLI and plots) -- Docs site: `website/` (Docusaurus) diff --git a/tools/mock-vllm/app.py b/tools/mock-vllm/app.py index c991c76f..c806f961 100644 --- a/tools/mock-vllm/app.py +++ b/tools/mock-vllm/app.py @@ -1,6 +1,7 @@ +from typing import List, Optional + from fastapi import FastAPI from pydantic import BaseModel -from typing import List, Optional app = FastAPI() @@ -29,7 +30,9 @@ async def models(): @app.post("/v1/chat/completions") async def chat_completions(req: ChatRequest): # Very simple echo-like behavior - last_user = next((m.content for m in reversed(req.messages) if m.role == "user"), "") + last_user = next( + (m.content for m in reversed(req.messages) if m.role == "user"), "" + ) content = f"[mock-{req.model}] You said: {last_user}" return { "id": "cmpl-mock-123", From 46867844fb45a6e9081b8103cbec63bdb1dde5ed Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 11:11:59 +0800 Subject: [PATCH 3/9] Added usage fields and metadata to chat_completions Signed-off-by: JaredforReal --- tools/mock-vllm/app.py | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/tools/mock-vllm/app.py b/tools/mock-vllm/app.py index c806f961..e4d02d15 100644 --- a/tools/mock-vllm/app.py +++ b/tools/mock-vllm/app.py @@ -1,3 +1,5 @@ +import math +import time from typing import List, Optional from fastapi import FastAPI @@ -34,15 +36,46 @@ async def chat_completions(req: ChatRequest): (m.content for m in reversed(req.messages) if m.role == "user"), "" ) content = f"[mock-{req.model}] You said: {last_user}" + + # Rough token estimation: ~1 token per 4 characters (ceil) + def estimate_tokens(text: str) -> int: + if not text: + return 0 + return max(1, math.ceil(len(text) / 4)) + + prompt_text = "\n".join( + m.content for m in req.messages if isinstance(m.content, str) + ) + prompt_tokens = estimate_tokens(prompt_text) + completion_tokens = estimate_tokens(content) + total_tokens = prompt_tokens + completion_tokens + + created_ts = int(time.time()) + + usage = { + "prompt_tokens": prompt_tokens, + "completion_tokens": completion_tokens, + "total_tokens": total_tokens, + # Optional details fields some clients read when using caching/reasoning + "prompt_tokens_details": {"cached_tokens": 0}, + "completion_tokens_details": {"reasoning_tokens": 0}, + } + return { "id": "cmpl-mock-123", "object": "chat.completion", + "created": created_ts, "model": req.model, + "system_fingerprint": "mock-vllm", "choices": [ { "index": 0, "message": {"role": "assistant", "content": content}, "finish_reason": "stop", + "logprobs": None, } ], + "usage": usage, + # Some SDKs look for token_usage; keep it as an alias for convenience. 
+ "token_usage": usage, } From f8a1703ec764a4a7007b39f8afb33dc59aa280fc Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 15:21:10 +0800 Subject: [PATCH 4/9] remove curl install & add mirrors for CN users Signed-off-by: JaredforReal --- Dockerfile.extproc | 13 +++++++++---- docker-compose.yml | 5 +++++ tools/mock-vllm/Dockerfile | 4 ++++ 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/Dockerfile.extproc b/Dockerfile.extproc index 5925d00c..89e66ada 100644 --- a/Dockerfile.extproc +++ b/Dockerfile.extproc @@ -24,11 +24,20 @@ FROM golang:1.24 as go-builder WORKDIR /app +# Use China-friendly Go module mirrors to avoid proxy.golang.org timeouts +ENV GOPROXY=https://goproxy.cn,direct +# Prefer a reachable checksum database in CN (or set to 'off' if still blocked) +ENV GOSUMDB=sum.golang.google.cn + # Copy Go module files first for better layer caching RUN mkdir -p src/semantic-router COPY src/semantic-router/go.mod src/semantic-router/go.sum src/semantic-router/ COPY candle-binding/go.mod candle-binding/semantic-router.go candle-binding/ +# Pre-download Go modules to leverage Docker layer caching and fail fast if mirrors are unreachable +RUN cd src/semantic-router && go mod download && \ + cd /app/candle-binding && go mod download + # Copy semantic-router source code COPY src/semantic-router/ src/semantic-router/ @@ -54,10 +63,6 @@ COPY config/config.yaml /app/config/ ENV LD_LIBRARY_PATH=/app/lib EXPOSE 50051 -# Install curl for healthchecks and basic diagnostics -RUN dnf -y update && \ - dnf -y install curl && \ - dnf clean all # Copy entrypoint to allow switching config via env var CONFIG_FILE COPY scripts/entrypoint.sh /app/entrypoint.sh diff --git a/docker-compose.yml b/docker-compose.yml index afc7e7e1..7f38cab4 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -11,9 +11,14 @@ services: volumes: - ./config:/app/config:ro - ./models:/app/models:ro + # - ~/.cache/huggingface:/root/.cache/huggingface # uncomment to persist Hugging Face cache on host (CN users) environment: - LD_LIBRARY_PATH=/app/lib - CONFIG_FILE=${CONFIG_FILE:-/app/config/config.yaml} + # The following environment variables help CN mainland users download Hugging Face models via mirrors + # - HF_HUB_ENABLE_HF_TRANSFER=1 # uncomment to enable fast transfer for HF downloads (CN users) + # - HF_ENDPOINT=https://hf-mirror.com # uncomment to use HF mirror endpoint in China + # - HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface # uncomment to set HF cache directory (works with volume above) networks: - semantic-network healthcheck: diff --git a/tools/mock-vllm/Dockerfile b/tools/mock-vllm/Dockerfile index a3287059..3a7e812c 100644 --- a/tools/mock-vllm/Dockerfile +++ b/tools/mock-vllm/Dockerfile @@ -6,6 +6,10 @@ RUN apt-get update && apt-get install -y --no-install-recommends \ curl \ && rm -rf /var/lib/apt/lists/* +# Uncomment to Configure pip to use a China mirror for faster installs +# RUN python -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \ +# python -m pip config set global.trusted-host pypi.tuna.tsinghua.edu.cn + COPY requirements.txt /app/requirements.txt RUN pip install --no-cache-dir -r /app/requirements.txt From 8ba24aee8ec88d2a178fef220ec6a58adadbbe36 Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 16:04:29 +0800 Subject: [PATCH 5/9] Update docker quick start doc & comment config for CN user Signed-off-by: JaredforReal --- Dockerfile.extproc | 4 +- .../docs/getting-started/docker-quickstart.md | 140 ++++++++++++------ 2 files 
changed, 99 insertions(+), 45 deletions(-) diff --git a/Dockerfile.extproc b/Dockerfile.extproc index 89e66ada..652e6f89 100644 --- a/Dockerfile.extproc +++ b/Dockerfile.extproc @@ -25,9 +25,9 @@ FROM golang:1.24 as go-builder WORKDIR /app # Use China-friendly Go module mirrors to avoid proxy.golang.org timeouts -ENV GOPROXY=https://goproxy.cn,direct +# ENV GOPROXY=https://goproxy.cn,direct # Prefer a reachable checksum database in CN (or set to 'off' if still blocked) -ENV GOSUMDB=sum.golang.google.cn +# ENV GOSUMDB=sum.golang.google.cn # Copy Go module files first for better layer caching RUN mkdir -p src/semantic-router diff --git a/website/docs/getting-started/docker-quickstart.md b/website/docs/getting-started/docker-quickstart.md index e06bed44..7eae6e59 100644 --- a/website/docs/getting-started/docker-quickstart.md +++ b/website/docs/getting-started/docker-quickstart.md @@ -6,40 +6,40 @@ Run Semantic Router + Envoy locally using Docker Compose v2. - Docker Engine and Docker Compose v2 (use the `docker compose` command, not the legacy `docker-compose`) - ```bash - # Verify - docker compose version - ``` + ```bash + # Verify + docker compose version + ``` - Install Docker Compose v2 for Ubuntu(if missing), see more in [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository) + Install Docker Compose v2 for Ubuntu(if missing), see more in [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository) - ```bash - # Remove legacy v1 if present (optional) - sudo apt-get remove -y docker-compose || true + ```bash + # Remove legacy v1 if present (optional) + sudo apt-get remove -y docker-compose || true - sudo apt-get update - sudo apt-get install -y ca-certificates curl gnupg - sudo install -m 0755 -d /etc/apt/keyrings - curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --yes --dearmor -o /etc/apt/keyrings/docker.gpg - echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null - sudo apt-get update - sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin + sudo apt-get update + sudo apt-get install -y ca-certificates curl gnupg + sudo install -m 0755 -d /etc/apt/keyrings + curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --yes --dearmor -o /etc/apt/keyrings/docker.gpg + echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null + sudo apt-get update + sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin - docker compose version - ``` + docker compose version + ``` - Ensure ports 8801, 50051, 19000 are free ## Install and Run with Docker Compose v2 -1) Clone the repo and move into it (from your workspace root): +1. Clone the repo and move into it (from your workspace root): ```bash git clone https://github.com/vllm-project/semantic-router.git cd semantic-router ``` -2) Download required models (classification models): +2. 
Download required models (classification models): ```bash make download-models @@ -53,7 +53,7 @@ This downloads the classification models used by the router: Note: The BERT similarity model defaults to a remote Hugging Face model. See Troubleshooting for offline/local usage. -3) Start the services with Docker Compose v2: +3. Start the services with Docker Compose v2: ```bash # Start core services (semantic-router + envoy) @@ -62,11 +62,12 @@ docker compose up --build # Or run in background (recommended) docker compose up --build -d -# With testing profile (includes mock vLLM) -docker compose --profile testing up --build +# With testing profile (includes mock vLLM). Use testing config to point router at the mock endpoint: +# (CONFIG_FILE is read by the router entrypoint; the file is mounted from ./config) +CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build ``` -4) Verify +4. Verify - Semantic Router (gRPC): localhost:50051 - Envoy Proxy: http://localhost:8801 @@ -90,7 +91,7 @@ docker compose down ## Troubleshooting -### 1) Router exits immediately with a Hugging Face DNS/download error +** 1. Router exits immediately with a Hugging Face DNS/download error ** Symptoms (from `docker compose logs -f semantic-router`): @@ -103,32 +104,85 @@ Why: `bert_model.model_id` in `config/config.yaml` points to a remote model (`se Fix options: - Allow network access in the container (online): + - Ensure your host can resolve DNS, or add DNS servers to the `semantic-router` service in `docker-compose.yml`: - ```yaml - services: - semantic-router: - # ... - dns: - - 1.1.1.1 - - 8.8.8.8 - ``` - + ```yaml + services: + semantic-router: + # ... + dns: + - 1.1.1.1 + - 8.8.8.8 + ``` + - If behind a proxy, set `http_proxy/https_proxy/no_proxy` env vars for the service. - Use a local copy of the model (offline): - 1. Download `sentence-transformers/all-MiniLM-L12-v2` to `./models/sentence-transformers/all-MiniLM-L12-v2/` on the host. - 2. Update `config/config.yaml` to use the local path (mounted into the container at `/app/models`): - ```yaml - bert_model: - model_id: "models/sentence-transformers/all-MiniLM-L12-v2" - threshold: 0.6 - use_cpu: true - ``` + 1. Download `sentence-transformers/all-MiniLM-L12-v2` to `./models/sentence-transformers/all-MiniLM-L12-v2/` on the host. + 2. Update `config/config.yaml` to use the local path (mounted into the container at `/app/models`): + + ```yaml + bert_model: + model_id: "models/sentence-transformers/all-MiniLM-L12-v2" + threshold: 0.6 + use_cpu: true + ``` + + 3. Recreate services: `docker compose up -d --build` + +Extra tip: If you use the testing profile, also pass the testing config so the router targets the mock service: + +```bash +CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build +``` + +** 2. Envoy/Router up but requests fail ** + +- Ensure `mock-vllm` is healthy (testing profile only): + - `docker compose ps` should show mock-vllm healthy; logs show 200 on `/health`. +- Verify the router config in use: + - Router logs print `Starting vLLM Semantic Router ExtProc with config: ...`. If it shows `/app/config/config.yaml` while testing, you forgot `CONFIG_FILE`. +- Basic smoke test via Envoy (OpenAI-compatible): + - Send a POST to `http://localhost:8801/v1/chat/completions` with `{"model":"auto", "messages":[{"role":"user","content":"hi"}]}` and check that the mock responds with `[mock-openai/gpt-oss-20b]` content when testing profile is active. + +** 3. DNS problems inside containers ** - 3. 
Recreate services: `docker compose up -d --build` +If DNS is flaky in your Docker environment, add DNS servers to the `semantic-router` service in `docker-compose.yml`: -### 2) Port already in use +```yaml +services: + semantic-router: + # ... + dns: + - 1.1.1.1 + - 8.8.8.8 +``` + +For corporate proxies, set `http_proxy`, `https_proxy`, and `no_proxy` in the service `environment`. Make sure 8801, 50051, 19000 are not bound by other processes. Adjust ports in `docker-compose.yml` if needed. + +** 4. China Mainland tips (mirrors and offline caches) ** + +If you're in CN mainland and network access to Go/Hugging Face/PyPI is slow or blocked: + +- Hugging Face models (router downloads BERT embeddings on first run): + + - Prefer using a local copy mounted via `./models` and point `bert_model.model_id` to `models/...`. + - Or mount your HF cache into the container and set cache env var (uncomment in `docker-compose.yml`): + - Volume: `~/.cache/huggingface:/root/.cache/huggingface` + - Env: `HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface` + - Optional mirrors: + - `HF_ENDPOINT=https://hf-mirror.com` + - `HF_HUB_ENABLE_HF_TRANSFER=1` + +- Go modules (used during image build): + + - Already set in Dockerfile to `GOPROXY=https://goproxy.cn,direct` and `GOSUMDB=sum.golang.google.cn` for reliability. + +- PyPI (for mock-vllm image): + - You can configure pip to use a mirror (commented example in `tools/mock-vllm/Dockerfile`): + - `python -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple` + - `python -m pip config set global.trusted-host pypi.tuna.tsinghua.edu.cn` From ca525415d6e7b40d865ff6d5f97009fdd0105555 Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 16:26:09 +0800 Subject: [PATCH 6/9] clean docker-compose.yml Signed-off-by: JaredforReal --- docker-compose.yml | 5 ----- website/docs/getting-started/docker-quickstart.md | 4 ++-- 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/docker-compose.yml b/docker-compose.yml index 7f38cab4..afc7e7e1 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -11,14 +11,9 @@ services: volumes: - ./config:/app/config:ro - ./models:/app/models:ro - # - ~/.cache/huggingface:/root/.cache/huggingface # uncomment to persist Hugging Face cache on host (CN users) environment: - LD_LIBRARY_PATH=/app/lib - CONFIG_FILE=${CONFIG_FILE:-/app/config/config.yaml} - # The following environment variables help CN mainland users download Hugging Face models via mirrors - # - HF_HUB_ENABLE_HF_TRANSFER=1 # uncomment to enable fast transfer for HF downloads (CN users) - # - HF_ENDPOINT=https://hf-mirror.com # uncomment to use HF mirror endpoint in China - # - HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface # uncomment to set HF cache directory (works with volume above) networks: - semantic-network healthcheck: diff --git a/website/docs/getting-started/docker-quickstart.md b/website/docs/getting-started/docker-quickstart.md index 7eae6e59..e2c5c771 100644 --- a/website/docs/getting-started/docker-quickstart.md +++ b/website/docs/getting-started/docker-quickstart.md @@ -171,7 +171,7 @@ If you're in CN mainland and network access to Go/Hugging Face/PyPI is slow or b - Hugging Face models (router downloads BERT embeddings on first run): - Prefer using a local copy mounted via `./models` and point `bert_model.model_id` to `models/...`. 
- - Or mount your HF cache into the container and set cache env var (uncomment in `docker-compose.yml`): + - Or mount your HF cache into the container and set cache env var (in `docker-compose.yml`): - Volume: `~/.cache/huggingface:/root/.cache/huggingface` - Env: `HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface` - Optional mirrors: @@ -180,7 +180,7 @@ If you're in CN mainland and network access to Go/Hugging Face/PyPI is slow or b - Go modules (used during image build): - - Already set in Dockerfile to `GOPROXY=https://goproxy.cn,direct` and `GOSUMDB=sum.golang.google.cn` for reliability. + - Set in `Dockerfile`: `GOPROXY=https://goproxy.cn,direct` and `GOSUMDB=sum.golang.google.cn`. - PyPI (for mock-vllm image): - You can configure pip to use a mirror (commented example in `tools/mock-vllm/Dockerfile`): From 825157302cb102fcd3aa4bbe6f7e6c145545490d Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 21:22:18 +0800 Subject: [PATCH 7/9] modify docker-quickstart Signed-off-by: JaredforReal --- .../docs/getting-started/docker-quickstart.md | 31 +++++++------------ 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/website/docs/getting-started/docker-quickstart.md b/website/docs/getting-started/docker-quickstart.md index e2c5c771..0742b589 100644 --- a/website/docs/getting-started/docker-quickstart.md +++ b/website/docs/getting-started/docker-quickstart.md @@ -4,26 +4,19 @@ Run Semantic Router + Envoy locally using Docker Compose v2. ## Prerequisites -- Docker Engine and Docker Compose v2 (use the `docker compose` command, not the legacy `docker-compose`) +- Docker Engine, see more in [Docker Engine Installation](https://docs.docker.com/engine/install/) +- Docker Compose v2 (use the `docker compose` command, not the legacy `docker-compose`) ```bash # Verify docker compose version ``` - Install Docker Compose v2 for Ubuntu(if missing), see more in [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository) + Docker Compose Installation for Ubuntu(if missing), see more in [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository) ```bash - # Remove legacy v1 if present (optional) - sudo apt-get remove -y docker-compose || true - - sudo apt-get update - sudo apt-get install -y ca-certificates curl gnupg - sudo install -m 0755 -d /etc/apt/keyrings - curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --yes --dearmor -o /etc/apt/keyrings/docker.gpg - echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null sudo apt-get update - sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin + sudo apt-get install -y docker-compose-plugin docker compose version ``` @@ -32,14 +25,14 @@ Run Semantic Router + Envoy locally using Docker Compose v2. ## Install and Run with Docker Compose v2 -1. Clone the repo and move into it (from your workspace root): +**1. Clone the repo and move into it (from your workspace root)** ```bash git clone https://github.com/vllm-project/semantic-router.git cd semantic-router ``` -2. Download required models (classification models): +**2. 
Download required models (classification models)** ```bash make download-models @@ -53,7 +46,7 @@ This downloads the classification models used by the router: Note: The BERT similarity model defaults to a remote Hugging Face model. See Troubleshooting for offline/local usage. -3. Start the services with Docker Compose v2: +**3. Start the services with Docker Compose v2** ```bash # Start core services (semantic-router + envoy) @@ -67,7 +60,7 @@ docker compose up --build -d CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build ``` -4. Verify +**4. Verify** - Semantic Router (gRPC): localhost:50051 - Envoy Proxy: http://localhost:8801 @@ -91,7 +84,7 @@ docker compose down ## Troubleshooting -** 1. Router exits immediately with a Hugging Face DNS/download error ** +**1. Router exits immediately with a Hugging Face DNS/download error** Symptoms (from `docker compose logs -f semantic-router`): @@ -138,7 +131,7 @@ Extra tip: If you use the testing profile, also pass the testing config so the r CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build ``` -** 2. Envoy/Router up but requests fail ** +**2. Envoy/Router up but requests fail** - Ensure `mock-vllm` is healthy (testing profile only): - `docker compose ps` should show mock-vllm healthy; logs show 200 on `/health`. @@ -147,7 +140,7 @@ CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up - Basic smoke test via Envoy (OpenAI-compatible): - Send a POST to `http://localhost:8801/v1/chat/completions` with `{"model":"auto", "messages":[{"role":"user","content":"hi"}]}` and check that the mock responds with `[mock-openai/gpt-oss-20b]` content when testing profile is active. -** 3. DNS problems inside containers ** +**3. DNS problems inside containers** If DNS is flaky in your Docker environment, add DNS servers to the `semantic-router` service in `docker-compose.yml`: @@ -164,7 +157,7 @@ For corporate proxies, set `http_proxy`, `https_proxy`, and `no_proxy` in the se Make sure 8801, 50051, 19000 are not bound by other processes. Adjust ports in `docker-compose.yml` if needed. -** 4. China Mainland tips (mirrors and offline caches) ** +**4. China Mainland tips (mirrors and offline caches)** If you're in CN mainland and network access to Go/Hugging Face/PyPI is slow or blocked: From 6b34904c9c6e941cce10c7bcebf01311a846e807 Mon Sep 17 00:00:00 2001 From: JaredforReal Date: Mon, 22 Sep 2025 22:16:32 +0800 Subject: [PATCH 8/9] installation for more distribution Signed-off-by: JaredforReal --- website/docs/getting-started/docker-quickstart.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/website/docs/getting-started/docker-quickstart.md b/website/docs/getting-started/docker-quickstart.md index 0742b589..5ddf5144 100644 --- a/website/docs/getting-started/docker-quickstart.md +++ b/website/docs/getting-started/docker-quickstart.md @@ -7,17 +7,18 @@ Run Semantic Router + Envoy locally using Docker Compose v2. 
 - Docker Engine, see more in [Docker Engine Installation](https://docs.docker.com/engine/install/)
 - Docker Compose v2 (use the `docker compose` command, not the legacy `docker-compose`)
 
-  ```bash
-  # Verify
-  docker compose version
-  ```
-
-  Docker Compose Installation for Ubuntu(if missing), see more in [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository)
+  Docker Compose Plugin Installation (if missing), see more in [Docker Compose Plugin Installation](https://docs.docker.com/compose/install/linux/#install-using-the-repository)
 
   ```bash
+  # For Ubuntu and Debian, run:
   sudo apt-get update
   sudo apt-get install -y docker-compose-plugin
 
+  # For RPM-based distributions, run:
+  sudo yum update
+  sudo yum install docker-compose-plugin
+
+  # Verify
   docker compose version
   ```

From e50e441be64b64fb9cdd8902178e06b1c9f0f188 Mon Sep 17 00:00:00 2001
From: JaredforReal
Date: Tue, 23 Sep 2025 12:15:09 +0800
Subject: [PATCH 9/9] get rid of optimization for CN network

Signed-off-by: JaredforReal
---
 Dockerfile.extproc                            |  9 --------
 tools/mock-vllm/Dockerfile                    | 10 +++-----
 .../docs/getting-started/docker-quickstart.md | 23 -------------------
 3 files changed, 3 insertions(+), 39 deletions(-)

diff --git a/Dockerfile.extproc b/Dockerfile.extproc
index 652e6f89..72ead6e4 100644
--- a/Dockerfile.extproc
+++ b/Dockerfile.extproc
@@ -24,20 +24,11 @@ FROM golang:1.24 as go-builder
 
 WORKDIR /app
 
-# Use China-friendly Go module mirrors to avoid proxy.golang.org timeouts
-# ENV GOPROXY=https://goproxy.cn,direct
-# Prefer a reachable checksum database in CN (or set to 'off' if still blocked)
-# ENV GOSUMDB=sum.golang.google.cn
-
 # Copy Go module files first for better layer caching
 RUN mkdir -p src/semantic-router
 COPY src/semantic-router/go.mod src/semantic-router/go.sum src/semantic-router/
 COPY candle-binding/go.mod candle-binding/semantic-router.go candle-binding/
 
-# Pre-download Go modules to leverage Docker layer caching and fail fast if mirrors are unreachable
-RUN cd src/semantic-router && go mod download && \
-    cd /app/candle-binding && go mod download
-
 # Copy semantic-router source code
 COPY src/semantic-router/ src/semantic-router/
 
diff --git a/tools/mock-vllm/Dockerfile b/tools/mock-vllm/Dockerfile
index 3a7e812c..ea955b2b 100644
--- a/tools/mock-vllm/Dockerfile
+++ b/tools/mock-vllm/Dockerfile
@@ -6,14 +6,10 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     curl \
     && rm -rf /var/lib/apt/lists/*
 
-# Uncomment to Configure pip to use a China mirror for faster installs
-# RUN python -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
-#     python -m pip config set global.trusted-host pypi.tuna.tsinghua.edu.cn
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
 
-COPY requirements.txt /app/requirements.txt
-RUN pip install --no-cache-dir -r /app/requirements.txt
-
-COPY app.py /app/app.py
+COPY app.py .
 
 EXPOSE 8000
 
diff --git a/website/docs/getting-started/docker-quickstart.md b/website/docs/getting-started/docker-quickstart.md
index 5ddf5144..6a517ff2 100644
--- a/website/docs/getting-started/docker-quickstart.md
+++ b/website/docs/getting-started/docker-quickstart.md
@@ -157,26 +157,3 @@ services:
 
 For corporate proxies, set `http_proxy`, `https_proxy`, and `no_proxy` in the service `environment`.
 
 Make sure 8801, 50051, 19000 are not bound by other processes. Adjust ports in `docker-compose.yml` if needed.
-
-**4. China Mainland tips (mirrors and offline caches)**
-
-If you're in CN mainland and network access to Go/Hugging Face/PyPI is slow or blocked:
-
-- Hugging Face models (router downloads BERT embeddings on first run):
-
-  - Prefer using a local copy mounted via `./models` and point `bert_model.model_id` to `models/...`.
-  - Or mount your HF cache into the container and set cache env var (in `docker-compose.yml`):
-    - Volume: `~/.cache/huggingface:/root/.cache/huggingface`
-    - Env: `HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface`
-  - Optional mirrors:
-    - `HF_ENDPOINT=https://hf-mirror.com`
-    - `HF_HUB_ENABLE_HF_TRANSFER=1`
-
-- Go modules (used during image build):
-
-  - Set in `Dockerfile`: `GOPROXY=https://goproxy.cn,direct` and `GOSUMDB=sum.golang.google.cn`.
-
-- PyPI (for mock-vllm image):
-  - You can configure pip to use a mirror (commented example in `tools/mock-vllm/Dockerfile`):
-    - `python -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple`
-    - `python -m pip config set global.trusted-host pypi.tuna.tsinghua.edu.cn`
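
With the testing profile in place, an end-to-end smoke test looks like this (a sketch assembled from the quickstart's own commands; the expected reply string comes from `tools/mock-vllm/app.py`):

```bash
# Router + Envoy + mock vLLM, using the testing config
CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build -d

# OpenAI-compatible request through Envoy; with model "auto" the router
# selects the configured default (openai/gpt-oss-20b) and the mock echoes it.
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "hi"}]}'
# Expect "[mock-openai/gpt-oss-20b] You said: hi" in choices[0].message.content
```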