FastAPI backend for the mobile client: JSON and multipart endpoints for chat, translation, vision, speech (STT/TTS/STS), PDF summarization, and text extraction. LLM work is delegated to an OpenAI-compatible upstream (vLLM or LiteLLM) using the official Python openai client.
+---------------------------+
| Clients |
| (Android / services) |
+-------------+-----------+
|
X-API-Key | HTTP(S)
(header) |
v
+--------------------------------------------------------------------------------+
| nginx |
| - reverse proxy to the API container |
| - client_max_body_size (large uploads) |
| - elevated proxy read/send timeouts (LLM + uploads) |
| - optional: TLS termination in front |
+----------------------------------------+--------------------------------------+
|
v
+--------------------------------------------------------------------------------+
| FastAPI (uvicorn :8000) |
| +---------------------------------------------------------------------------+ |
| | Global: /healthz (no API key) | /docs, /openapi.json (no API key) | |
| +---------------------------------------------------------------------------+ |
| | Middleware: ApiKeyMiddleware -> X-API-Key must match API_KEYS | |
| +---------------------------------------------------------------------------+ |
| | Routers: /v1/transcribe, /v1/indic_chat, /v1/audio/speech, ... | |
| +------------------+----------------+----------------+------------------------+ |
| | | | |
| v v v | |
| +---------------+ +-------------+ +---------------+ | |
| | STT / TTS | | VllmClient | | PDF + OCR | | |
| | adapters | | (AsyncOpenAI| | adapters | | |
| | (stub or | | chat API) | | (pypdf, stub, | | |
| | vllm_gemma) | | | | pdf_text) | | |
| +-------+-------+ +------+------+ +--------+------+ | |
| | | | |
+--------------------------------------------------------------------------------+
| | |
| | |
| v |
| OpenAI-compatible |
| HTTP (JSON multimodal) |
| | |
| v |
| +-------------------+ |
| | vLLM or LiteLLM | |
| | (Gemma / other) | |
| +-------------------+ |
| |
+---------------------------------+
Data flow summary
- Authenticated API traffic enters nginx, is forwarded to FastAPI, and is checked against
API_KEYSviaX-API-Key. - Chat, translate, vision, PDF summarize (after text extraction) call
VllmClient, which usesPOST /chat/completionson the configured base URL (VLLM_BASE_URLorLITELLM_API_BASE, must include/v1). - Transcribe / speech-to-speech (STT leg) can use stubs or
STT_IMPLEMENTATION=vllm_gemma(Gemma ASR prompts +input_audioper the vLLM Gemma 4 recipe). - TTS returns binary audio from a stub or future engine adapter.
- extract-text uses OCR adapters (stub or PDF text extraction).
app/
main.py # App factory, middleware, /healthz
config.py # pydantic-settings
clients/vllm.py # OpenAI-compatible client + transcribe/vision/chat helpers
adapters/ # stt, tts, ocr, pdf
prompts/gemma_e2b.py # Gemma 4 E2B ASR prompt text
routers/v1/ # One module per route group
schemas/v1.py # Request/response models
docker/
Dockerfile
nginx.conf
docker-compose.yml
tests/
1. Configure environment
cp .env.example .env
# Set API_KEYS, VLLM_BASE_URL (or LITELLM_API_BASE), and model defaults.2. Run locally
uv sync --all-extras
uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 80003. Run with Docker Compose (API + nginx)
docker compose --env-file .env up --buildNginx listens on HTTP_PORT (default 8080) and proxies to the API service.
See .env.example for all variables. Important ones:
| Variable | Role |
|---|---|
API_KEYS |
Comma-separated keys; must match X-API-Key |
VLLM_BASE_URL or LITELLM_API_BASE |
OpenAI-compatible base URL ending in /v1 |
DEFAULT_*_MODEL |
Fallback model IDs when the client omits them |
STT_IMPLEMENTATION |
stub or vllm_gemma |
MAX_UPLOAD_MB |
Upload size guard |
| Method | Path | Notes |
|---|---|---|
POST |
/v1/transcribe/ |
multipart audio + language |
POST |
/v1/indic_chat |
JSON chat |
POST |
/v1/audio/speech |
TTS; query input, language |
POST |
/v1/translate |
JSON |
POST |
/v1/indic_visual_query |
multipart file + query + langs |
POST |
/v1/speech_to_speech |
multipart audio |
POST |
/v1/extract-text |
multipart + page_number, language |
POST |
/v1/indic-summarize-pdf-all |
multipart PDF + tgt_lang, model parts |
OpenAPI: /docs or /openapi.json (unauthenticated in the default middleware).
uv run pytest- vLLM OpenAI-compatible server
- Gemma 4 — audio transcription (OpenAI SDK)
- google/gemma-4-E2B-it (prompt wording for ASR)