End-to-end pipeline: File Share -> Crawler -> Text Extractor -> Embeddings (bge-m3) -> Qdrant -> RAG API -> llama.cpp (Qwen 7B/14B)
- Ingestion worker (Python): crawls SMB/FS, extracts text with unstructured, chunks, embeds via BAAI/bge-m3, and upserts into Qdrant.
- Vector DB: Qdrant (Docker).
- RAG API (FastAPI): retrieves from Qdrant and calls llama.cpp (/completion) on 7B (port 8080) or 14B (port 8081).
- Systemd services: qwen3-7b, qwen3-14b, rag-app, plus the nightly rag-ingest.timer.
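The chunking step in the ingestion worker could look like the sketch below. This is illustrative only: the function name, window size, and overlap are assumptions, not values from the repo (the real knobs live in config/settings.yaml).

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows before embedding.

    Overlap keeps context that straddles a chunk boundary retrievable
    from both neighboring chunks. Hypothetical defaults shown.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, re-covering the overlap
    return chunks
```

Each resulting chunk would then be embedded with bge-m3 and upserted into Qdrant as one point, with the source path kept in the payload so the RAG API can cite it.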
See config/settings.yaml for paths and knobs.
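A settings file for this layout might look like the following sketch. All key names and values here are assumptions for illustration; only the components and ports (Qdrant, bge-m3, llama.cpp on 8080/8081) come from the pipeline description above.

```yaml
# Hypothetical config/settings.yaml layout -- key names are not from the repo
ingest:
  share_root: /mnt/fileshare        # SMB/FS mount crawled by the worker
  chunk_size: 800
  chunk_overlap: 100
embeddings:
  model: BAAI/bge-m3
qdrant:
  url: http://localhost:6333
  collection: documents
llm:
  base_url_7b: http://localhost:8080   # llama.cpp /completion, Qwen 7B
  base_url_14b: http://localhost:8081  # llama.cpp /completion, Qwen 14B
```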