Production-style asynchronous document ingestion and retrieval system built with FastAPI, RabbitMQ, and Redis.
The system crawls documentation, processes ingestion jobs via background workers, and serves citation-backed answers through a responsive API and UI.
Key idea: The API never blocks on ingestion. All crawl and indexing work is offloaded to RabbitMQ workers, keeping the query path fast and reliable.
This project focuses on the engineering concerns that matter in production-oriented retrieval systems:
- Asynchronous ingestion with RabbitMQ and a dedicated worker
- Job-state tracking in Redis with polling via `GET /jobs/{job_id}`
- Retrieval guardrails and citation-aware answers
- Provider switching for both LLMs and embeddings (`ollama` or `gemini`)
- Containerized local stack with API, worker, UI, Redis, and RabbitMQ
- Unit and integration tests around API, ingestion, queues, and worker behavior
Designed for production-style workflows, the system prioritizes responsiveness on the query path while handling ingestion and indexing asynchronously in the background.
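The non-blocking pattern described above can be sketched in a few lines. This is a minimal in-memory simulation of the flow, not the project's actual code: `JOB_STORE` stands in for Redis (`backend/jobs.py`) and `QUEUE` for RabbitMQ (`backend/rabbitmq.py`); the function and status names are illustrative.

```python
import uuid

# In-memory stand-ins for the real infrastructure: JOB_STORE plays the role
# of Redis and QUEUE the role of the durable RabbitMQ queue.
JOB_STORE: dict[str, str] = {}
QUEUE: list[dict] = []

def enqueue_ingest(doc_id: str, text: str) -> dict:
    """Accept an ingest request without doing the work.

    Mirrors the 202 Accepted flow: record the job as queued, publish a
    message, and return a job_id the client can poll.
    """
    job_id = str(uuid.uuid4())
    JOB_STORE[job_id] = "queued"  # a Redis write in the real system
    QUEUE.append({"job_id": job_id, "doc_id": doc_id, "text": text})
    return {"job_id": job_id, "status": "queued"}

def worker_step() -> None:
    """One worker iteration: consume a message and mark the job done."""
    msg = QUEUE.pop(0)
    # ... chunking, embedding, and upserting would happen here ...
    JOB_STORE[msg["job_id"]] = "completed"
```

The key property is that `enqueue_ingest` returns immediately; all expensive work happens later in `worker_step`, which a separate process runs in a consume loop.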
```
               +----------------------+
               | Streamlit Frontend   |
               | app/main.py          |
               +----------+-----------+
                          |
                          v
+-------------+  +------------------+      +--------------------+
| API Clients +->+ FastAPI Service  +----->+ Pinecone Vector DB |
| curl / apps |  | api/main.py      |      +--------------------+
+-------------+  +---+----------+---+                ^
                     |          |                    |
                     |          +-- /ask, /search ---+
                     |
                     +-- /ingest, /crawl
                     |
                     v
           +-------------------+
           | RabbitMQ          |
           | durable queue     |
           +---------+---------+
                     |
                     v
           +-------------------+
           | background worker |
           | backend/worker.py |
           +---------+---------+
                     |
                     v
           +-------------------+
           | Redis job store   |
           +-------------------+
```
- Crawl external documentation with Tavily and filter off-domain pages during ingestion
- Split content into chunks and upsert embeddings into Pinecone in batches
- Answer questions with retrieved context and attach citations
- Accept ingest and crawl requests immediately with `202 Accepted`
- Expose job status for async workflows
- Run fully via Docker Compose for local demos
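The chunk-and-batch step above is conceptually simple. A hedged sketch, assuming character-based windows (the real pipeline in `backend/ingestion.py` may split on other boundaries, and `CHUNK_SIZE`/`CHUNK_OVERLAP`/`BATCH_SIZE` are configurable):

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows with overlap.

    Overlap keeps content that straddles a chunk boundary retrievable
    from both neighboring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def batched(items: list, batch_size: int) -> list[list]:
    """Group chunks into batches for upserting to the vector store."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Batching matters because vector-store upsert APIs typically cap request sizes, and smaller batches also bound the blast radius of a failed request.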
- API: FastAPI + Uvicorn
- UI: Streamlit
- Orchestration: LangChain
- Vector store: Pinecone
- Queue: RabbitMQ
- Job state: Redis
- Model providers: Ollama or Google Gemini
- Crawl provider: Tavily
- Testing: Pytest
- Packaging and task runner: `uv`
```
documentation-helper/
  api/
    main.py        # FastAPI endpoints for health, jobs, ingest, crawl, search, ask
    schemas.py     # Request/response contracts
  app/
    main.py        # Streamlit UI for ingest, crawl, and chat
  backend/
    core.py        # Retrieval and answer generation
    ingestion.py   # Crawl, clean, chunk, embed, and index
    jobs.py        # Redis-backed job status storage
    rabbitmq.py    # Queue topology and publish helpers
    selection.py   # Retrieval selection and retry heuristics
    worker.py      # Background job consumer
  tests/
    test_*.py      # Unit and integration coverage
  docker-compose.yml  # Local multi-service stack
  Dockerfile
  pyproject.toml
  README.md
```
The recommended path for reviewers is Docker Compose because it brings up the API, worker, Redis, RabbitMQ, and Streamlit together.
PowerShell:

```powershell
Copy-Item .env.example .env
```

Bash:

```bash
cp .env.example .env
```

Then fill in the provider keys and infrastructure values you want to use.

Install dependencies and bring up the stack:

```bash
uv sync --dev
docker compose up --build
```
Services:
- Streamlit UI: http://localhost:8501
- FastAPI docs: http://localhost:8000/docs
- RabbitMQ management UI: http://localhost:15672
Use the UI or hit the API directly.
Health check:

```bash
curl http://localhost:8000/health
```
Bash: queue an ingest job

```bash
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "doc_id": "sample-doc",
    "text": "LangChain helps build LLM applications.",
    "metadata": {"source": "manual"}
  }'
```

PowerShell: queue an ingest job

```powershell
$body = @{
    doc_id   = "sample-doc"
    text     = "LangChain helps build LLM applications."
    metadata = @{
        source = "manual"
    }
} | ConvertTo-Json -Depth 3

Invoke-RestMethod `
    -Method Post `
    -Uri "http://localhost:8000/ingest" `
    -ContentType "application/json" `
    -Body $body
```

Bash: queue a crawl job

```bash
curl -X POST http://localhost:8000/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://python.langchain.com/",
    "max_depth": 3,
    "extract_depth": "advanced"
  }'
```

PowerShell: queue a crawl job

```powershell
$body = @{
    url           = "https://python.langchain.com/"
    max_depth     = 3
    extract_depth = "advanced"
} | ConvertTo-Json

Invoke-RestMethod `
    -Method Post `
    -Uri "http://localhost:8000/crawl" `
    -ContentType "application/json" `
    -Body $body
```

Bash: poll job status

```bash
curl http://localhost:8000/jobs/<job_id>
```

PowerShell: poll job status

```powershell
Invoke-RestMethod -Method Get -Uri "http://localhost:8000/jobs/<job_id>"
```

Bash: ask a question

```bash
curl -X POST http://localhost:8000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is LangChain?",
    "top_k": 6
  }'
```

PowerShell: ask a question

```powershell
$body = @{
    query = "What is LangChain?"
    top_k = 6
} | ConvertTo-Json

Invoke-RestMethod `
    -Method Post `
    -Uri "http://localhost:8000/ask" `
    -ContentType "application/json" `
    -Body $body
```

If you want to run services manually, start the dependencies first.
```bash
uv run uvicorn api.main:app --host 0.0.0.0 --port 8000
uv run python -m backend.worker
uv run streamlit run app/main.py
```
Notes:
- `POST /ingest` and `POST /crawl` require RabbitMQ, Redis, and the worker to be running
- `POST /ask` and `POST /search` require a valid Pinecone index plus configured model and embedding providers
- Local development defaults to `OLLAMA_BASE_URL=http://localhost:11434`
- Docker Compose overrides the Ollama base URL to `http://host.docker.internal:11434` for container-to-host access
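Because the stack can talk to either a local Ollama server or hosted Gemini, the provider switch is a single environment lookup. A hedged sketch of that idea (the real selection logic lives in the backend package; the default model names below are placeholders, not the project's defaults):

```python
import os

def resolve_llm(env=None) -> tuple[str, str]:
    """Pick the LLM provider and model from the environment.

    Mirrors the LLM_PROVIDER / OLLAMA_MODEL / GEMINI_MODEL settings
    used by this project; fallback model names are illustrative only.
    """
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").lower()
    if provider == "ollama":
        return provider, env.get("OLLAMA_MODEL", "llama3")
    if provider == "gemini":
        return provider, env.get("GEMINI_MODEL", "gemini-1.5-flash")
    raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
```

Failing fast on an unknown provider keeps misconfiguration visible at startup rather than surfacing as a confusing runtime error mid-request.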
- `GET /health`: readiness check
- `GET /jobs/{job_id}`: fetch current async job state
- `POST /ingest`: enqueue raw text ingestion
- `POST /crawl`: enqueue remote documentation crawl + ingestion
- `POST /search`: semantic search over indexed chunks
- `POST /ask`: retrieve relevant chunks and generate an answer with citations
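Clients typically pair `POST /ingest` or `POST /crawl` with a polling loop against `GET /jobs/{job_id}`. A minimal sketch of such a loop, with the HTTP call injected as a callable so the shape is clear without a running server; the terminal state names (`"completed"`, `"failed"`) are assumptions, not confirmed from the API contract:

```python
import time
from typing import Callable

def wait_for_job(fetch_status: Callable[[], str],
                 poll_interval: float = 0.5,
                 max_polls: int = 60) -> str:
    """Poll until the job reaches a terminal state.

    fetch_status would wrap an HTTP GET to /jobs/{job_id} and return
    the job's status string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in {"completed", "failed"}:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("job did not reach a terminal state in time")
```

Bounding the loop with `max_polls` matters: a job lost to a worker crash should surface as a client-side timeout rather than an infinite poll.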
The repo now includes a real `.env.example`. The most important settings are:
- `LLM_PROVIDER`: `ollama` or `gemini`
- `EMBEDDINGS_PROVIDER`: `ollama` or `gemini`
- `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_EMBED_MODEL`
- `GEMINI_API_KEY`, `GEMINI_MODEL`, `EMBEDDING_MODEL`
- `INDEX_NAME`, `CHUNK_SIZE`, `CHUNK_OVERLAP`, `BATCH_SIZE`, `MIN_SOURCES`, `MIN_TOP_SCORE`, `NOTE_SCORE_THRESHOLD`, `NOTE_SCORE_MARGIN`
- `TAVILY_API_KEY`
- `RABBITMQ_HOST`, `RABBITMQ_PORT`, `RABBITMQ_USER`, `RABBITMQ_PASSWORD`, `RABBITMQ_VHOST`, `RABBITMQ_QUEUE_INGEST`, `RABBITMQ_EXCHANGE_DEAD_LETTER`, `RABBITMQ_QUEUE_INGEST_FAILED`, `WORKER_JOB_MAX_ATTEMPTS`, `WORKER_RABBITMQ_CONNECT_RETRIES`, `WORKER_RABBITMQ_RETRY_DELAY_SECONDS`
- `REDIS_HOST`, `REDIS_PORT`, `REDIS_DB`, `REDIS_JOB_KEY_PREFIX`, `API_BASE_URL`
- `LANGSMITH_TRACING`, `LANGSMITH_ENDPOINT`, `LANGSMITH_PROJECT`, `LANGSMITH_API_KEY`
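Settings like these are usually read once at startup into a typed object. A sketch of that pattern for the indexing-related variables, assuming a dataclass-based loader (the fallback values are placeholders, not the project's actual defaults):

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class RetrievalSettings:
    index_name: str
    chunk_size: int
    chunk_overlap: int
    batch_size: int

def load_retrieval_settings(env=None) -> RetrievalSettings:
    """Read indexing settings from the environment once, with type
    coercion up front so bad values fail at startup."""
    env = dict(os.environ) if env is None else env
    return RetrievalSettings(
        index_name=env.get("INDEX_NAME", "docs-index"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        batch_size=int(env.get("BATCH_SIZE", "100")),
    )
```

Coercing `CHUNK_SIZE` and friends to `int` at load time means a malformed `.env` value raises immediately instead of deep inside the ingestion pipeline.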
Run the fast local test path:

```bash
uv run python -m pytest --fast
```

Run the default test suite:

```bash
uv run python -m pytest
```

Run the default suite plus live integration tests against a running local stack:

```bash
uv run python -m pytest --live-integration
```
The integration tests expect:
- API reachable at http://localhost:8000
- RabbitMQ management API reachable at http://localhost:15672/api
- Valid RabbitMQ credentials in the environment
- Async ingestion keeps expensive crawl and indexing work off the request path
- Durable RabbitMQ messages and dead-letter routing reduce silent job loss
- Redis-backed job state gives clients a stable polling model
- Retrieval guardrails return a fallback when evidence quality is too weak
- Request IDs in the API logs help trace behavior across requests
- Provider abstraction makes it easy to switch between local and hosted models
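The guardrail-and-fallback behavior can be summarized as a single evidence check. This is a hedged sketch of the idea, assuming the check combines a source count with a top similarity score; the parameter names echo the `MIN_SOURCES` and `MIN_TOP_SCORE` settings, but the defaults and exact logic here are illustrative, not the project's implementation:

```python
def passes_guardrails(scores: list[float],
                      min_sources: int = 2,
                      min_top_score: float = 0.35) -> bool:
    """Return True only when retrieval produced enough sufficiently
    strong evidence; otherwise the caller should respond with a
    fallback answer instead of generating from weak context.
    """
    return len(scores) >= min_sources and max(scores, default=0.0) >= min_top_score
```

Gating generation this way trades a little recall for a big reduction in confidently wrong answers: when evidence is thin, the system says so rather than hallucinating citations.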
- There is no authentication or tenant isolation yet
- Redis job records do not currently have expiration or retention policies
- The system relies on external providers for crawl, embeddings, vector storage, and optionally generation
- There is not yet a deployment manifest for a cloud target such as ECS or Kubernetes
- The UI is intentionally minimal and optimized for demonstration rather than product UX depth
- Add auth and per-user document namespaces
- Add metrics, dashboards, and structured log shipping
- Add CI and deployment automation
- Add reranking and evaluation datasets for answer quality tracking
- Add document delete and reindex workflows through the API
This project is licensed under the terms of the LICENSE file.
