This tool empowers InfoSec professionals by processing a corpus of user-uploaded documents (e.g., PDFs, DOCX, Markdown, TXT) to deliver accurate, contextually relevant answers to security-related queries using a Retrieval-Augmented Generation (RAG) pipeline. Designed for on-premises deployment, it utilizes local resources like Weaviate for vector storage and integrates with OpenAI-compatible API providers (e.g., Ollama via OpenWebUI) for embeddings and language model inference, ensuring data privacy and control.
The project is actively developed with a functional API, ingestion, retrieval, and generation pipeline. This README guides you through setup, deployment, and usage based on the current codebase, aligning with the vision outlined in the Architecture Document.
- Document Ingestion: Upload and process documents into searchable chunks stored in Weaviate.
- Hybrid Retrieval: Retrieve relevant document snippets using Weaviate's hybrid search (semantic + keyword).
- Answer Generation: Generate concise, cited responses using a configurable LLM.
- API Access: Interact via FastAPI endpoints for uploading, listing, and querying documents.
- Streamlit UI: A user-friendly web interface for document upload, listing, and querying.
- Local Deployment: Runs entirely on-premises with Dockerized Weaviate and external LLM/embedding providers.
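The hybrid retrieval feature blends a semantic (vector) score with a keyword (BM25) score. Weaviate performs this fusion internally and its exact scoring differs, so the sketch below is only an illustration of how an alpha weight (the `RETRIEVAL_ALPHA` setting configured later) trades off the two signals:

```python
# Illustrative sketch of hybrid-score fusion -- NOT Weaviate's internal code.
# `alpha` weights the semantic side: 1.0 is purely semantic, 0.0 purely keyword.

def fuse_scores(vector_score: float, keyword_score: float, alpha: float = 0.5) -> float:
    """Blend normalized semantic and keyword scores for one chunk."""
    return alpha * vector_score + (1 - alpha) * keyword_score

# Toy candidates: strong semantic match vs. strong keyword match.
ranked = sorted(
    [
        {"chunk": "password policy ...", "vec": 0.9, "bm25": 0.4},
        {"chunk": "encryption at rest ...", "vec": 0.4, "bm25": 0.8},
    ],
    key=lambda c: fuse_scores(c["vec"], c["bm25"], alpha=0.5),
    reverse=True,
)
```

With `alpha=1.0` retrieval is purely semantic; with `alpha=0.0` it is purely keyword-based.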
Ensure the following are installed before proceeding:
- Python: Version 3.10 or higher.
- uv: A fast Python package installer and virtual environment manager. Installation Guide
- Docker & Docker Compose: For running Weaviate locally. Docker Installation
- Embedding and LLM Providers: An OpenAI-compatible API service (e.g., Ollama via OpenWebUI) running locally or on your network:
  - Embedding Model: `mxbai-embed-large` or equivalent (configurable).
  - LLM Model: a generation model such as `gemma3` or similar (configurable).
  - Example: Ollama at `http://localhost:3000` with OpenWebUI for API access.
Follow these steps to set up and deploy the tool locally:
```bash
git clone <repository-url>
cd security-reviewer-agent
```

Create and activate a virtual environment using uv:

```bash
uv venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows
```

This project uses `pyproject.toml` to define its core dependencies. To install them into your virtual environment and make the project editable (recommended for development):

```bash
uv pip install -e .
```

- This also installs Streamlit, which is used for the web UI.
- For production-like environments, or to ensure exact reproducibility from a lock file, you can use:

```bash
uv pip sync requirements.txt
```
Launch the Weaviate vector database using Docker Compose:
```bash
docker-compose up -d
```

- Weaviate will be accessible at `http://localhost:8080` (HTTP) and `50051` (gRPC).
- Verify it's running: `curl http://localhost:8080/v1/.well-known/ready`.
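If you script your deployment, you may want to wait for that readiness endpoint before starting the API server. Below is a small polling sketch; the injected `probe` callable is an assumption so any HTTP client can be plugged in:

```python
import time

def wait_for_ready(probe, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll `probe()` (e.g. a GET against /v1/.well-known/ready) until it returns True."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False

# Example probe with the standard library (assumes Weaviate on localhost:8080):
# import urllib.request
# ready = wait_for_ready(
#     lambda: urllib.request.urlopen(
#         "http://localhost:8080/v1/.well-known/ready"
#     ).status == 200
# )
```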
Create a `.env` file in the project root with the following:

```env
# Weaviate Configuration
WEAVIATE_URL=http://localhost:8080
WEAVIATE_GRPC_PORT=50051

# Embedding Provider Configuration
EMBEDDING_API_BASE_URL=http://localhost:3000
EMBEDDING_API_KEY=<your_embedding_api_key>
DEFAULT_EMBEDDING_MODEL=mxbai-embed-large

# LLM Provider Configuration
LLM_API_BASE_URL=http://localhost:3000
LLM_API_KEY=<your_llm_api_key>
DEFAULT_GENERATION_MODEL=gemma3

# Reranker Configuration
RERANKER_ENABLED=True
RERANKER_TYPE=cross_encoder
RERANKER_MODEL_NAME=cross-encoder/ms-marco-MiniLM-L-6-v2
RERANKER_INITIAL_TOP_K=10
RERANKED_TOP_K=5
RETRIEVAL_ALPHA=0.5
```

- Replace `<your_embedding_api_key>` and `<your_llm_api_key>` with API keys from your provider (e.g., OpenWebUI).
- Adjust URLs if your services run on different hosts/ports.
- These settings assume a local Ollama instance managed via OpenWebUI at `http://localhost:3000`.
- The reranker settings control the cross-encoder reranking process, which improves retrieval quality by reordering document chunks based on relevance to the query.
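As an illustration of what those reranker settings control, here is a minimal sketch of cross-encoder reranking with the scoring function injected. `rerank` is a hypothetical helper, not the project's actual `reranker.py` API:

```python
# Sketch of cross-encoder reranking. In the real pipeline the scorer would be
# a sentence-transformers CrossEncoder loaded from RERANKER_MODEL_NAME
# ("cross-encoder/ms-marco-MiniLM-L-6-v2"); here it is any callable that maps
# a list of (query, chunk) pairs to relevance scores.

def rerank(query, chunks, score_pairs, initial_top_k=10, reranked_top_k=5):
    """Score the first `initial_top_k` chunks against the query, keep the best."""
    candidates = chunks[:initial_top_k]                      # RERANKER_INITIAL_TOP_K
    scores = score_pairs([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:reranked_top_k]]           # RERANKED_TOP_K

# With sentence-transformers the scorer could be (not executed here):
# from sentence_transformers import CrossEncoder
# model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
# score_pairs = model.predict
```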
Start the FastAPI server:
```bash
uv run uvicorn src.main:app --reload
```

- The server runs at `http://localhost:8000`.
- Access interactive API docs at `http://localhost:8000/docs`.
If you want to use the web interface, ensure the API server (step 6) is running, then start the Streamlit app:
```bash
streamlit run streamlit_app.py
```

- The Streamlit app will typically be available at `http://localhost:8501`.
Interact with the tool via API endpoints or the Streamlit UI to upload documents, list them, and query the RAG pipeline.
After starting the Streamlit application (see "Getting Started" step 7), you can access the web UI in your browser (usually at http://localhost:8501). The UI provides an intuitive way to:
- Upload new documents.
- View a list of already ingested documents and their processing status.
- Ask questions and receive answers based on the content of the uploaded documents.
- Select specific documents to scope your queries.
The Streamlit UI communicates with the FastAPI backend to perform these operations.
Here are some example curl commands to interact with the API:
1. Upload a Text Document:
```bash
curl -X POST -F "file=@/path/to/your_notes.txt" http://localhost:8000/v1/upload
```

- Replace `/path/to/your_notes.txt` with the actual path to your `.txt` file.
- Ensure the file has the `.txt` extension.
2. Upload a Markdown Document (with spaces in filename):
```bash
curl -X POST -F "file=@/path/to/security policy notes.md" http://localhost:8000/v1/upload
```

- Replace `/path/to/security policy notes.md` with the actual path to your `.md` file.
- The double quotes around the `-F` argument value handle spaces or special characters in the file path.
- Important: Do not add extra single quotes inside the double quotes (e.g., `file=@'/path/...'`).
3. List All Ingested Documents:
```bash
curl http://localhost:8000/v1/documents
```

4. Query the Pipeline (Basic):
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"query": "What is the incident response plan?"}' \
  http://localhost:8000/v1/query
```

- Replace the query text with your question.
5. Query the Pipeline (Scoped to Specific Documents):
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"query": "Summarize the key findings for doc-abc123456789", "document_ids": ["doc-abc123456789"]}' \
  http://localhost:8000/v1/query
```

- Replace `doc-abc123456789` with the actual document ID(s) you want to query against.
6. Query the Pipeline (Specifying a Model):
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"query": "Compare policy A and policy B", "model_name": "another-model"}' \
  http://localhost:8000/v1/query
```

- Replace `another-model` with the specific generation model name you want to use (if different from the default configured in `.env`).
7. Query the Pipeline (Using a Custom System Prompt):
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{
        "query": "What are the key points about our data retention policy?",
        "prompt": "You are a helpful AI assistant. Based on the provided context: {context}, please answer the user'\''s query concisely."
      }' \
  http://localhost:8000/v1/query
```

- The `prompt` field allows you to override the default system prompt.
- It's recommended to include `{context}` in your custom prompt so the retrieved document chunks can be injected.
- Note the `'\''` sequence: a literal single quote cannot be escaped with a backslash inside a single-quoted shell string.
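For scripting, the same calls can be made from Python using only the standard library. `build_query_payload` and `query_pipeline` below are illustrative helpers mirroring the curl examples above, not part of the project's codebase:

```python
import json
import urllib.request

def build_query_payload(query, document_ids=None, model_name=None, prompt=None):
    """Assemble the JSON body accepted by POST /v1/query, omitting unused fields."""
    payload = {"query": query}
    if document_ids:
        payload["document_ids"] = document_ids   # scope retrieval to these docs
    if model_name:
        payload["model_name"] = model_name       # override the default LLM
    if prompt:
        payload["prompt"] = prompt               # custom system prompt
    return payload

def query_pipeline(payload, base_url="http://localhost:8000"):
    """POST the payload to the running API server and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/v1/query",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# result = query_pipeline(build_query_payload("What is the incident response plan?"))
# print(result["answer"])
```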
Upload a document for ingestion:
```bash
curl -X POST -F "file=@path/to/document.txt" http://localhost:8000/upload
```

- Supported Formats: `.md`, `.txt`.
- Response:

```json
{
  "document_id": "doc-abc123456789",
  "filename": "document.txt",
  "status": "processing",
  "message": "Document received and queued for ingestion."
}
```

- The document is processed in the background, chunked, embedded, and stored in Weaviate.
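A rough sketch of the background-ingestion lifecycle implied by this response. Names like `register_upload` and `ingest` are hypothetical; the real logic lives in the ingestion pipeline and FastAPI background tasks:

```python
# Illustrative sketch of upload -> background ingestion -> status update.
import uuid

registry: dict[str, dict] = {}  # document_id -> metadata, stands in for real storage

def register_upload(filename: str) -> dict:
    """Record an upload and return the immediate /upload-style response."""
    document_id = f"doc-{uuid.uuid4().hex[:12]}"
    registry[document_id] = {"filename": filename, "status": "processing"}
    return {"document_id": document_id, "filename": filename, "status": "processing"}

def ingest(document_id: str, process) -> None:
    """Background step: run chunk/embed/store via `process`, then mark the outcome."""
    try:
        process()
        registry[document_id]["status"] = "completed"
    except Exception:
        registry[document_id]["status"] = "failed"
```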
Retrieve metadata for all ingested documents:
```bash
curl http://localhost:8000/documents
```

- Response:

```json
{
  "documents": [
    {
      "document_id": "doc-abc123456789",
      "filename": "document.txt",
      "metadata": {"upload_time": "2023-11-01T12:00:00Z", "status": "completed"}
    }
  ],
  "total": 1
}
```
Submit a query to get an answer with sources:
```bash
curl -X POST -H "Content-Type: application/json" \
  -d '{"query": "What is the security policy?"}' \
  http://localhost:8000/query
```

- Response:

```json
{
  "answer": "The security policy mandates strong passwords and encryption...",
  "sources": [
    {
      "document_id": "doc-abc123456789",
      "snippet": "All systems must enforce strong passwords...",
      "metadata": {"file_name": "document.txt", "chunk_index": 2}
    }
  ]
}
```

- Optionally, scope the query with `document_ids`:

```json
{"query": "What is the security policy?", "document_ids": ["doc-abc123456789"]}
```

- Optionally, provide a custom system `prompt` to guide the LLM's response style and content. If using a custom `prompt`, it's highly recommended to include a `{context}` placeholder where the retrieved document chunks will be injected. For example:

```json
{
  "query": "What is the security policy?",
  "prompt": "You are an expert legal advisor. Based on the provided context: {context}, summarize the key legal implications."
}
```
For detailed endpoint specs, see API Documentation. For component-level usage (e.g., ingestion, retrieval), refer to the Implementation Guides.
This project uses uv for fast Python package management and adheres to the following dependency workflow:
- `pyproject.toml`: The authoritative source of truth for the project's direct dependencies and their version constraints. All new dependencies should be added here first.
- `requirements.txt`: Acts as a lock file. It is generated from `pyproject.toml` and contains the exact versions of all direct and transitive dependencies, ensuring reproducible builds across different environments.

- Adding a New Dependency:
  1. Manually add the new package and its version specifier (e.g., `"new_package~=1.2.0"`) to the `[project.dependencies]` section of `pyproject.toml`.
  2. Install the updated dependencies into your editable environment: `uv pip install -e .`
  3. After confirming the new dependency works as expected, update the lock file: `uv pip freeze > requirements.txt`
  4. Commit both `pyproject.toml` and the updated `requirements.txt` to version control.

- Setting Up in a New Environment:
  - For Development (Editable Install): Use `pyproject.toml` to set up an editable environment. This is generally preferred for development as it reflects changes in your local project code immediately.

    ```bash
    uv venv  # Or your preferred venv creation method
    source .venv/bin/activate
    uv pip install -e .
    ```

  - For Reproducible/Production-like Builds: Use the `requirements.txt` lock file to get the exact same versions of all dependencies.

    ```bash
    uv venv
    source .venv/bin/activate
    uv pip sync requirements.txt
    ```

- Updating Existing Dependencies:
  - To update a specific package, modify its version constraint in `pyproject.toml`, then reinstall and update the lock file:

    ```bash
    uv pip install -e .
    uv pip freeze > requirements.txt
    ```

  - To update all dependencies to the latest versions allowed by the `pyproject.toml` constraints:

    ```bash
    # Attempts to update packages while respecting pyproject.toml constraints;
    # resolve conflicts manually if they arise.
    uv pip install -e . --upgrade
    uv pip freeze > requirements.txt
    ```

By following this workflow, we maintain a clear definition of direct dependencies while ensuring consistent and reproducible environments through the lock file.
Run the test suite to verify functionality:
```bash
uv run pytest
```

Generate a coverage report:

1. Install test dependencies: `uv pip install -e '.[test]'`
2. Run tests with coverage: `uv run pytest --cov=src --cov-report=html`
3. Open `htmlcov/index.html` in a browser to view the report.
The current codebase includes:
- `src/`: Core application code.
  - `api/`: FastAPI routes (`routes.py`), schemas (`schemas.py`), and utilities.
  - `generation/`: LLM interaction (`ollama_llm.py`), prompt formatting (`prompt_formatter.py`).
  - `ingestion/`: Document processing (`ingestion_pipeline.py`, `loader.py`, `splitter.py`), embedding (`embedder.py`), and vector storage (`vector_store.py`).
  - `models/`: Data models (`document.py`).
  - `pipeline/`: RAG pipeline orchestration (`rag_pipeline.py`).
  - `retrieval/`: Hybrid search logic (`hybrid_search_retriever.py`) and reranker implementation (`reranker.py`, `reranker_interface.py`).
  - `utils/`: Shared utilities (e.g., `file_utils.py`, `log_config.py`).
  - `main.py`: FastAPI app entry point.
  - `config.py`: Configuration loader.
  - `dependencies.py`: Dependency injection for services and components.
- `streamlit_app.py`: The Streamlit web user interface application.
- `tests/`: Comprehensive unit and integration tests.
- `docs/`: Architecture and implementation guides.
- `pyproject.toml`: Defines project metadata and core dependencies (the source of truth).
- `requirements.txt`: A lock file generated from `pyproject.toml` for reproducible environments.
- `.env`: Environment variables.
- `docker-compose.yml`: Weaviate service definition.
- Ingestion: Documents are uploaded via `/upload`, chunked (512 tokens, 100 overlap), embedded using `mxbai-embed-large`, and stored in Weaviate.
- Retrieval: Queries hit `/query`, triggering hybrid search in Weaviate to fetch relevant chunks.
- Reranking: Retrieved chunks are scored and reordered by a cross-encoder model (when enabled) to improve relevance. By default, `cross-encoder/ms-marco-MiniLM-L-6-v2` reranks the top initial results.
- Generation: The most relevant chunks (after reranking, if enabled) are formatted into a prompt, and the LLM (e.g., `gemma3`) generates a response with citations.
- API: FastAPI serves all interactions, with background tasks for ingestion and error handling.
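The chunking step above can be sketched as a sliding window. The real splitter (`src/ingestion/splitter.py`) may tokenize differently, so whitespace tokens are used here purely for illustration:

```python
# Sliding-window chunking: 512-token windows stepping by size - overlap,
# so consecutive chunks share 100 tokens of context.

def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 100) -> list[list[str]]:
    """Split a token sequence into overlapping fixed-size chunks."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]

# Tiny demonstration with size=3, overlap=1:
chunks = chunk_tokens("the quick brown fox".split(), size=3, overlap=1)
# Each chunk would then be embedded (e.g. with mxbai-embed-large) and stored
# in Weaviate alongside its chunk_index.
```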
See the Architecture Document for a high-level design overview and future goals.
- Weaviate Not Starting: Check Docker logs (`docker-compose logs weaviate`) and ensure port `8080` is free.
- API Errors: Verify `.env` settings match your provider's URLs and keys. Check logs in `logs/app.log`.
- Model Issues: Ensure your LLM/embedding provider is running and supports the configured models.
- Reranker Errors: If you encounter issues with the reranker, set `RERANKER_ENABLED=False` to bypass it. First-time usage may be slow while the model downloads.
- Performance Concerns: The sentence-transformers library runs locally on the same machine as the application. First-time queries have higher latency while models load, but subsequent queries benefit from caching. Reranker models typically require 300 MB-1 GB of memory.
- Evaluation: Implement PromptFoo-based evaluation (Guide 9) in `tests/evaluations/`.
- Backlog: Implement the items from the Backlog Document.
- Security: Add API key authentication, JWT tokens, and rate limiting.
- Enhancements: Add graph-based retrieval or agent workflows as outlined in the architecture.
- Optimization: Tune retrieval (`alpha`, `top_k`) and generation parameters via `config.py`.