pranitl/rag-api

Security Reviewer Agent

This tool helps InfoSec professionals answer security-related questions over a corpus of user-uploaded documents (e.g., PDF, DOCX, Markdown, TXT) using a Retrieval-Augmented Generation (RAG) pipeline. Designed for on-premises deployment, it uses local resources such as Weaviate for vector storage and integrates with OpenAI-compatible API providers (e.g., Ollama via OpenWebUI) for embeddings and language model inference, keeping your data private and under your control.

The project is actively developed with a functional API, ingestion, retrieval, and generation pipeline. This README guides you through setup, deployment, and usage based on the current codebase, aligning with the vision outlined in the Architecture Document.

Features

  • Document Ingestion: Upload and process documents into searchable chunks stored in Weaviate.
  • Hybrid Retrieval: Retrieve relevant document snippets using Weaviate's hybrid search (semantic + keyword).
  • Answer Generation: Generate concise, cited responses using a configurable LLM.
  • API Access: Interact via FastAPI endpoints for uploading, listing, and querying documents.
  • Streamlit UI: A user-friendly web interface for document upload, listing, and querying.
  • Local Deployment: Runs entirely on-premises with Dockerized Weaviate and external LLM/embedding providers.

Prerequisites

Ensure the following are installed before proceeding:

  1. Python: Version 3.10 or higher.
  2. uv: A fast Python package installer and virtual environment manager. Installation Guide
  3. Docker & Docker Compose: For running Weaviate locally. Docker Installation
  4. Embedding and LLM Providers: An OpenAI-compatible API service (e.g., Ollama via OpenWebUI) running locally or on your network:
    • Embedding Model: Requires mxbai-embed-large or equivalent (configurable).
    • LLM Model: Requires a generation model like gemma3 or similar (configurable).
    • Example: Ollama at http://localhost:3000 with OpenWebUI for API access.

Getting Started

Follow these steps to set up and deploy the tool locally:

1. Clone the Repository

git clone <repository-url>
cd security-reviewer-agent

2. Set Up Python Environment

Create and activate a virtual environment using uv:

uv venv
source .venv/bin/activate  # macOS/Linux
# .venv\Scripts\activate   # Windows

3. Install Dependencies

This project uses pyproject.toml to define its core dependencies. To install these dependencies into your virtual environment and make your project editable (recommended for development):

uv pip install -e .
  • This will also install Streamlit, which is used for the web UI.
  • For production-like environments or to ensure exact reproducibility from a lock file, you can use:
    uv pip sync requirements.txt

4. Start Weaviate

Launch the Weaviate vector database using Docker Compose:

docker-compose up -d
  • Weaviate will be accessible at http://localhost:8080 (HTTP) and 50051 (gRPC).
  • Verify it's running: curl http://localhost:8080/v1/.well-known/ready.

5. Configure Environment Variables

Create a .env file in the project root with the following:

# Weaviate Configuration
WEAVIATE_URL=http://localhost:8080
WEAVIATE_GRPC_PORT=50051

# Embedding Provider Configuration
EMBEDDING_API_BASE_URL=http://localhost:3000
EMBEDDING_API_KEY=<your_embedding_api_key>
DEFAULT_EMBEDDING_MODEL=mxbai-embed-large

# LLM Provider Configuration
LLM_API_BASE_URL=http://localhost:3000
LLM_API_KEY=<your_llm_api_key>
DEFAULT_GENERATION_MODEL=gemma3

# Reranker Configuration
RERANKER_ENABLED=True
RERANKER_TYPE=cross_encoder
RERANKER_MODEL_NAME=cross-encoder/ms-marco-MiniLM-L-6-v2
RERANKER_INITIAL_TOP_K=10
RERANKED_TOP_K=5
RETRIEVAL_ALPHA=0.5
  • Replace <your_embedding_api_key> and <your_llm_api_key> with API keys from your provider (e.g., OpenWebUI).
  • Adjust URLs if your services run on different hosts/ports.
  • These settings assume a local Ollama instance managed via OpenWebUI at http://localhost:3000.
  • The reranker configuration controls the cross-encoder reranking process, which improves retrieval quality by reordering document chunks based on relevance to the query.
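
The reranking settings above work together: the retriever fetches RERANKER_INITIAL_TOP_K candidates, the cross-encoder scores each (query, chunk) pair, and only the RERANKED_TOP_K best survive. The following is an illustrative sketch of that flow with a stand-in lexical scorer, not the project's actual reranker.py (which uses a learned cross-encoder model):

```python
from typing import Callable, List, Tuple

def rerank(
    query: str,
    chunks: List[str],
    score_fn: Callable[[str, str], float],  # stand-in for a cross-encoder's scoring call
    reranked_top_k: int = 5,                # corresponds to RERANKED_TOP_K
) -> List[Tuple[str, float]]:
    """Score each (query, chunk) pair and keep the best reranked_top_k chunks."""
    scored = [(chunk, score_fn(query, chunk)) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:reranked_top_k]

# Toy scorer: counts shared words. A real cross-encoder is a learned model
# that reads the query and chunk jointly and outputs a relevance score.
def overlap_score(query: str, chunk: str) -> float:
    return len(set(query.lower().split()) & set(chunk.lower().split()))

top = rerank(
    "password policy requirements",
    [
        "All systems must enforce strong password requirements.",
        "The cafeteria opens at 9am.",
        "Password rotation policy applies to admin accounts.",
    ],
    overlap_score,
    reranked_top_k=2,
)
```

The irrelevant chunk is dropped and the remaining two are reordered by score, which is exactly the effect the cross-encoder has on the initial hybrid-search results.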

6. Run the API Server

Start the FastAPI server:

uv run uvicorn src.main:app --reload
  • The server runs at http://localhost:8000.
  • Access interactive API docs at http://localhost:8000/docs.

7. Run the Streamlit UI (Optional)

If you want to use the web interface, ensure the API server (step 6) is running, then start the Streamlit app:

streamlit run streamlit_app.py
  • The Streamlit app will typically be available at http://localhost:8501.

Usage

Interact with the tool via API endpoints or the Streamlit UI to upload documents, list them, and query the RAG pipeline.

Streamlit Web UI

After starting the Streamlit application (see "Getting Started" step 7), you can access the web UI in your browser (usually at http://localhost:8501). The UI provides an intuitive way to:

  • Upload new documents.
  • View a list of already ingested documents and their processing status.
  • Ask questions and receive answers based on the content of the uploaded documents.
  • Select specific documents to scope your queries.

The Streamlit UI communicates with the FastAPI backend to perform these operations.
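
Those UI-to-backend calls are ordinary HTTP requests. A minimal stdlib sketch of how any client (the Streamlit app, or your own script) might post a query to the backend; the endpoint and payload shape mirror the curl examples below, and error handling is elided:

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # FastAPI backend from step 6

def build_query_request(query: str, document_ids=None) -> urllib.request.Request:
    """Build a POST request for the /v1/query endpoint."""
    payload = {"query": query}
    if document_ids:
        payload["document_ids"] = document_ids
    return urllib.request.Request(
        f"{API_BASE}/v1/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What is the incident response plan?")
# To actually send it (requires the API server to be running):
# with urllib.request.urlopen(req) as resp:
#     answer = json.load(resp)
```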

Example curl Requests

Here are some example curl commands to interact with the API:

1. Upload a Text Document:

curl -X POST -F "file=@/path/to/your_notes.txt" http://localhost:8000/v1/upload
  • Replace /path/to/your_notes.txt with the actual path to your .txt file.
  • Ensure the file has the .txt extension.

2. Upload a Markdown Document (with spaces in filename):

curl -X POST -F "file=@/path/to/security policy notes.md" http://localhost:8000/v1/upload
  • Replace /path/to/security policy notes.md with the actual path to your .md file.
  • The double quotes around the -F argument value handle spaces or special characters in the file path.
  • Important: Do not add extra single quotes inside the double quotes (e.g., file=@'/path/...').

3. List All Ingested Documents:

curl http://localhost:8000/v1/documents

4. Query the Pipeline (Basic):

curl -X POST -H "Content-Type: application/json" \
     -d '{"query": "What is the incident response plan?"}' \
     http://localhost:8000/v1/query
  • Replace the query text with your question.

5. Query the Pipeline (Scoped to Specific Documents):

curl -X POST -H "Content-Type: application/json" \
     -d '{"query": "Summarize the key findings for doc-abc123456789", "document_ids": ["doc-abc123456789"]}' \
     http://localhost:8000/v1/query
  • Replace doc-abc123456789 with the actual document ID(s) you want to query against.

6. Query the Pipeline (Specifying a Model):

curl -X POST -H "Content-Type: application/json" \
     -d '{"query": "Compare policy A and policy B", "model_name": "another-model"}' \
     http://localhost:8000/v1/query
  • Replace another-model with the specific generation model name you want to use (if different from the default configured in .env).

7. Query the Pipeline (Using a Custom System Prompt):

curl -X POST -H "Content-Type: application/json" \
     -d '{
       "query": "What are the key points about our data retention policy?",
       "prompt": "You are a helpful AI assistant. Based on the provided context: {context}, please answer the user'\''s query concisely."
     }' \
     http://localhost:8000/v1/query
  • The prompt field allows you to override the default system prompt.
  • The '\'' sequence closes the single-quoted shell string, emits a literal apostrophe, and reopens it; a plain \' would break both the shell quoting and the JSON.
  • It's recommended to include {context} in your custom prompt so the retrieved document chunks can be injected.
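
The {context} placeholder works by substitution: before generation, the retrieved chunks are joined and injected into the prompt template. A minimal sketch of the idea (illustrative only, not the project's prompt_formatter.py):

```python
def format_prompt(template: str, chunks: list[str]) -> str:
    """Inject retrieved chunks into the {context} placeholder of a prompt template."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    if "{context}" not in template:
        # Fall back to appending context so the LLM still sees the chunks.
        return f"{template}\n\nContext:\n{context}"
    return template.replace("{context}", context)

prompt = format_prompt(
    "You are a helpful AI assistant. Based on the provided context: {context}, "
    "answer the user's query concisely.",
    ["All systems must enforce strong passwords...", "Data is retained for 90 days."],
)
```

This is why omitting {context} weakens a custom prompt: without the placeholder the retrieved evidence has no designated slot in the instructions.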

Upload a Document

Upload a document for ingestion:

curl -X POST -F "file=@path/to/document.txt" http://localhost:8000/v1/upload
  • Supported Formats: .md, .txt.
  • Response:
    {
      "document_id": "doc-abc123456789",
      "filename": "document.txt",
      "status": "processing",
      "message": "Document received and queued for ingestion."
    }
  • The document is processed in the background, chunked, embedded, and stored in Weaviate.
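
Because ingestion runs in the background, a client can poll the documents endpoint until the status flips from "processing" to "completed". The status lookup itself is a pure helper; a hypothetical sketch using the response shape documented below:

```python
def document_status(documents_payload: dict, document_id: str):
    """Find a document's processing status in a list-documents response, or None."""
    for doc in documents_payload.get("documents", []):
        if doc.get("document_id") == document_id:
            return doc.get("metadata", {}).get("status")
    return None

# Sample payload matching the documented /documents response.
sample = {
    "documents": [
        {
            "document_id": "doc-abc123456789",
            "filename": "document.txt",
            "metadata": {"upload_time": "2023-11-01T12:00:00Z", "status": "completed"},
        }
    ],
    "total": 1,
}
status = document_status(sample, "doc-abc123456789")
```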

List Documents

Retrieve metadata for all ingested documents:

curl http://localhost:8000/v1/documents
  • Response:
    {
      "documents": [
        {
          "document_id": "doc-abc123456789",
          "filename": "document.txt",
          "metadata": {"upload_time": "2023-11-01T12:00:00Z", "status": "completed"}
        }
      ],
      "total": 1
    }

Query the Pipeline

Submit a query to get an answer with sources:

curl -X POST -H "Content-Type: application/json" \
     -d '{"query": "What is the security policy?"}' \
     http://localhost:8000/v1/query
  • Response:
    {
      "answer": "The security policy mandates strong passwords and encryption...",
      "sources": [
        {
          "document_id": "doc-abc123456789",
          "snippet": "All systems must enforce strong passwords...",
          "metadata": {"file_name": "document.txt", "chunk_index": 2}
        }
      ]
    }
  • Optionally, scope the query with document_ids:
    {
      "query": "What is the security policy?",
      "document_ids": ["doc-abc123456789"]
    }
  • Optionally, provide a custom system prompt to guide the LLM's response style and content. If using a custom prompt, it's highly recommended to include a {context} placeholder where the retrieved document chunks will be injected. For example:
    {
      "query": "What is the security policy?",
      "prompt": "You are an expert legal advisor. Based on the provided context: {context}, summarize the key legal implications."
    }
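
The response shape above lends itself to a readable, cited rendering on the client side. A small sketch that consumes the documented answer/sources structure:

```python
def render_answer(response: dict) -> str:
    """Format a query response as the answer followed by numbered sources."""
    lines = [response["answer"], "", "Sources:"]
    for i, src in enumerate(response.get("sources", []), start=1):
        name = src.get("metadata", {}).get("file_name", src["document_id"])
        lines.append(f"  [{i}] {name}: {src['snippet']}")
    return "\n".join(lines)

rendered = render_answer({
    "answer": "The security policy mandates strong passwords and encryption...",
    "sources": [
        {
            "document_id": "doc-abc123456789",
            "snippet": "All systems must enforce strong passwords...",
            "metadata": {"file_name": "document.txt", "chunk_index": 2},
        }
    ],
})
```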

For detailed endpoint specs, see API Documentation. For component-level usage (e.g., ingestion, retrieval), refer to the Implementation Guides.

Dependency Management

This project uses uv for fast Python package management and adheres to the following dependency workflow:

  • pyproject.toml: This file is the authoritative source of truth for the project's direct dependencies and their version constraints. All new dependencies should be added here first.
  • requirements.txt: This file acts as a lock file. It is generated from pyproject.toml and contains the exact versions of all direct and transitive dependencies. This ensures reproducible builds across different environments.

Workflow for Managing Dependencies

  1. Adding a New Dependency:

    • Manually add the new package with its version specifier (e.g., "new_package~=1.2.0") to the [project.dependencies] section of pyproject.toml.
    • Install the updated dependencies into your editable environment:
      uv pip install -e .
    • After confirming the new dependency works as expected, update the lock file:
      uv pip freeze > requirements.txt
    • Commit both pyproject.toml and the updated requirements.txt to version control.
  2. Setting Up in a New Environment:

    • For Development (Editable Install): Use pyproject.toml to set up an editable environment. This is generally preferred for development as it reflects changes in your local project code immediately.
uv venv  # Or your preferred venv creation method
      source .venv/bin/activate
      uv pip install -e .
    • For Reproducible/Production-like Builds: Use the requirements.txt lock file to ensure you get the exact same versions of all dependencies.
uv venv
      source .venv/bin/activate
      uv pip sync requirements.txt
  3. Updating Existing Dependencies:

    • To update a specific package, modify its version constraint in pyproject.toml.
    • Then, reinstall and update the lock file:
      uv pip install -e .
      uv pip freeze > requirements.txt
    • To update all dependencies to their latest allowed versions based on pyproject.toml constraints:
      # This command attempts to update packages while respecting pyproject.toml constraints
      # Note: `uv pip install -e . --upgrade` might be needed depending on specific scenarios
      # or manually resolve conflicts if they arise.
      uv pip install -e . --upgrade # Or carefully manage updates individually
      uv pip freeze > requirements.txt

By following this workflow, we maintain a clear definition of direct dependencies while ensuring consistent and reproducible environments through the lock file.

Testing

Run the test suite to verify functionality:

uv run pytest

Test Coverage

Generate a coverage report:

  1. Install test dependencies:
    uv pip install -e '.[test]'
  2. Run tests with coverage:
    uv run pytest --cov=src --cov-report=html
  3. Open htmlcov/index.html in a browser to view the report.

Project Structure

The current codebase includes:

  • src/: Core application code.
    • api/: FastAPI routes (routes.py), schemas (schemas.py), and utilities.
    • generation/: LLM interaction (ollama_llm.py), prompt formatting (prompt_formatter.py).
    • ingestion/: Document processing (ingestion_pipeline.py, loader.py, splitter.py), embedding (embedder.py), and vector storage (vector_store.py).
    • models/: Data models (document.py).
    • pipeline/: RAG pipeline orchestration (rag_pipeline.py).
    • retrieval/: Hybrid search logic (hybrid_search_retriever.py) and reranker implementation (reranker.py, reranker_interface.py).
    • utils/: Shared utilities (e.g., file_utils.py, log_config.py).
    • main.py: FastAPI app entry point.
    • config.py: Configuration loader.
    • dependencies.py: Dependency injection for services and components.
  • streamlit_app.py: The Streamlit web user interface application.
  • tests/: Comprehensive unit and integration tests.
  • docs/: Architecture and implementation guides.
  • pyproject.toml: Defines project metadata and core dependencies (the source of truth).
  • requirements.txt: A lock file generated from pyproject.toml for reproducible environments.
  • .env: Environment variables.
  • docker-compose.yml: Weaviate service definition.

How It Works

  1. Ingestion: Documents are uploaded via /v1/upload, chunked (512 tokens, 100-token overlap), embedded using the configured embedding model (default mxbai-embed-large), and stored in Weaviate.
  2. Retrieval: Queries hit /v1/query, triggering hybrid search in Weaviate to fetch relevant chunks.
  3. Reranking: Retrieved chunks are scored and reordered using a cross-encoder model (when enabled) to improve relevance. By default, uses cross-encoder/ms-marco-MiniLM-L-6-v2 to rerank the top initial results.
  4. Generation: The most relevant chunks (after reranking if enabled) are formatted into a prompt, and the LLM (e.g., gemma3) generates a response with citations.
  5. API: FastAPI serves all interactions, with background tasks for ingestion and error handling.

See the Architecture Document for a high-level design overview and future goals.
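
The chunking step (512-token windows with 100-token overlap) is a sliding window over the token sequence. For illustration, "tokens" here are just list items; the real splitter.py may tokenize differently:

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 100) -> list:
    """Split a token sequence into windows of `size`, each sharing `overlap` tokens
    with its predecessor so context isn't lost at chunk boundaries."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap  # advance 412 tokens per window with the defaults
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already reached the end of the document
    return chunks

tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
```

With 1000 tokens and the default settings this yields three chunks, and the last 100 tokens of each chunk reappear at the start of the next, which is what lets retrieval surface passages that straddle a boundary.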

Troubleshooting

  • Weaviate Not Starting: Check Docker logs (docker-compose logs weaviate) and ensure port 8080 is free.
  • API Errors: Verify .env settings match your provider's URLs and keys. Check logs in logs/app.log.
  • Model Issues: Ensure your LLM/embedding provider is running and supports the configured models.
  • Reranker Errors: If you encounter issues with the reranker, try setting RERANKER_ENABLED=False to bypass it. First-time usage might be slow as the model needs to be downloaded.
  • Performance Concerns: The sentence-transformers library runs locally on the same machine as the application. First-time queries will have higher latency as models are loaded, but subsequent queries benefit from caching. Reranker models typically require 300MB-1GB of memory.

Next Steps

  • Evaluation: Implement PromptFoo-based evaluation (Guide 9) in tests/evaluations/.
  • Backlog: Implement the backlog items from the Backlog Document.
  • Security: Add API key authentication, JWT tokens, and rate limiting.
  • Enhancements: Add graph-based retrieval or agent workflows as outlined in the architecture.
  • Optimization: Tune retrieval (alpha, top_k) and generation parameters via config.py.
