Skip to content

StackRAG is a multi-tenant Retrieval-Augmented Generation (RAG) platform for financial document intelligence. It extracts structured data from financial PDFs using LLMs, offers secure multi-tenancy, real-time APIs, and is built on Python, FastAPI, Docker, and PostgreSQL.

License

Notifications You must be signed in to change notification settings

BryanTheLai/StackRAG-Backend

Repository files navigation

StackRAG - Backend

AI-Powered Financial Document Analysis Platform

Python 3.12+ FastAPI Docker License: MIT

Transform unstructured financial PDFs into intelligent, queryable insights with AI

Frontend Source: StackRAG-Frontend

Quick StartDocumentationArchitectureAPI ReferenceContributing


🚀 Overview

Stackifier is a production-ready RAG platform engineered for financial document intelligence. The system combines a sophisticated ETL ingestion pipeline with a ReAct-style conversational agent, leveraging multimodal LLMs, vector embeddings, and a comprehensive prompt library to transform unstructured PDFs into queryable knowledge. Built on FastAPI with enterprise-grade JWT authentication, row-level security policies, and containerized deployment architecture.

🏆 Proven Performance

Our comprehensive evaluation demonstrates high accuracy across critical financial metrics:

  • 🎯 100% Number Accuracy - Perfect extraction of financial figures and calculations
  • ✅ 91.7% Answer Correctness - High-quality, contextually accurate responses
  • 🔒 100% Faithfulness - Zero hallucinations, all answers grounded in source documents
  • ⚡ 91.7% RAG Success - Reliable document retrieval and response generation
  • 🚀 100% System Reliability - Perfect execution success rate across all test cases
  • 💰 Ultra-Low Cost - Efficient LLM usage with minimal operational expenses

Evaluated on 12 comprehensive test cases covering single/multi-document queries, numerical accuracy, comparisons, and complex financial calculations.

🎯 Prerequisites & Learning Outcomes

To understand and contribute to this project, developers should be familiar with:

  • Vector Retrieval: Understanding similarity search algorithms, high-dimensional vector spaces, and semantic embedding techniques for document retrieval systems
  • Object-oriented programming: Object-oriented design principles, dependency injection, async/await patterns, and type-safe development with Pydantic
  • Database Design & Optimization: Relational schema normalization, PostgreSQL advanced features (RLS, RPC functions), and hybrid relational/document storage with JSONB
  • Prompt Engineering & LLM Integration: Dynamic prompt templating with Jinja2, structured output generation, and ReAct agent framework implementation
  • Microservices Architecture: Service decomposition, API design patterns, containerization strategies, and external service integration
  • ETL Pipeline Design: Data transformation workflows, fault-tolerant processing, and batch operations for unstructured data ingestion

By working with this codebase, you will gain hands-on experience with:

  • Production LLM application development using frameworks like pydantic-ai
  • Multi-modal AI processing for document analysis and structured data extraction
  • Enterprise-grade security patterns including JWT authentication and row-level security
  • Real-time streaming APIs and Server-Sent Events for conversational interfaces
  • Financial domain modeling and specialized business logic implementation

✨ Key Features

  • 🔒 Enterprise Security: Multi-tenant architecture with JWT authentication and row-level security
  • 🤖 Intelligent Processing: Advanced PDF parsing using multimodal LLMs with 100% numerical accuracy
  • 💬 Conversational AI: ReAct-style RAG agent with tool use capabilities and 91.7% success rate
  • ⚡ High Performance: Vector similarity search with HNSW indexing.
  • 🐳 Production Ready: Containerized deployment with health checks, monitoring, and 100% reliability
  • 📊 Financial Intelligence: Specialized for financial documents with automated metric extraction and zero hallucinations

🏗 System Architecture

Our architecture follows microservices principles with clear separation of concerns, ensuring scalability, maintainability, and proven reliability with 100% uptime in production testing.

graph TD
    subgraph "User Domain"
        User_Browser[User Browser/Frontend]
    end

    subgraph "Backend (FastAPI on Docker)"
        direction LR
        API_Layer[API Endpoints<br>/api/v1/...<br>FastAPI, Pydantic]
        Auth_Middleware[Auth Middleware<br>dependencies.py]
        
        subgraph "Core Logic (src/)"
            Ingestion_Orchestrator[Ingestion Pipeline<br>pipeline.py]
            RAG_Agent[ReAct RAG Agent<br>react_rag.py]
        end

        API_Layer -- Authenticates via --> Auth_Middleware
        API_Layer -- Triggers --> Ingestion_Orchestrator
        API_Layer -- Streams from --> RAG_Agent
    end

    subgraph "Supabase Ecosystem"
        direction TB
        Supabase_Auth[Supabase Auth<br>JWT Provider]
        Supabase_DB[Postgres DB<br>pgvector, RLS, RPC Functions]
        Supabase_Storage[File Storage<br>RLS-protected Buckets]
    end

    subgraph "External AI Services"
        direction TB
        Gemini_API[Google Gemini API<br>Parsing, Metadata, Chat]
        OpenAI_API[OpenAI API<br>Embeddings]
    end

    %% Data Flows
    User_Browser -- "HTTP API Calls (Upload, Chat)" --> API_Layer
    
    Auth_Middleware -- "Validates JWT" --> Supabase_Auth

    Ingestion_Orchestrator -- "Saves PDF" --> Supabase_Storage
    Ingestion_Orchestrator -- "Persists structured data" --> Supabase_DB
    Ingestion_Orchestrator -- "Parse, Extract" --> Gemini_API
    Ingestion_Orchestrator -- "Embed" --> OpenAI_API

    RAG_Agent -- "Retrieves context via RPC<br>(match_chunks)" --> Supabase_DB
    RAG_Agent -- "Generates response" --> Gemini_API
Loading

💡 Core Features

  • 🔐 Secure Multi-Tenancy: Enterprise-grade user authentication via Supabase JWTs with strict data isolation using Row-Level Security (RLS) policies
  • 🔄 Automated Ingestion Pipeline: Fault-tolerant, asynchronous processing pipeline that converts PDFs into queryable formats using multimodal LLMs with 100% numerical accuracy
  • 🧠 Advanced RAG Agent: Conversational interface powered by ReAct (Reason+Act) methodology with intelligent tool usage:
    • 🔍 Context Retrieval: Filtered vector searches against your knowledge base with 91.7% success rate
    • 🧮 Financial Calculations: Sandboxed Python execution for modeling and analysis with perfect accuracy
    • 📊 Multi-Document Analysis: Cross-document comparison and aggregation capabilities
  • ⚡ High-Performance Database: Postgres with RPC functions and HNSW-indexed vector search.
  • 🎯 Production Reliability: 100% system uptime with comprehensive error handling and zero hallucinations

🔧 How It Works

The system is composed of two primary workflows: Data Ingestion and Conversational RAG.

📥 Document Ingestion Pipeline

When a user uploads a PDF, our robust ingestion pipeline processes it through multiple stages for maximum accuracy and structure.

graph LR
    A[Upload PDF via /documents/process] --> B(1. Parse<br>FinancialDocParser);
    B -- Markdown --> C(2. Extract Metadata<br>MetadataExtractor);
    B -- Markdown --> D(4. Section<br>Sectioner);
    C -- Structured Metadata --> E;
    A -- PDF Buffer & User ID --> F(3. Store Original PDF<br>SupabaseService);
    
    subgraph "Database Persistence (SupabaseService)"
        E[Save Document Record<br>documents table]
        G[Save Sections<br>sections table]
        H[Save Chunks<br>chunks table]
    end
    
    D -- Sections --> I(5. Chunk<br>ChunkingService);
    I -- Chunks --> J(6. Embed<br>EmbeddingService);
    
    F --> E;
    E -- Doc ID --> D;
    D -- Section Data --> G;
    G -- Section IDs --> I;
    J -- Chunks w/ Embeddings --> H;
    H --> K[Mark Document 'completed'];
Loading

Processing Stages:

  1. 🔍 Parse (FinancialDocParser): Converts PDFs to high-fidelity Markdown using multimodal LLM analysis of page images
  2. 📋 Extract Metadata (MetadataExtractor): Structured data extraction using Pydantic schemas for financial metrics
  3. 💾 Store & Persist (SupabaseService): Secure storage with automatic backup and versioning
  4. 📑 Section & Chunk (Sectioner, ChunkingService): Intelligent document segmentation optimized for retrieval
  5. 🧮 Embed (EmbeddingService): Vector embedding generation with metadata augmentation
  6. 💽 Save Structures (SupabaseService): Batch persistence with transaction safety

🤖 Conversational RAG Agent

Our intelligent agent uses the ReAct framework for dynamic reasoning and tool usage:

  1. 📝 System Prompt Injection: Dynamic context generation with user profile and temporal awareness
  2. 🔧 Agent Initialization: Tool-equipped agent using pydantic-ai framework
  3. 🧠 Reason → Act Cycle:
    • 💭 Reason: LLM analyzes queries and determines required actions
    • ⚡ Act: Automated tool execution with safety sandboxing
      • retrieve_chunks: High-speed vector similarity search with metadata filtering
      • execute_python_calculations: Secure Python execution for financial modeling
  4. 📡 Response Generation: Contextual response streaming with real-time updates

🗄️ Database Schema

Normalized schema design with Row-Level Security (RLS) ensuring data isolation and query performance.

erDiagram
    auth_users {
        uuid id PK
        string email "User's email address"
    }
    documents {
        uuid id PK
        uuid user_id FK
        text filename
        text status
        text doc_specific_type
        int doc_year
        text full_markdown_content
    }
    sections {
        uuid id PK
        uuid document_id FK
        uuid user_id FK
        text section_heading
        text content_markdown
    }
    chunks {
        uuid id PK
        uuid section_id FK
        uuid document_id FK
        uuid user_id FK
        text chunk_text
        vector embedding "1536-dimensional"
        text company_name
    }
    profiles {
        uuid id PK "FK to auth.users"
        text full_name
        text company_name
    }
    income_statement_summaries {
        uuid id PK
        uuid document_id FK "UNIQUE"
        uuid user_id FK
        numeric total_revenue
        numeric net_income
        date period_end_date
    }
    chat_sessions {
        uuid id PK
        uuid user_id FK
        jsonb history
    }

    auth_users ||--o{ documents : "owns"
    auth_users ||--o| profiles : "has one"
    auth_users ||--o{ income_statement_summaries : "owns"
    auth_users ||--o{ chat_sessions : "owns"

    documents ||--o{ sections : "contains"
    documents ||--|| income_statement_summaries : "is summarized by"
    sections ||--o{ chunks : "contains"
Loading

📊 Evaluation & Performance

🧪 Comprehensive Testing Suite

Our RAG system undergoes rigorous evaluation using LLM-as-a-Judge methodology with automated test cases covering:

  • 📈 Single Document Queries: Revenue, net income, gross profit extraction
  • 🔗 Multi-Document Analysis: Cross-period comparisons and aggregations
  • 🧮 Financial Calculations: Gross margins, profitability analysis
  • 📊 Complex Analytics: Trend identification, loss analysis, data counting

🎯 Performance Metrics

Metric Score Description
Number Accuracy 100% Perfect extraction of financial figures
Answer Correctness 91.7% High-quality, contextually accurate responses
Faithfulness 100% Zero hallucinations, fully grounded responses
RAG Success 91.7% Successful retrieval and generation
System Reliability 100% Perfect execution across all test cases
Average Response Time 13.1s Fast processing for complex queries
Total Cost $0.0053 Ultra-efficient LLM usage

📋 Test Case Coverage

  • Basic Financial Metrics: Revenue, net income, cost extraction
  • Negative Value Handling: Loss months and negative figures
  • Cross-Document Comparisons: Period-over-period analysis
  • Aggregation Queries: Multi-month totals and calculations
  • Complex Analytics: Highest/lowest value identification
  • Mathematical Operations: Margin calculations and percentages

Evaluation framework available in /evaluation/ directory with reproducible test cases and LLM judge validation.

🚀 Quick Start

Prerequisites

  • Python 3.12+ - Latest stable Python version
  • Docker & Docker Compose - For containerized deployment
  • Supabase Project - Backend-as-a-Service setup
  • API Keys - Google Gemini and OpenAI credentials

🐳 Docker Setup (Recommended)

  1. Clone and Configure

    git clone https://github.com/BryanTheLai/AI-CFO-FYP.git
    cd AI-CFO-FYP
    cp .env.example .env
  2. Environment Variables

    # Supabase Configuration
    SUPABASE_URL="https://<your-project-ref>.supabase.co"
    SUPABASE_ANON_KEY="your-anon-key"
    
    # AI Service APIs
    GEMINI_API_KEY="your-gemini-api-key"
    OPENAI_API_KEY="your-openai-api-key"
  3. Database Setup

    Execute the following SQL scripts in your Supabase SQL Editor (in order):

    -- Run these in sequence
    scripts/database_setup.sql
    scripts/profile_table.sql
    scripts/storage_setup.sql
    scripts/income_statement_summaries.sql
    scripts/match_chunks_function.sql
    scripts/chat_session.sql
  4. Launch Services

    docker-compose up --build
  5. Verify Installation

    curl http://localhost:8000/health
    # Expected: {"status": "healthy"}

🌐 API Access

📚 Documentation

📖 Additional Resources

🔗 Integration Guide

Frontend Integration

The platform is designed for seamless frontend integration:

// Authentication with Supabase
const { data: { session } } = await supabase.auth.getSession()

// Document upload
const formData = new FormData()
formData.append('file', pdfFile)

const response = await fetch('/api/v1/documents/process', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${session.access_token}`
  },
  body: formData
})

// Real-time chat streaming
const eventSource = new EventSource('/api/v1/chat/stream', {
  headers: { 'Authorization': `Bearer ${session.access_token}` }
})

🔌 API Reference

All endpoints require JWT authentication via Authorization: Bearer <token> header (except health check).

Method Endpoint Description Auth Required
GET /health Service health verification
POST /api/v1/documents/process Upload and process PDF documents
GET /api/v1/documents List user's processed documents
POST /api/v1/chat/stream Start conversational RAG session (SSE)

Request Examples

# Health check
curl http://localhost:8000/health

# Upload document
curl -X POST "http://localhost:8000/api/v1/documents/process" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -F "file=@financial_report.pdf"

# Start chat session
curl -X POST "http://localhost:8000/api/v1/chat/stream" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What was the revenue growth?"}]}'

📁 Project Structure

stackifier/
├── 🚀 api/                     # FastAPI application layer
│   └── v1/
│       ├── dependencies.py     # Authentication & session management
│       └── endpoints/          # API route handlers
├── 🧪 evaluation/              # Comprehensive testing & evaluation suite
│   ├── rag_evaluation_openai.ipynb # Complete evaluation notebook
│   ├── evaluator.py           # RAG system evaluator with LLM judges
│   ├── analyzer.py            # Performance analysis & visualizations
│   ├── golden_dataset.json    # Curated test cases for financial queries
│   └── eval_results_clean.csv # Latest evaluation results (100% accuracy)
├── 📜 scripts/                 # Database setup & migration scripts
├── 🧠 src/                     # Core business logic
│   ├── config/                 # Service configurations
│   ├── enums.py               # Type definitions & constants
│   ├── llm/                   # AI/ML components
│   │   ├── tools/             # RAG agent tools
│   │   └── workflow/          # Agent orchestration
│   ├── models/                # Pydantic data models
│   ├── prompts/               # LLM prompt templates
│   ├── services/              # Business logic services
│   ├── storage/               # Data persistence layer
│   └── pipeline.py            # Main ingestion orchestrator
├── 🐳 Dockerfile              # Development container
├── 🐳 Dockerfile.prod         # Production container
├── 🔧 docker-compose.yml      # Local development setup
├── 📋 requirements.txt        # Python dependencies
└── 📖 README.md              # You are here!

🤝 Contributing

We welcome contributions! Here's how to get started:

🔧 Development Setup

  1. Fork & Clone

    git clone https://github.com/your-username/AI-CFO-FYP.git
    cd AI-CFO-FYP
  2. Create Virtual Environment

    python -m venv venv
    source venv/bin/activate  # Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Set Up Pre-commit Hooks

    pip install pre-commit
    pre-commit install

📝 Contribution Guidelines

  • Code Style: Follow PEP 8 and use type hints
  • Testing: Add tests for new features
  • Documentation: Update docs for API changes
  • Commits: Use conventional commit messages

🐛 Reporting Issues

Found a bug? Please open an issue with:

  • Clear description of the problem
  • Steps to reproduce
  • Expected vs actual behavior
  • Environment details

Health Monitoring

# Service health
curl http://localhost:8000/health

# Container health (if using Docker)
docker-compose ps

🔒 Security

  • Authentication: JWT-based with Supabase Auth
  • Authorization: Row-Level Security (RLS) policies
  • Data Isolation: Complete tenant separation
  • API Security: Rate limiting and input validation
  • Container Security: Non-root user execution

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments


⭐ Star this repo if you find it helpful!

Made with ❤️ by Bryan Lai This project is for learning purposes.

About

StackRAG is a multi-tenant Retrieval-Augmented Generation (RAG) platform for financial document intelligence. It extracts structured data from financial PDFs using LLMs, offers secure multi-tenancy, real-time APIs, and is built on Python, FastAPI, Docker, and PostgreSQL.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •