📄 DOC ANALYZER

Doc Analyzer is an AI-powered document processing backend that extracts text from PDF files using OCR, generates summaries, and enables question-answering based on extracted content. The application is built with FastAPI, packaged using Docker, and deployed on AWS EC2 using a production-grade, image-based architecture that pulls versioned Docker images instead of code.


🚀 Features

  • Upload PDF documents via API
  • OCR text extraction using Tesseract
  • AI-based summarization and Q&A
  • FastAPI asynchronous architecture
  • Fully containerized using Docker
  • Image-based deployment with GitHub Actions
  • Runs on AWS EC2 with Docker Compose
  • Persistent document storage support
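Summarization and Q&A over a long OCR'd document typically require splitting the extracted text into chunks before sending anything to a language model. A minimal sketch of such a chunker (chunk size and overlap values are illustrative, not necessarily what this app uses):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted text into overlapping chunks for summarization/Q&A."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so adjacent chunks share context.
        start += chunk_size - overlap
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary visible to the model in at least one chunk.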

⚙️ Local Setup

  1. Clone the repository
    git clone https://github.com/codicecustode/doc-analyzer.git
    cd doc-analyzer

  2. Create and activate virtual environment
    Windows:
    python -m venv ai-doc-analyzer-venv
    ai-doc-analyzer-venv\Scripts\activate
    Linux / Mac:
    python3 -m venv ai-doc-analyzer-venv
    source ai-doc-analyzer-venv/bin/activate

  3. Install dependencies
    pip install -r requirements.txt

  4. Run FastAPI server
    uvicorn app.main:app --reload

  5. Access URLs
    Application: http://localhost:8000/api/v1
    Swagger Docs: http://localhost:8000/docs
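Once the server is up, a PDF can be posted to the API. The exact upload route is not documented above (check the Swagger docs), so the path in this stdlib-only sketch is a hypothetical example:

```python
import io
import urllib.request
import uuid


def build_pdf_upload_request(url: str, filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST request for a PDF upload, stdlib only."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        (f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/pdf\r\n\r\n").encode()
    )
    body.write(pdf_bytes)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# "/api/v1/documents/upload" is an assumed route; confirm it at /docs.
req = build_pdf_upload_request(
    "http://localhost:8000/api/v1/documents/upload", "report.pdf", b"%PDF-1.4 ..."
)
# urllib.request.urlopen(req) would send it once the server is running.
```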


🐳 Docker Setup (Local)

  1. Build image
    docker build -t doc-analyzer .

  2. Run container
    docker run -p 8000:8000 doc-analyzer

  3. Access application
    http://localhost:8000/api/v1


📦 Docker Hub Integration

  1. Login
    docker login

  2. Tag image
    docker tag doc-analyzer <username>/doc-analyzer:<tag>

  3. Push image
    docker push <username>/doc-analyzer:<tag>

  4. Pull anywhere
    docker pull <username>/doc-analyzer:<tag>

Images are automatically built and pushed via GitHub Actions on each push to main.
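For image-based deployments, each pushed image needs a tag that identifies exactly what it contains. One common scheme, sketched below, derives the tag from the short commit SHA; the actual GitHub Actions workflow may use a different convention:

```python
def image_tag(namespace: str, image: str, commit_sha: str, short: int = 7) -> str:
    """Build a versioned Docker image reference from a commit SHA."""
    if short <= 0 or len(commit_sha) < short:
        raise ValueError("commit SHA too short for the requested tag length")
    # e.g. codicecustode/doc-analyzer:9fceb02
    return f"{namespace}/{image}:{commit_sha[:short]}"
```

Deploying by SHA-derived tag (rather than `latest`) makes rollbacks a one-line change to `IMAGE_TAG` in the server's `.env` file.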


☁️ AWS EC2 Deployment (Production)

This application uses image-based deployments: EC2 never pulls source code, only versioned Docker images.

  1. Launch EC2 Instance
    OS → Ubuntu 22.04
    Open ports 22 (SSH) and 80 (HTTP) in the inbound security group rules

  2. Connect to EC2
    ssh -i <your-key.pem> ubuntu@<EC2_PUBLIC_DNS>

  3. Install Docker & Docker Compose
    sudo apt update && sudo apt install -y docker.io docker-compose
    sudo usermod -aG docker ubuntu
    (logout and login again to apply permissions)

  4. Create deployment directory
    mkdir -p /home/ubuntu/doc-analyzer
    cd /home/ubuntu/doc-analyzer

  5. Create .env file and add required environment variables

    .env

    IMAGE_TAG=latest
    SECRET_KEY=<your_secret>
    GEMINI_API_KEY=<your_secret>
    PINECONE_API_KEY=<your_secret>
    OPENAI_API_KEY=<your_secret>
    GOOGLE_API_KEY=<your_secret>
    MONGO_DB_URI=<your_url>
    MONGO_DB_NAME=<your_db_name>

  6. Create docker-compose.yml in the SAME directory

    docker-compose.yml

    version: '3.8'

    services:
      doc-analyzer:
        image: <username>/doc-analyzer:${IMAGE_TAG}
        restart: unless-stopped
        ports:
          - "80:8000"
        env_file:
          - .env
        volumes:
          - uploads_data:/usr/python/app/doc-analyzer/app/uploads

    volumes:
      uploads_data:

  7. Start the application
    docker-compose pull
    docker-compose up -d

  8. Verify running services
    docker-compose ps

  9. Access the application in browser
    http://<EC2_PUBLIC_DNS>
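A common failure mode at step 7 is a missing or empty variable in `.env`. A small stdlib sketch that checks the file before starting the stack (the required-key list mirrors the variables shown in step 5):

```python
REQUIRED = [
    "IMAGE_TAG", "SECRET_KEY", "GEMINI_API_KEY", "PINECONE_API_KEY",
    "OPENAI_API_KEY", "GOOGLE_API_KEY", "MONGO_DB_URI", "MONGO_DB_NAME",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_keys(text: str) -> list[str]:
    """Return required variables that are absent or empty, in REQUIRED order."""
    env = parse_env(text)
    return [k for k in REQUIRED if not env.get(k)]
```

Running `missing_keys(open(".env").read())` on the server should return an empty list before `docker-compose up -d` is worth attempting.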
