Doc Analyzer is an AI-powered document-processing backend. It extracts text from PDF files with OCR, generates summaries, and answers questions about the extracted content. The application is built with FastAPI, packaged with Docker, and deployed on AWS EC2 using an image-based setup: the server pulls versioned Docker images rather than source code.
Features
- Upload PDF documents via API
- OCR text extraction using Tesseract
- AI-based summarization and Q&A
- FastAPI asynchronous architecture
- Fully containerized using Docker
- Image-based deployment with GitHub Actions
- Runs on AWS EC2 with Docker Compose
- Persistent document storage support
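Uploading a PDF through the API means sending a multipart/form-data POST. This README does not name the upload route or the form-field name, so the sketch below treats both as placeholders (the Swagger docs at /docs list the real ones). `build_pdf_upload_request` is a hypothetical helper that uses only the Python standard library.

```python
import uuid
import urllib.request


def build_pdf_upload_request(
    url: str, filename: str, pdf_bytes: bytes, field_name: str = "file"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one PDF.

    `url` and `field_name` are placeholders (assumptions): look up the real
    upload path and form-field name in the Swagger docs at /docs.
    """
    boundary = uuid.uuid4().hex  # random delimiter, unlikely to occur in the PDF
    # One multipart part: opening boundary, part headers, blank line, raw bytes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")  # closing boundary
    return urllib.request.Request(
        url,
        data=head + pdf_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

With the server running and the real path filled in, send the request with `urllib.request.urlopen(request)` and read the JSON response body.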
Local Setup
Clone the repository
git clone https://github.com/codicecustode/doc-analyzer.git
cd doc-analyzer
Create and activate a virtual environment
Windows:
python -m venv ai-doc-analyzer-venv
ai-doc-analyzer-venv\Scripts\activate
Linux / macOS:
python3 -m venv ai-doc-analyzer-venv
source ai-doc-analyzer-venv/bin/activate
Install dependencies
pip install -r requirements.txt
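`pip install` only installs Python packages; Tesseract itself is a native program that has to be installed separately for local (non-Docker) runs. Assuming the OCR code invokes the `tesseract` executable, as wrappers such as pytesseract do (this README does not say which wrapper the project uses), a quick check that it is on PATH; `tesseract_version` is a hypothetical helper:

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if it is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None
```

If it returns None, install Tesseract with the OS package manager, for example `sudo apt install tesseract-ocr` on Ubuntu or `brew install tesseract` on macOS.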
Run the FastAPI development server (--reload restarts it whenever the code changes)
uvicorn app.main:app --reload
Access URLs
Application: http://localhost:8000/api/v1
Swagger docs: http://localhost:8000/docs
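The Swagger page is generated from the OpenAPI schema, which FastAPI serves at /openapi.json by default (assuming this app does not override `openapi_url`). Listing the schema's paths is a quick way to see the actual routes under /api/v1 without reading the code. `list_endpoints` and `fetch_schema` are hypothetical helpers built on the standard library.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_endpoints(schema: dict) -> list[str]:
    """Flatten an OpenAPI schema into sorted "METHOD /path" strings."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-operation keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the schema FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)
```

With the server running, `for line in list_endpoints(fetch_schema()): print(line)` prints one route per line.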
Run with Docker
Build the image
docker build -t doc-analyzer .
Run the container
docker run -p 8000:8000 doc-analyzer
Access application
http://localhost:8000/api/v1
Publish to Docker Hub
Log in
docker login
Tag the image
docker tag doc-analyzer <username>/doc-analyzer:<tag>
Push the image
docker push <username>/doc-analyzer:<tag>
Pull it on any host
docker pull <username>/doc-analyzer:<tag>
On every push to the main branch, GitHub Actions builds and pushes the image automatically.
Deployments are image-based: the EC2 host never pulls source code, only Docker images.
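The `<username>/doc-analyzer:<tag>` reference has to satisfy Docker's image reference grammar: a tag is at most 128 characters of letters, digits, `_`, `.` and `-` and cannot start with `.` or `-`, while repository path components must be lowercase. A small sketch that assembles and checks the reference before tagging; `image_reference` is a hypothetical helper, and the tag scheme used by the GitHub Actions workflow is not documented here.

```python
import re

# Tag: one word character, then up to 127 word characters, dots or dashes.
_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")
# Repository path component: lowercase alphanumerics joined by ".", "_", "__" or dashes.
_COMPONENT = re.compile(r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*")


def image_reference(username: str, tag: str, repo: str = "doc-analyzer") -> str:
    """Return "<username>/<repo>:<tag>", raising ValueError if any part is invalid."""
    for part in (username, repo):
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"invalid repository component: {part!r}")
    if not _TAG.fullmatch(tag):
        raise ValueError(f"invalid tag: {tag!r}")
    return f"{username}/{repo}:{tag}"
```

For example, `image_reference("alice", "v1.2.0")` yields `alice/doc-analyzer:v1.2.0`, while an uppercase username or a tag starting with `-` raises `ValueError`.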
Deploy on AWS EC2
Launch an EC2 instance
OS: Ubuntu 22.04
Open port 80 (HTTP) in the instance's security group inbound rules; SSH (port 22) must also be allowed for the next step.
Connect to EC2
ssh ubuntu@<EC2_PUBLIC_DNS>
Install Docker and Docker Compose
sudo apt update && sudo apt install -y docker.io docker-compose
sudo usermod -aG docker ubuntu
(Log out and log back in so the docker group membership takes effect.)
Create deployment directory
mkdir -p /home/ubuntu/doc-analyzer
cd /home/ubuntu/doc-analyzer
Create a .env file with the required environment variables:
IMAGE_TAG=latest
SECRET_KEY=<your_secret>
GEMINI_API_KEY=<your_secret>
PINECONE_API_KEY=<your_secret>
OPENAI_API_KEY=<your_secret>
GOOGLE_API_KEY=<your_secret>
MONGO_DB_URI=<your_url>
MONGO_DB_NAME=<your_db_name>
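Before starting the stack it can help to confirm that .env defines every variable listed above with a real value rather than a `<placeholder>`. A minimal sketch, assuming plain KEY=VALUE lines; `parse_env_file` and `missing_vars` are hypothetical helpers, not part of this project, and they do not implement the full .env syntax Compose accepts.

```python
from pathlib import Path

# Variables this README lists as required in .env.
REQUIRED_VARS = (
    "IMAGE_TAG", "SECRET_KEY", "GEMINI_API_KEY", "PINECONE_API_KEY",
    "OPENAI_API_KEY", "GOOGLE_API_KEY", "MONGO_DB_URI", "MONGO_DB_NAME",
)


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (no `export`, multi-line values or escapes)."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(path: Path) -> list[str]:
    """Return required variables that are absent, empty, or still <placeholders>."""
    values = parse_env_file(path)
    return [
        name for name in REQUIRED_VARS
        if not values.get(name) or values[name].startswith("<")
    ]
```

On the server, `python3 -c` or a small script calling `missing_vars(Path("/home/ubuntu/doc-analyzer/.env"))` should return an empty list before you continue.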
Create docker-compose.yml in the same directory (replace <username> with your Docker Hub username):
version: '3.8'
services:
  doc-analyzer:
    image: <username>/doc-analyzer:${IMAGE_TAG}
    restart: unless-stopped
    ports:
      - "80:8000"
    env_file:
      - .env
    volumes:
      - uploads_data:/usr/python/app/doc-analyzer/app/uploads
volumes:
  uploads_data:
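Compose fills in `${IMAGE_TAG}` from the shell environment or from the .env file in the project directory (shell variables take precedence), so changing IMAGE_TAG in .env and re-running `docker-compose pull` and `docker-compose up -d` switches the deployed image version; `docker-compose config` prints the fully resolved file. To illustrate only the substitution step, this sketch mimics the plain `${VAR}` form with Python's `string.Template`. `resolve` is a hypothetical helper and ignores Compose extras such as `${VAR:-default}`.

```python
import os
from string import Template


def resolve(text: str, env_file_values: dict[str, str]) -> str:
    """Substitute ${VAR} placeholders, letting shell variables override .env values."""
    merged = {**env_file_values, **os.environ}
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return Template(text).safe_substitute(merged)
```

For instance, with `{"IMAGE_TAG": "v1"}` (and no IMAGE_TAG exported in the shell), the image line resolves to `<username>/doc-analyzer:v1`.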
Start the application
docker-compose pull
docker-compose up -d
Verify running services
docker-compose ps
Access the application in a browser
Application: http://<EC2_PUBLIC_DNS>/api/v1
Swagger docs: http://<EC2_PUBLIC_DNS>/docs
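`docker-compose up -d` returns once the container has started, which can be before Uvicorn is accepting requests. A minimal readiness poll using only the standard library; point it at `http://<EC2_PUBLIC_DNS>/docs` on the server or `http://localhost:8000/docs` locally. `wait_until_ready` is a hypothetical helper, and treating any non-5xx answer as "ready" is a design choice of this sketch.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until the server answers with a non-5xx status; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # 2xx (redirects are followed automatically)
        except urllib.error.HTTPError as err:
            # The server answered; a 4xx still means it is up and routing requests.
            if err.code < 500:
                return True
        except OSError:
            pass  # connection refused, reset or timed out: not up yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling with a deadline keeps deployment scripts from hanging forever if the container keeps restarting.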