A cloud‑native simulation of a real enterprise workflow: mainframe output → report generation (Excel/PDF) → cloud upload → ML-powered predictions → dashboard for download.
This project modernizes a legacy reporting pipeline using FastAPI, React, GCP Cloud Run, Cloud Storage, PostgreSQL, and Python automation — all built using mock data (fully safe and compliant).
- Upload mock claim files (CSV/TXT)
- Parse, clean, and validate data
- Generate Excel + PDF reports
- Upload reports to Google Cloud Storage
- View and download reports from a modern UI
- Predict claim outcomes (Paid/Denied) for pending claims
- Random Forest classification model
- Probability scores and confidence levels
- Interactive dashboard with visual predictions
- Real-time processing via React frontend
- Backend: FastAPI on Cloud Run
- Frontend: React dashboard with Tailwind CSS
- Storage: Cloud Storage buckets
- Database: Cloud SQL (PostgreSQL) or SQLite (default)
- Scheduled jobs via Cloud Scheduler
- Clean folder architecture
- ML model integration
- RESTful API design
- Error handling and validation
- CORS-enabled for frontend integration
Interactive dashboard showing ML predictions for pending claims with probability scores and confidence levels.
Client (React)
→ Backend API (FastAPI)
→ Data Processing (pandas)
→ Report Generators (Excel/PDF)
→ Cloud Storage (GCP)
→ Database (PostgreSQL)
Upload File → Parse & Clean → Aggregate Data
→ Excel/PDF Generation → Upload to GCP → Metadata Saved
→ Dashboard Download
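The Excel step of this flow could look roughly like the following sketch (the aggregation and column layout are illustrative, not the exact excel_generator.py implementation):

```python
import pandas as pd
from openpyxl import Workbook

def generate_excel_report(df: pd.DataFrame, path: str) -> None:
    """Write a summary sheet aggregating claim amounts by Status."""
    summary = (
        df.groupby("Status")["Amount"]
          .agg(num_claims="count", total_amount="sum")
          .reset_index()
    )
    wb = Workbook()
    ws = wb.active
    ws.title = "Summary"
    ws.append(["Status", "Claims", "Total Amount"])
    for row in summary.itertuples(index=False):
        ws.append([row.Status, row.num_claims, row.total_amount])
    wb.save(path)

claims = pd.DataFrame({
    "ClaimID": ["C001", "C002", "C003"],
    "Status": ["Paid", "Denied", "Paid"],
    "Amount": [550.25, 200.00, 1250.75],
})
generate_excel_report(claims, "claims_report.xlsx")
```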
ML Prediction Flow:
Upload CSV → Feature Engineering → Train Model
→ Predict Pending Claims → Display Results in Dashboard
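A minimal sketch of this flow with scikit-learn, assuming the same columns as the sample data (the real feature engineering in app/ml/claims_predictor.py may differ):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy claims data; the real app parses an uploaded CSV (see sample_data/)
df = pd.DataFrame({
    "Status": ["Paid", "Denied", "Paid", "Denied", "Pending", "Paid", "Denied", "Pending"],
    "Amount": [550.25, 200.0, 1250.75, 90.0, 130.5, 800.0, 60.0, 400.0],
    "Type":   ["Medical", "Dental", "Medical", "Vision", "Vision", "Medical", "Dental", "Medical"],
})

# Feature engineering: one-hot encode Type, keep Amount numeric
features = pd.get_dummies(df[["Amount", "Type"]], columns=["Type"])

# Train on resolved claims only (Paid/Denied), then score the Pending ones
resolved = df["Status"] != "Pending"
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(features[resolved], df.loc[resolved, "Status"])

pending = features[~resolved]
preds = model.predict(pending)
probs = model.predict_proba(pending)  # columns ordered by model.classes_
for claim_idx, label, p in zip(pending.index, preds, probs.max(axis=1)):
    print(f"claim {claim_idx}: {label} (confidence {p:.2f})")
```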
- React
- Tailwind CSS
- Axios
- Vercel / Cloud Run deployment
- FastAPI
- Python (pandas, numpy, scikit-learn, openpyxl, reportlab)
- SQLAlchemy
- Pydantic
- Google Cloud Storage SDK
- Machine Learning (Random Forest Classifier)
- Cloud Run
- Cloud Storage
- Cloud SQL
- Cloud Scheduler
Claims-Reporting-Automation-System/
├── app/
│ ├── main.py # FastAPI application
│ ├── api/
│ │ └── reports.py # API endpoints (upload, predict, list)
│ ├── ml/
│ │ └── claims_predictor.py # ML model for predictions
│ ├── models/
│ │ └── report.py # SQLAlchemy Report model
│ ├── db/
│ │ ├── base.py # Database base
│ │ └── session.py # Database session
│ └── services/
│ ├── excel_generator.py # Excel/PDF report generation
│ ├── ml_service.py # ML service wrapper
│ └── storage.py # Google Cloud Storage
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ │ └── ClaimsPredictor.js # ML prediction UI
│ │ ├── api/
│ │ │ └── reports.js # API client
│ │ ├── App.js
│ │ └── index.js
│ ├── package.json
│ └── tailwind.config.js
├── sample_data/
│ └── claims_export_2025.txt # Sample claims data
├── requirements.txt # Python dependencies
└── README.md
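For illustration, the Report metadata row saved after each upload might look something like this sketch (the field names are assumptions; the actual model lives in app/models/report.py):

```python
import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Report(Base):
    """Hypothetical shape of a processed-report record."""
    __tablename__ = "reports"
    id = Column(Integer, primary_key=True)
    filename = Column(String, nullable=False)
    gcs_url = Column(String)  # where the generated Excel/PDF landed
    created_at = Column(DateTime, default=datetime.datetime.utcnow)

# SQLite in-memory engine for demonstration; the app defaults to a SQLite file
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Report(filename="claims_report.xlsx",
                       gcs_url="gs://bucket/claims_report.xlsx"))
    session.commit()
```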
Sample data is located in sample_data/claims_export_2025.txt:
ClaimID,Status,Amount,Date,Type
C001,Paid,550.25,2025-01-11,Medical
C002,Denied,200.00,2025-01-14,Dental
C003,Pending,130.50,2025-01-17,Vision
C004,Paid,1250.75,2025-01-18,Medical
...

Required columns:
- ClaimID: unique identifier
- Status: one of "Paid", "Denied", or "Pending"
- Amount: numeric claim amount
- Date: date in YYYY-MM-DD format
- Type: claim type (e.g., "Medical", "Dental", "Vision")
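The parse-and-validate step for this format could be sketched as follows (the helper name load_claims and the exact checks are illustrative, not the project's actual parser):

```python
import io

import pandas as pd

REQUIRED_COLUMNS = {"ClaimID", "Status", "Amount", "Date", "Type"}
VALID_STATUSES = {"Paid", "Denied", "Pending"}

def load_claims(raw: str) -> pd.DataFrame:
    """Parse a claims CSV export, raising ValueError on malformed data."""
    df = pd.read_csv(io.StringIO(raw))
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    bad_status = set(df["Status"]) - VALID_STATUSES
    if bad_status:
        raise ValueError(f"unexpected statuses: {sorted(bad_status)}")
    df["Amount"] = pd.to_numeric(df["Amount"], errors="raise")
    df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
    return df

sample = """ClaimID,Status,Amount,Date,Type
C001,Paid,550.25,2025-01-11,Medical
C002,Denied,200.00,2025-01-14,Dental
C003,Pending,130.50,2025-01-17,Vision"""
claims = load_claims(sample)
```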
- POST /reports/upload - Upload a CSV file, generate Excel/PDF reports, upload to GCS
- POST /reports/predict - Upload a CSV file, get ML predictions for pending claims
- GET /reports - List all processed reports
- GET /health - Health check endpoint
- GET /docs - Swagger UI (FastAPI auto-generated docs)
- GET /redoc - ReDoc documentation
Create a .env file in the project root:

# Database (optional - defaults to SQLite)
DATABASE_URL=postgresql://user:pass@host:port/db

# Google Cloud Storage (required for file uploads)
GCP_BUCKET_NAME=your-bucket-name

# Frontend API URL (optional)
REACT_APP_API_URL=http://localhost:8000

Note: See SETUP_ENV.md for detailed setup instructions.
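Reading these variables with their documented defaults could be as simple as this sketch (require_bucket is a hypothetical helper, not part of the codebase):

```python
import os

# Defaults mirror the README: SQLite when DATABASE_URL is unset;
# GCP_BUCKET_NAME is needed only when cloud uploads are enabled.
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///./claims.db")
GCP_BUCKET_NAME = os.environ.get("GCP_BUCKET_NAME")

def require_bucket() -> str:
    """Fail fast with a clear message if uploads are attempted unconfigured."""
    if not GCP_BUCKET_NAME:
        raise RuntimeError("GCP_BUCKET_NAME must be set to upload reports")
    return GCP_BUCKET_NAME
```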
- Python 3.11+ with virtual environment
- Node.js 16+ and npm
- (Optional) PostgreSQL for production-like setup
1. Create and activate a virtual environment:

   python -m venv .venv
   .venv\Scripts\activate       # Windows
   source .venv/bin/activate    # Linux/Mac

2. Install dependencies:

   pip install -r requirements.txt

3. Configure environment variables:
   - Copy .env.example to .env (if it exists)
   - Set DATABASE_URL (optional - defaults to SQLite)
   - Set GCP_BUCKET_NAME (required for file uploads)

4. Start the server:

   uvicorn app.main:app --reload

   The server runs at http://localhost:8000
   - API docs: http://localhost:8000/docs
   - Health check: http://localhost:8000/health
1. Navigate to the frontend directory:

   cd frontend

2. Install dependencies:

   npm install

3. Start the development server:

   npm start

   The frontend runs at http://localhost:3000
- Start the backend: uvicorn app.main:app --reload
- Start the frontend: cd frontend && npm start
- Open http://localhost:3000
- Upload sample_data/claims_export_2025.txt for ML predictions
- Build Docker image
- Push to Container Registry
- Deploy with Cloud SQL + Cloud Storage permissions
- Deploy to Vercel or Cloud Run
- Cloud Scheduler → calls /reports/generate on a cron schedule
- ML Integration Guide - Complete guide for ML prediction feature
- Environment Setup - Detailed environment variable configuration
- Virtual Environment Guide - Python virtual environment setup
✅ CSV/TXT file upload and validation
✅ Excel and PDF report generation
✅ Google Cloud Storage integration
✅ ML-powered claims prediction
✅ Interactive React dashboard
✅ RESTful API with auto-generated docs
✅ SQLite (default) or PostgreSQL support
- Model persistence (save/load trained models)
- Prediction history in database
- Batch processing for large files
- Export predictions as CSV
- Role-based access control
- Email notifications
- Additional file formats
- Multi-file parallel processing
MIT License
PRs and improvements are welcome!
1. Start both servers (backend and frontend)
2. Upload sample data:
   - Go to http://localhost:3000
   - Click "Select CSV/TXT File"
   - Choose sample_data/claims_export_2025.txt
   - Click "Predict Claims"
3. View results:
   - See summary statistics
   - Review the predictions table with probabilities
   - Check denial risk scores
Import errors (sklearn, numpy):
- pip install scikit-learn==1.3.2 numpy==1.26.4

CORS errors:
- The backend includes CORS middleware for localhost:3000
- Check app/main.py for the CORS configuration

File not found:
- Ensure sample_data/claims_export_2025.txt exists
- Check the file path in the ML predictor script

Frontend connection errors:
- Verify the backend is running on port 8000
- Check REACT_APP_API_URL in frontend/.env
See ML_INTEGRATION_GUIDE.md for detailed troubleshooting.
This project reimplements the typical mainframe → reporting → SharePoint workflow found in enterprise environments, redesigned with modern engineering practices for learning and demonstration.
Built with: FastAPI • React • Tailwind CSS • scikit-learn • pandas • PostgreSQL/SQLite • Google Cloud Storage

