ashuein/HealthRep

License: MIT | Python 3.12+ | FastAPI | Vue 3 | Vite | scikit-learn | Docker | SQLite

Healthcare Data Platform

A working demonstration of production-grade healthcare data engineering: unified patient records, a natural-language query interface, and ML-powered clinical decision support.

Built for value-based care organizations that need to unify pharmacy claims, clinical labs, eligibility files, patient app data, and provider workflows into a single patient view.

What's Implemented

| Component | Description |
| --- | --- |
| Single Patient View | Unified schema linking 5 data domains into one longitudinal record per patient |
| Text-to-Data Engine | Natural language → SQL → execution → plain English summary (Anthropic, OpenAI, or Gemini) |
| Risk Scoring API | ML model trained on synthetic data, served as a FastAPI endpoint |
| Universal Data Extractor | Upload FHIR R4, HL7 v2, C-CDA, PDF files: deterministic parsing for healthcare standards, LLM-assisted extraction for unstructured formats |
| PII Detection & Data Quality | Regex-based PHI/PII scanning and data quality monitoring |
| Shared LLM Settings | App-wide provider/model/API-key configuration used by both Query Engine and Extractor |
| Architecture Dashboard | Interactive Vue frontend showing the full system design |
| Wiki / User Guide | 11-page browsable book (MkDocs Material) covering every component from a user perspective |
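
To make the "regex-based PHI/PII scanning" row concrete, here is a minimal sketch of how such a scanner might look. The pattern set and the `scan_text` function are illustrative assumptions, not the rules used by `services/pii_detector.py`, which this README does not describe.

```python
import re

# Hypothetical pattern set for illustration; the repo's detector may use
# different or additional rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    hits.sort()  # order by position in the text
    return [(label, value) for _, label, value in hits]
```

Regex scanning of this kind only catches identifiers with a fixed shape; names and free-text addresses would need a different approach.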

Architecture

┌─────────────────────────────────────────────────┐
│  Vue 3 Frontend (Architecture Dashboard)         │
│  ├── Patient View    ├── Query Engine            │
│  ├── Universal Extractor  ├── LLM Settings       │
│  ├── Data Setup       ├── PII & Quality          │
│  └── Model API        └── Bucket Diagram          │
└──────────────────┬──────────────────────────────┘
                   │ HTTP
┌──────────────────▼──────────────────────────────┐
│  FastAPI Backend                                 │
│  ├── /api/v1/patient/{id}/view                   │
│  ├── /api/v1/query                               │
│  ├── /api/v1/predict/risk-score                  │
│  ├── /api/v1/ingest/* (extract, stage, load)     │
│  ├── /api/v1/privacy/*                           │
│  ├── /api/v1/quality/*                           │
│  └── /api/v1/setup/* (generate, build, samples)  │
└──────────────────┬──────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────┐
│  SQLite Database (1000 synthetic patients)        │
│  ├── patients     ├── pharmacy_claims             │
│  ├── lab_results  ├── eligibility                 │
│  ├── app_engagement  ├── conditions               │
│  └── allergies                                    │
└─────────────────────────────────────────────────┘

Quick Start

There are two supported run paths. Pick the one that fits your situation.

Path A — Docker (recommended for testers)

Requires only Docker Desktop. No Python, Node, or manual setup needed.

git clone <repo-url>
cd healthcare-data-platform
docker compose up --build

Docker will build three containers (backend, frontend, wiki), auto-generate the database on first run, and serve everything:

| Service | URL |
| --- | --- |
| Frontend app | http://localhost:3000 |
| API docs | http://localhost:3000/docs |
| Wiki / User guide | http://localhost:8080 |

Docker URLs to open

If you started the repo with Docker, use only these browser URLs:

  • App: http://localhost:3000
  • API docs: http://localhost:3000/docs
  • Wiki: http://localhost:8080

Do not use http://localhost:8000 in the Docker path. The backend container is internal-only and is reached through the frontend's reverse proxy. If you open localhost:8000 and see a generic nginx page, that page is not the Healthcare Data Platform app.

The database is stored in a named Docker volume (hdp-data) and persists across restarts. To regenerate, remove the volume: docker compose down -v.

No .env file is required. To use the LLM-powered features (the Text-to-Data query engine and LLM-assisted extraction), enter your LLM provider, model name, and API key in the frontend LLM Settings panel.

Path B — Python launcher (for developers)

Requires Python 3.12+ and Node.js 20+.

git clone <repo-url>
cd healthcare-data-platform
python start.py

The launcher will:

  • Create .venv automatically if it does not exist
  • Install backend, wiki, and frontend packages when needed
  • Generate the database if it doesn't exist
  • Start the backend API on http://localhost:8000
  • Start the frontend on http://localhost:5173
  • Start the wiki book on http://localhost:8080

Press Ctrl+C to stop all services.
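
The launcher steps above can be sketched as a small planning function. This is not the code in `start.py`; `plan_startup` is a hypothetical helper that only decides which steps are needed, using the `.venv` directory and the `data/healthcare.db` path shown under Project Structure.

```python
from pathlib import Path


def plan_startup(root: Path) -> list[str]:
    """List the launcher steps needed for a checkout at `root` (sketch only)."""
    steps = []
    if not (root / ".venv").exists():
        steps.append("create .venv")
    # Package installation "when needed" is not modelled here: the README
    # does not say how the launcher detects that.
    if not (root / "data" / "healthcare.db").exists():
        steps.append("generate database")
    steps += [
        "start backend on http://localhost:8000",
        "start frontend on http://localhost:5173",
        "start wiki on http://localhost:8080",
    ]
    return steps
```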

LLM configuration (both paths)

The app uses a shared LLM Settings panel (accessible from the Architecture Dashboard) that sets the provider, model, and API key for all LLM-powered features: the Text-to-Data Query Engine and the Universal Extractor (for unstructured formats like PDF, CSV, and generic JSON/XML). Configure it once in the panel; no server-side configuration is needed.
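
The split between deterministic parsers and LLM-assisted extraction implies a format-detection step (`extraction/pipeline.py` in the project tree). A minimal sketch of what that routing might look like follows; `detect_format` and `route` are hypothetical names, and the checks rely only on well-known markers (the `%PDF-` header, the HL7 v2 `MSH|` segment, the C-CDA `ClinicalDocument` root element, and the FHIR `resourceType` field), not on the repo's actual logic.

```python
import json

# Formats assumed to go to the deterministic parsers; everything else is
# assumed to fall through to LLM-assisted extraction.
DETERMINISTIC = {"fhir", "hl7v2", "ccda"}


def detect_format(payload: bytes) -> str:
    """Guess the format of an uploaded file from its leading bytes."""
    head = payload.lstrip()
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"MSH|"):
        return "hl7v2"
    if head.startswith(b"<"):
        # C-CDA documents use ClinicalDocument as the XML root element.
        return "ccda" if b"ClinicalDocument" in head[:4096] else "xml"
    if head.startswith(b"{"):
        try:
            doc = json.loads(head)
        except ValueError:
            return "unknown"
        # Every FHIR resource carries a top-level "resourceType" field.
        return "fhir" if isinstance(doc, dict) and "resourceType" in doc else "json"
    return "unknown"


def route(payload: bytes) -> tuple[str, str]:
    """Return (format, handler), where handler is 'parser' or 'llm'."""
    fmt = detect_format(payload)
    return fmt, ("parser" if fmt in DETERMINISTIC else "llm")
```

In this sketch CSV and other plain text come back as `unknown` and are routed to the LLM path, which matches the README's description of CSV as an unstructured format.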

Supported providers: Anthropic (Claude), OpenAI (GPT), Google (Gemini), OpenRouter. Server-side environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, LLM_PROVIDER, LLM_MODEL) are supported as optional fallbacks only.
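
The "optional fallbacks only" rule can be illustrated with a short resolution function: values supplied by the frontend win, and environment variables are consulted only when a value is missing. The function name, the `provider`/`model`/`api_key` keys, and the lowercase provider identifiers are assumptions made for this sketch; only the environment variable names come from this README.

```python
import os
from typing import Mapping, Optional

# Environment variable names are from the README; the provider identifiers
# used as keys are assumptions.
API_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def resolve_llm_settings(
    request_values: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Prefer per-request values; fall back to server-side environment variables."""
    env = os.environ if env is None else env
    provider = request_values.get("provider") or env.get("LLM_PROVIDER")
    model = request_values.get("model") or env.get("LLM_MODEL")
    api_key = request_values.get("api_key")
    if not api_key and provider:
        env_name = API_KEY_ENV.get(provider.lower())
        api_key = env.get(env_name) if env_name else None
    return {"provider": provider, "model": model, "api_key": api_key}
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.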

API Endpoints

GET /api/v1/patient/{patient_id}/view

Returns a complete longitudinal record: demographics, eligibility, pharmacy claims, lab results, engagement summary, weight trajectory, and risk score.

POST /api/v1/query

{ "question": "How many patients started Wegovy in Q1 2024?" }

Returns generated SQL, query results, plain English summary, and confidence score. Provider, model, and API key are sent via request headers from the frontend.
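
Because this endpoint executes model-generated SQL, some safety check has to sit between generation and execution (the project tree lists `services/sql_validator.py` for this). The sketch below shows one conservative way to do it, assuming a SELECT-only policy; the function names and keyword list are illustrative and are not taken from the repo.

```python
import re
import sqlite3

# Keywords that indicate a write or schema change. Deliberately conservative:
# a query that merely mentions one of these words is also rejected.
WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|replace|attach|detach|pragma|vacuum)\b",
    re.IGNORECASE,
)


def is_read_only_sql(sql: str) -> bool:
    """Accept a single SELECT (or WITH ... SELECT) statement and nothing else."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input or multiple statements
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False
    return WRITE_KEYWORDS.search(statement) is None


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open the database read-only as a second line of defence.

    Assumes db_path has no characters that need URI escaping.
    """
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

Combining a textual check with a read-only connection means a statement that slips past the keyword filter still cannot modify the database.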

POST /api/v1/predict/risk-score

{
  "bmi": 36.5, "hba1c": 7.2, "age": 45,
  "logins_per_week": 1.5, "meal_compliance": 0.42, "months_enrolled": 3
}

Returns risk score, risk tier, predicted weight change, and clinical recommendation.
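
A minimal client-side sketch for this endpoint, using only the standard library. The field names are those in the example body above; the field types, the `RiskFeatures` class, and `build_risk_request` are assumptions. The default base URL is the Path B backend address; in the Docker path the API is reached through http://localhost:3000 instead.

```python
import json
from dataclasses import asdict, dataclass
from urllib import request


@dataclass
class RiskFeatures:
    # Types are inferred from the example payload, not from the API schema.
    bmi: float
    hba1c: float
    age: int
    logins_per_week: float
    meal_compliance: float
    months_enrolled: int


def build_risk_request(
    features: RiskFeatures, base_url: str = "http://localhost:8000"
) -> request.Request:
    """Build (but do not send) the POST request for the risk-score endpoint."""
    body = json.dumps(asdict(features)).encode("utf-8")
    return request.Request(
        f"{base_url}/api/v1/predict/risk-score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can be sent with `urllib.request.urlopen(req)` once the backend is running. The response is left unparsed here because this README does not document its field names.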

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Vue 3 + Vite |
| Backend | Python + FastAPI |
| Database | SQLite |
| LLM | Anthropic / OpenAI / Google Gemini (user-selected) |
| ML | scikit-learn (Logistic Regression) |
| Containerization | Docker Compose (3-service full stack) |
| Wiki | MkDocs Material |

Project Structure

├── backend/
│   ├── main.py                 # FastAPI app
│   ├── entrypoint.sh           # Docker startup (DB generation + server)
│   ├── Dockerfile
│   ├── routers/
│   │   ├── patient.py          # Single Patient View
│   │   ├── query.py            # Text-to-Data Engine
│   │   ├── predict.py          # Risk Score Prediction
│   │   ├── setup.py            # Data Setup + Sample Generation
│   │   ├── ingest.py           # Universal Extractor pipeline
│   │   ├── privacy.py          # PII Detection
│   │   └── quality.py          # Data Quality Monitor
│   ├── parsers/
│   │   ├── fhir_parser.py      # FHIR R4 (fhir.resources)
│   │   ├── hl7v2_parser.py     # HL7 v2 (hl7apy)
│   │   └── ccda_parser.py      # C-CDA (lxml)
│   ├── extraction/
│   │   ├── pipeline.py         # Format detection + routing
│   │   ├── llm_extractor.py    # LLM semantic extraction
│   │   └── schema_registry.py  # Target extraction schemas
│   ├── services/
│   │   ├── query_engine.py     # Multi-provider NL → SQL pipeline
│   │   ├── llm_client.py       # Shared LLM caller
│   │   ├── pii_detector.py     # PII/PHI regex scanner
│   │   ├── quality_monitor.py  # Data quality metrics
│   │   ├── schema_inspector.py # DB schema introspection
│   │   ├── sql_validator.py    # SQL safety validation
│   │   └── risk_model.py       # Model loading & inference
│   └── models/
│       └── schemas.py          # Pydantic models
├── frontend/
│   ├── Dockerfile
│   ├── nginx.conf              # SPA + API proxy config
│   └── src/
│       ├── App.vue
│       ├── lib/
│       │   └── llmSettings.js  # Shared LLM config helper
│       └── components/
│           ├── ArchitectureDiagram.vue
│           ├── LlmSettings.vue
│           ├── DataSetup.vue
│           ├── PatientView.vue
│           ├── QueryEngine.vue
│           ├── UniversalExtractor.vue
│           ├── ExtractionReview.vue
│           └── ModelAPI.vue
├── data/
│   ├── generate.py             # Synthetic data generator
│   ├── generate_samples.py     # Ingestion sample generator
│   ├── schema.sql              # Database schema
│   ├── samples/                # Generated ingestion samples
│   └── healthcare.db           # Generated database
├── notebooks/
│   └── risk_scoring_model.ipynb
├── wiki/                       # User-facing documentation (11 pages)
│   ├── Dockerfile              # Static wiki container
│   ├── index.md
│   ├── ...
│   └── 11-universal-extractor.md
├── docker-compose.yml          # Full 3-service stack
├── mkdocs.yml                  # Wiki book configuration
└── start.py                    # Python developer launcher

License

This project is licensed under the MIT License. See LICENSE for details.

Author

Ashutosh Kumar Pandey
