A working demonstration of production-grade healthcare data engineering: unified patient records, natural language query interface, and ML-powered clinical decision support.
Built for value-based care organizations that need to unify pharmacy claims, clinical labs, eligibility files, patient app data, and provider workflows into a single patient view.
| Component | Description |
|---|---|
| Single Patient View | Unified schema linking 5 data domains into one longitudinal record per patient |
| Text-to-Data Engine | Natural language → SQL → execution → plain-English summary (Anthropic, OpenAI, Gemini, or OpenRouter) |
| Risk Scoring API | ML model trained on synthetic data, served as a FastAPI endpoint |
| Universal Data Extractor | Upload FHIR R4, HL7 v2, C-CDA, PDF files — deterministic parsing for healthcare standards, LLM-assisted extraction for unstructured formats |
| PII Detection & Data Quality | Regex-based PHI/PII scanning and data quality monitoring |
| Shared LLM Settings | App-wide provider/model/API-key configuration used by both Query Engine and Extractor |
| Architecture Dashboard | Interactive Vue frontend showing the full system design |
| Wiki / User Guide | 11-page browsable book (MkDocs Material) covering every component from a user perspective |
┌─────────────────────────────────────────────────┐
│ Vue 3 Frontend (Architecture Dashboard) │
│ ├── Patient View ├── Query Engine │
│ ├── Universal Extractor ├── LLM Settings │
│ ├── Data Setup ├── PII & Quality │
│ └── Model API └── Bucket Diagram │
└──────────────────┬──────────────────────────────┘
│ HTTP
┌──────────────────▼──────────────────────────────┐
│ FastAPI Backend │
│ ├── /api/v1/patient/{id}/view │
│ ├── /api/v1/query │
│ ├── /api/v1/predict/risk-score │
│ ├── /api/v1/ingest/* (extract, stage, load) │
│ ├── /api/v1/privacy/* │
│ ├── /api/v1/quality/* │
│ └── /api/v1/setup/* (generate, build, samples) │
└──────────────────┬──────────────────────────────┘
│
┌──────────────────▼──────────────────────────────┐
│ SQLite Database (1000 synthetic patients) │
│ ├── patients ├── pharmacy_claims │
│ ├── lab_results ├── eligibility │
│ ├── app_engagement ├── conditions │
│ └── allergies │
└─────────────────────────────────────────────────┘
There are two supported run paths. Pick the one that fits your situation.
Requires only Docker Desktop. No Python, Node, or manual setup needed.
git clone <repo-url>
cd healthcare-data-platform
docker compose up --build
Docker will build three containers (backend, frontend, wiki), auto-generate the database on first run, and serve everything:
| Service | URL |
|---|---|
| Frontend app | http://localhost:3000 |
| API docs | http://localhost:3000/docs |
| Wiki / User guide | http://localhost:8080 |
If you started the repo with Docker, use only these browser URLs:
- App: http://localhost:3000
- API docs: http://localhost:3000/docs
- Wiki: http://localhost:8080
Do not use http://localhost:8000 in the Docker path. The backend container is internal-only and is reached through the frontend's reverse proxy. If you open localhost:8000 and see a generic nginx page, that page is not the Healthcare Data Platform app.
The database is stored in a named Docker volume (hdp-data) and persists across restarts. To regenerate, remove the volume: docker compose down -v.
No .env file is required. To use the Text-to-Data query engine, enter your LLM provider, model name, and API key directly in the frontend Query Engine panel.
Requires Python 3.12+ and Node.js 20+.
git clone <repo-url>
cd healthcare-data-platform
python start.py
The launcher will:
- Create .venv automatically if it does not exist
- Install backend, wiki, and frontend packages when needed
- Generate the database if it doesn't exist
- Start the backend API on http://localhost:8000
- Start the frontend on http://localhost:5173
- Start the wiki book on http://localhost:8080
Press Ctrl+C to stop all services.
The app uses a shared LLM settings panel (accessible from the Architecture Dashboard) that configures the provider, model, and API key for all LLM-powered features: the Text-to-Data Query Engine and the Universal Extractor (for unstructured formats like PDF, CSV, generic JSON/XML). Configure once in the LLM Settings panel — no server-side configuration needed.
Supported providers: Anthropic (Claude), OpenAI (GPT), Google (Gemini), OpenRouter. Server-side environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, LLM_PROVIDER, LLM_MODEL) are supported as optional fallbacks only.
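The fallback order described above can be sketched as follows. This is an illustrative sketch, not the app's actual implementation: the header names are hypothetical, and the env-var lookup assumes provider names map directly to the `*_API_KEY` variables listed above.

```python
import os

def resolve_llm_config(headers: dict) -> dict:
    """Frontend-supplied values win; server-side env vars are fallbacks only."""
    # Header names below are illustrative, not the app's real ones.
    provider = headers.get("x-llm-provider") or os.getenv("LLM_PROVIDER", "anthropic")
    model = headers.get("x-llm-model") or os.getenv("LLM_MODEL", "")
    # e.g. "anthropic" -> ANTHROPIC_API_KEY, "gemini" -> GEMINI_API_KEY
    api_key = headers.get("x-llm-api-key") or os.getenv(f"{provider.upper()}_API_KEY", "")
    return {"provider": provider, "model": model, "api_key": api_key}
```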
GET /api/v1/patient/{id}/view
Returns a complete longitudinal record: demographics, eligibility, pharmacy claims, lab results, engagement summary, weight trajectory, and risk score.
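A minimal client sketch for fetching one patient's unified record, using only the standard library. The base URL assumes the local-development run path; under Docker, requests go through the frontend proxy on port 3000 instead.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # local dev; under Docker use http://localhost:3000

def patient_view_url(patient_id: int, base: str = BASE) -> str:
    """Build the Single Patient View URL for one patient."""
    return f"{base}/api/v1/patient/{patient_id}/view"

def get_patient_view(patient_id: int) -> dict:
    """Fetch the unified longitudinal record as a dict."""
    with urllib.request.urlopen(patient_view_url(patient_id)) as resp:
        return json.load(resp)
```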
POST /api/v1/query
{ "question": "How many patients started Wegovy in Q1 2024?" }
Returns generated SQL, query results, a plain-English summary, and a confidence score. Provider, model, and API key are sent via request headers from the frontend.
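A sketch of how such a request could be built from a script. The header names here are placeholders — the real ones are documented in the interactive API docs at /docs.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # local dev; under Docker use http://localhost:3000

def build_query_request(question: str, provider: str, model: str,
                        api_key: str, base: str = BASE) -> urllib.request.Request:
    """Assemble a POST /api/v1/query request; header names are illustrative."""
    return urllib.request.Request(
        f"{base}/api/v1/query",
        data=json.dumps({"question": question}).encode(),
        headers={
            "Content-Type": "application/json",
            "X-LLM-Provider": provider,   # placeholder header name
            "X-LLM-Model": model,         # placeholder header name
            "X-LLM-API-Key": api_key,     # placeholder header name
        },
        method="POST",
    )

# To execute against a running stack:
# resp = urllib.request.urlopen(build_query_request(
#     "How many patients started Wegovy in Q1 2024?",
#     "anthropic", "<model-name>", "<api-key>"))
```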
POST /api/v1/predict/risk-score
{
  "bmi": 36.5, "hba1c": 7.2, "age": 45,
  "logins_per_week": 1.5, "meal_compliance": 0.42, "months_enrolled": 3
}
Returns risk score, risk tier, predicted weight change, and clinical recommendation.
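The same call can be sketched as a small standard-library client. This is a usage sketch, not part of the codebase; the feature names come from the example payload above.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # local dev; under Docker use http://localhost:3000

def build_risk_request(features: dict, base: str = BASE) -> urllib.request.Request:
    """Assemble a POST /api/v1/predict/risk-score request from a feature dict."""
    return urllib.request.Request(
        f"{base}/api/v1/predict/risk-score",
        data=json.dumps(features).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

example_features = {
    "bmi": 36.5, "hba1c": 7.2, "age": 45,
    "logins_per_week": 1.5, "meal_compliance": 0.42, "months_enrolled": 3,
}

# To execute against a running stack:
# with urllib.request.urlopen(build_risk_request(example_features)) as resp:
#     print(json.load(resp))
```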
| Layer | Technology |
|---|---|
| Frontend | Vue 3 + Vite |
| Backend | Python + FastAPI |
| Database | SQLite |
| LLM | Anthropic / OpenAI / Google Gemini / OpenRouter (user-selected) |
| ML | scikit-learn (Logistic Regression) |
| Containerization | Docker Compose (3-service full stack) |
| Wiki | MkDocs Material |
├── backend/
│ ├── main.py # FastAPI app
│ ├── entrypoint.sh # Docker startup (DB generation + server)
│ ├── Dockerfile
│ ├── routers/
│ │ ├── patient.py # Single Patient View
│ │ ├── query.py # Text-to-Data Engine
│ │ ├── predict.py # Risk Score Prediction
│ │ ├── setup.py # Data Setup + Sample Generation
│ │ ├── ingest.py # Universal Extractor pipeline
│ │ ├── privacy.py # PII Detection
│ │ └── quality.py # Data Quality Monitor
│ ├── parsers/
│ │ ├── fhir_parser.py # FHIR R4 (fhir.resources)
│ │ ├── hl7v2_parser.py # HL7 v2 (hl7apy)
│ │ └── ccda_parser.py # C-CDA (lxml)
│ ├── extraction/
│ │ ├── pipeline.py # Format detection + routing
│ │ ├── llm_extractor.py # LLM semantic extraction
│ │ └── schema_registry.py # Target extraction schemas
│ ├── services/
│ │ ├── query_engine.py # Multi-provider NL → SQL pipeline
│ │ ├── llm_client.py # Shared LLM caller
│ │ ├── pii_detector.py # PII/PHI regex scanner
│ │ ├── quality_monitor.py # Data quality metrics
│ │ ├── schema_inspector.py # DB schema introspection
│ │ ├── sql_validator.py # SQL safety validation
│ │ └── risk_model.py # Model loading & inference
│ └── models/
│ └── schemas.py # Pydantic models
├── frontend/
│ ├── Dockerfile
│ ├── nginx.conf # SPA + API proxy config
│ └── src/
│ ├── App.vue
│ ├── lib/
│ │ └── llmSettings.js # Shared LLM config helper
│ └── components/
│ ├── ArchitectureDiagram.vue
│ ├── LlmSettings.vue
│ ├── DataSetup.vue
│ ├── PatientView.vue
│ ├── QueryEngine.vue
│ ├── UniversalExtractor.vue
│ ├── ExtractionReview.vue
│ └── ModelAPI.vue
├── data/
│ ├── generate.py # Synthetic data generator
│ ├── generate_samples.py # Ingestion sample generator
│ ├── schema.sql # Database schema
│ ├── samples/ # Generated ingestion samples
│ └── healthcare.db # Generated database
├── notebooks/
│ └── risk_scoring_model.ipynb
├── wiki/ # User-facing documentation (11 pages)
│ ├── Dockerfile # Static wiki container
│ ├── index.md
│ ├── ...
│ └── 11-universal-extractor.md
├── docker-compose.yml # Full 3-service stack
├── mkdocs.yml # Wiki book configuration
└── start.py # Python developer launcher
This project is licensed under the MIT License. See LICENSE for details.
Ashutosh Kumar Pandey