A working demonstration of production-grade healthcare data engineering: unified patient records, natural language query interface, and ML-powered clinical decision support.
Built for value-based care organizations that need to unify pharmacy claims, clinical labs, eligibility files, patient app data, and provider workflows into a single patient view.
| Component | Description |
|---|---|
| Single Patient View | Unified schema linking 5 data domains into one longitudinal record per patient |
| Text-to-Data Engine | Natural language → SQL → execution → plain-English summary (Anthropic, OpenAI, Gemini, or OpenRouter) |
| Risk Scoring API | ML model trained on synthetic data, served as a FastAPI endpoint |
| Universal Data Extractor | Upload FHIR R4, HL7 v2, C-CDA, PDF files — deterministic parsing for healthcare standards, LLM-assisted extraction for unstructured formats |
| PII Detection & Data Quality | Regex-based PHI/PII scanning and data quality monitoring |
| Shared LLM Settings | App-wide provider/model/API-key configuration used by both Query Engine and Extractor |
| Architecture Dashboard | Interactive Vue frontend showing the full system design |
| Wiki / User Guide | 11-page browsable book (MkDocs Material) covering every component from a user perspective |
┌─────────────────────────────────────────────────┐
│ Vue 3 Frontend (Architecture Dashboard) │
│ ├── Patient View ├── Query Engine │
│ ├── Universal Extractor ├── LLM Settings │
│ ├── Data Setup ├── PII & Quality │
│ └── Model API └── Bucket Diagram │
└──────────────────┬──────────────────────────────┘
│ HTTP
┌──────────────────▼──────────────────────────────┐
│ FastAPI Backend │
│ ├── /api/v1/patient/{id}/view │
│ ├── /api/v1/query │
│ ├── /api/v1/predict/risk-score │
│ ├── /api/v1/ingest/* (extract, stage, load) │
│ ├── /api/v1/privacy/* │
│ ├── /api/v1/quality/* │
│ └── /api/v1/setup/* (generate, build, samples) │
└──────────────────┬──────────────────────────────┘
│
┌──────────────────▼──────────────────────────────┐
│ SQLite Database (1000 synthetic patients) │
│ ├── patients ├── pharmacy_claims │
│ ├── lab_results ├── eligibility │
│ ├── app_engagement ├── conditions │
│ └── allergies │
└─────────────────────────────────────────────────┘
There are two supported run paths. Pick the one that fits your situation.
Requires only Docker Desktop. No Python, Node, or manual setup needed.
git clone <repo-url>
cd healthcare-data-platform
docker compose up --build
Docker will build three containers (backend, frontend, wiki), auto-generate the database on first run, and serve everything:
| Service | URL |
|---|---|
| Frontend app | http://localhost:3000 |
| API docs | http://localhost:3000/docs |
| Wiki / User guide | http://localhost:8080 |
If you started the repo with Docker, use only these browser URLs:
- App: http://localhost:3000
- API docs: http://localhost:3000/docs
- Wiki: http://localhost:8080
Do not use http://localhost:8000 in the Docker path. The backend container is internal-only and is reached through the frontend's reverse proxy. If you open localhost:8000 and see a generic nginx page, that page is not the Healthcare Data Platform app.
The database is stored in a named Docker volume (hdp-data) and persists across restarts. To regenerate, remove the volume: docker compose down -v.
No .env file is required. To use the Text-to-Data query engine, enter your LLM provider, model name, and API key directly in the frontend Query Engine panel.
Requires Python 3.12+ and Node.js 20+.
git clone <repo-url>
cd healthcare-data-platform
python start.py
The launcher will:
- Create .venv automatically if it does not exist
- Install backend, wiki, and frontend packages when needed
- Generate the database if it doesn't exist
- Start the backend API on http://localhost:8000
- Start the frontend on http://localhost:5173
- Start the wiki book on http://localhost:8080
Press Ctrl+C to stop all services.
The app uses a shared LLM settings panel (accessible from the Architecture Dashboard) that configures the provider, model, and API key for all LLM-powered features: the Text-to-Data Query Engine and the Universal Extractor (for unstructured formats like PDF, CSV, generic JSON/XML). Configure once in the LLM Settings panel — no server-side configuration needed.
Supported providers: Anthropic (Claude), OpenAI (GPT), Google (Gemini), OpenRouter. Server-side environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, LLM_PROVIDER, LLM_MODEL) are supported as optional fallbacks only.
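The fallback order described above can be sketched as follows. This is an illustrative sketch, not the app's actual implementation: the header names are hypothetical, and the env-var lookup assumes provider names map directly to the `*_API_KEY` variables listed above.

```python
import os

def resolve_llm_config(headers: dict) -> dict:
    """Frontend-supplied values win; server-side env vars are fallbacks only."""
    # Header names below are illustrative, not the app's real ones.
    provider = headers.get("x-llm-provider") or os.getenv("LLM_PROVIDER", "anthropic")
    model = headers.get("x-llm-model") or os.getenv("LLM_MODEL", "")
    # e.g. "anthropic" -> ANTHROPIC_API_KEY, "gemini" -> GEMINI_API_KEY
    api_key = headers.get("x-llm-api-key") or os.getenv(f"{provider.upper()}_API_KEY", "")
    return {"provider": provider, "model": model, "api_key": api_key}
```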
GET /api/v1/patient/{id}/view
Returns a complete longitudinal record: demographics, eligibility, pharmacy claims, lab results, engagement summary, weight trajectory, and risk score.
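A minimal client sketch for fetching one patient's unified record, using only the standard library. The base URL assumes the local-development run path; under Docker, requests go through the frontend proxy on port 3000 instead.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # local dev; under Docker use http://localhost:3000

def patient_view_url(patient_id: int, base: str = BASE) -> str:
    """Build the Single Patient View URL for one patient."""
    return f"{base}/api/v1/patient/{patient_id}/view"

def get_patient_view(patient_id: int) -> dict:
    """Fetch the unified longitudinal record as a dict."""
    with urllib.request.urlopen(patient_view_url(patient_id)) as resp:
        return json.load(resp)
```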
POST /api/v1/query
{ "question": "How many patients started Wegovy in Q1 2024?" }
Returns generated SQL, query results, a plain-English summary, and a confidence score. Provider, model, and API key are sent via request headers from the frontend.
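A sketch of how such a request could be built from a script. The header names here are placeholders — the real ones are documented in the interactive API docs at /docs.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # local dev; under Docker use http://localhost:3000

def build_query_request(question: str, provider: str, model: str,
                        api_key: str, base: str = BASE) -> urllib.request.Request:
    """Assemble a POST /api/v1/query request; header names are illustrative."""
    return urllib.request.Request(
        f"{base}/api/v1/query",
        data=json.dumps({"question": question}).encode(),
        headers={
            "Content-Type": "application/json",
            "X-LLM-Provider": provider,   # placeholder header name
            "X-LLM-Model": model,         # placeholder header name
            "X-LLM-API-Key": api_key,     # placeholder header name
        },
        method="POST",
    )

# To execute against a running stack:
# resp = urllib.request.urlopen(build_query_request(
#     "How many patients started Wegovy in Q1 2024?",
#     "anthropic", "<model-name>", "<api-key>"))
```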
POST /api/v1/predict/risk-score
{
  "bmi": 36.5, "hba1c": 7.2, "age": 45,
  "logins_per_week": 1.5, "meal_compliance": 0.42, "months_enrolled": 3
}
Returns risk score, risk tier, predicted weight change, and clinical recommendation.
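The same call can be sketched as a small standard-library client. This is a usage sketch, not part of the codebase; the feature names come from the example payload above.

```python
import json
import urllib.request

BASE = "http://localhost:8000"  # local dev; under Docker use http://localhost:3000

def build_risk_request(features: dict, base: str = BASE) -> urllib.request.Request:
    """Assemble a POST /api/v1/predict/risk-score request from a feature dict."""
    return urllib.request.Request(
        f"{base}/api/v1/predict/risk-score",
        data=json.dumps(features).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

example_features = {
    "bmi": 36.5, "hba1c": 7.2, "age": 45,
    "logins_per_week": 1.5, "meal_compliance": 0.42, "months_enrolled": 3,
}

# To execute against a running stack:
# with urllib.request.urlopen(build_risk_request(example_features)) as resp:
#     print(json.load(resp))
```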
| Layer | Technology |
|---|---|
| Frontend | Vue 3 + Vite |
| Backend | Python + FastAPI |
| Database | SQLite |
| LLM | Anthropic / OpenAI / Google Gemini / OpenRouter (user-selected) |
| ML | scikit-learn (Logistic Regression) |
| Containerization | Docker Compose (3-service full stack) |
| Wiki | MkDocs Material |
├── backend/
│ ├── main.py # FastAPI app
│ ├── entrypoint.sh # Docker startup (DB generation + server)
│ ├── Dockerfile
│ ├── routers/
│ │ ├── patient.py # Single Patient View
│ │ ├── query.py # Text-to-Data Engine
│ │ ├── predict.py # Risk Score Prediction
│ │ ├── setup.py # Data Setup + Sample Generation
│ │ ├── ingest.py # Universal Extractor pipeline
│ │ ├── privacy.py # PII Detection
│ │ └── quality.py # Data Quality Monitor
│ ├── parsers/
│ │ ├── fhir_parser.py # FHIR R4 (fhir.resources)
│ │ ├── hl7v2_parser.py # HL7 v2 (hl7apy)
│ │ └── ccda_parser.py # C-CDA (lxml)
│ ├── extraction/
│ │ ├── pipeline.py # Format detection + routing
│ │ ├── llm_extractor.py # LLM semantic extraction
│ │ └── schema_registry.py # Target extraction schemas
│ ├── services/
│ │ ├── query_engine.py # Multi-provider NL → SQL pipeline
│ │ ├── llm_client.py # Shared LLM caller
│ │ ├── pii_detector.py # PII/PHI regex scanner
│ │ ├── quality_monitor.py # Data quality metrics
│ │ ├── schema_inspector.py # DB schema introspection
│ │ ├── sql_validator.py # SQL safety validation
│ │ └── risk_model.py # Model loading & inference
│ └── models/
│ └── schemas.py # Pydantic models
├── frontend/
│ ├── Dockerfile
│ ├── nginx.conf # SPA + API proxy config
│ └── src/
│ ├── App.vue
│ ├── lib/
│ │ └── llmSettings.js # Shared LLM config helper
│ └── components/
│ ├── ArchitectureDiagram.vue
│ ├── LlmSettings.vue
│ ├── DataSetup.vue
│ ├── PatientView.vue
│ ├── QueryEngine.vue
│ ├── UniversalExtractor.vue
│ ├── ExtractionReview.vue
│ └── ModelAPI.vue
├── data/
│ ├── generate.py # Synthetic data generator
│ ├── generate_samples.py # Ingestion sample generator
│ ├── schema.sql # Database schema
│ ├── samples/ # Generated ingestion samples
│ └── healthcare.db # Generated database
├── notebooks/
│ └── risk_scoring_model.ipynb
├── wiki/ # User-facing documentation (11 pages)
│ ├── Dockerfile # Static wiki container
│ ├── index.md
│ ├── ...
│ └── 11-universal-extractor.md
├── docker-compose.yml # Full 3-service stack
├── mkdocs.yml # Wiki book configuration
└── start.py # Python developer launcher
This project is licensed under the MIT License. See LICENSE for details.
Ashutosh Kumar Pandey