
CORR Cohort Builder

Democratizing Access to Clinical Data for Retrospective Research

The CORR Cohort Builder is designed to bridge the gap between high-dimensional intensive care unit data and clinical researchers.

🚀 Key Features

🏥 Interactive Cohort Definition

  • Define inclusion and exclusion criteria using a visual logic builder.
  • Filter patients based on demographics, clinical events, or temporal logic (e.g., "Sepsis within 24h of admission").

📊 Advanced Variable Management

  • Seamlessly integrates pre-validated clinical concepts from the CORR-Vars library.
  • Supports both Native (raw DB extraction) and Derived (calculated via Python/Polars) variables.
  • Allows customization of time windows and aggregation methods (e.g., min, max, mean) directly in the UI.

📈 Instant Feasibility & Analytics

  • Real-time feedback on cohort size and attrition.
  • Automated generation of a publication-ready "Table One" (baseline characteristics).
  • Integrated data profiling reports (distributions, missingness, correlations).

🔒 Security & Governance

  • Offline Mode: Allows safe feasibility checks on an intranet server without direct database access.
  • Online Mode: Enables authorized extraction of full datasets on secure research servers.
  • Project-based access control (Owner, Editor, Read-Only).

🛠️ Technical Architecture

The application is built on a scalable microservices architecture to separate user interaction from heavy data processing.

| Component | Technology | Description |
|-----------|------------|-------------|
| Frontend | Streamlit | Reactive web interface for visual configuration. Handles state management and input validation. |
| Backend | FastAPI | Central REST API managing metadata persistence, authentication (JWT), and orchestration. |
| Executor | Python Worker | Async engine that runs heavy data queries. It translates visual definitions into executable CORR-Vars code. |
| Queue | Redis | Manages async job distribution between the Backend and Executor to prevent blocking the UI. |
| Database | SQLite | Stores project metadata, cohort definitions, and user settings. |
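The hand-off between Backend and Executor can be pictured as a serialized job pushed onto a queue and popped by a worker. The payload schema below is hypothetical, and the Redis list is simulated with an in-memory deque so the sketch stays self-contained:

```python
# Conceptual sketch of the Backend -> Queue -> Executor flow.
# The deque stands in for a Redis list (LPUSH/BRPOP); the job fields are invented.
import json
from collections import deque

queue = deque()

def enqueue_extraction(cohort_id: str, mode: str) -> None:
    # Backend serializes the job and pushes it; the UI is not blocked.
    queue.appendleft(json.dumps({"cohort_id": cohort_id, "mode": mode}))

def run_next_job() -> dict:
    # Executor pops the oldest job and runs the heavy query.
    job = json.loads(queue.pop())
    # ... translate the visual definition into CORR-Vars code and execute ...
    return {"cohort_id": job["cohort_id"], "status": "done"}

enqueue_extraction("cohort-42", "offline")
result = run_next_job()
```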

📂 Repository Overview

The codebase is organized by service, facilitating independent development and deployment.

  • streamlit/: Frontend Application. Contains the UI layout, plotting logic, and session state management.
  • fastapi/: Backend API. Contains API routes, Pydantic models, and database repositories.
  • cohort_executor/: Execution Engine. Contains the logic for translating configurations to CORR-Vars objects and executing queries.
  • shared/: Shared Library. Common schemas, enums, and utilities used across all services.
  • docker-compose.yml: Deployment Config. Defines the multi-container setup for Docker.

🐳 Deployment with Docker

Prerequisites

Docker and Docker Compose must be installed on the host machine.

1. Clone the Repository

git clone https://github.com/CUB-CORR/cohort-builder.git
cd cohort-builder

2. Configuration

Create a .env file in the root directory. You can start by copying the example configuration:

cp .env.example .env

Critical Environment Variables:

  • SECRET_KEY: Set a secure random string for JWT token generation.
  • DB_CONNECTION_STRING: (Optional) Connection string for the metadata database. Defaults to a local SQLite file if not set.
  • CORR_DB_USER / CORR_DB_PASSWORD: Credentials for the clinical source database (if running in extraction mode).
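For reference, a filled-in .env might look like the following. All values are placeholders — generate your own secret and use your site's actual credentials:

```env
SECRET_KEY=change-me-to-a-long-random-string
# Optional; the app falls back to a local SQLite file when unset.
# DB_CONNECTION_STRING=sqlite:///./metadata.db
CORR_DB_USER=research_user
CORR_DB_PASSWORD=change-me
```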

3. Build and Run

Use Docker Compose to build the images and start the services in detached mode:

docker compose up -d --build

4. Verification

Once the containers are running, the services will be available at:

  • 🖥️ Frontend UI: http://localhost:5201
  • ⚙️ Backend API: http://localhost:5200
  • 📄 API Documentation: http://localhost:5200/docs

To view the logs and monitor the startup process:

docker compose logs -f

Production Deployment

For production environments, use the production-specific compose file which may include stricter restart policies and resource limits:

docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

To manage the lifecycle of the application in production, you can utilize the provided helper scripts:

  • ./deploy-prod.sh: Pulls changes, rebuilds images, and restarts the stack.
  • ./backup.sh: Creates backups of the metadata database.

Part of this README was generated with the help of AI tools. It may not be fully accurate or up-to-date.
