Democratizing Access to Clinical Data for Retrospective Research
The CORR Cohort Builder is designed to bridge the gap between high-dimensional intensive care unit (ICU) data and clinical researchers.
- Define inclusion and exclusion criteria using a visual logic builder.
- Filter patients based on demographics, clinical events, or temporal logic (e.g., "Sepsis within 24h of admission").
- Seamlessly integrates pre-validated clinical concepts from the CORR-Vars library.
- Supports both Native (raw DB extraction) and Derived (calculated via Python/Polars) variables.
- Allows customization of time windows and aggregation methods (e.g., min, max, mean) directly in the UI.
- Real-time feedback on cohort size and attrition.
- Automated generation of a publication-ready "Table One" (baseline characteristics).
- Integrated data profiling reports (distributions, missingness, correlations).
- Offline Mode: Allows safe feasibility checks on an intranet server without direct database access.
- Online Mode: Enables authorized extraction of full datasets on secure research servers.
- Project-based access control (Owner, Editor, Read-Only).
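A temporal inclusion criterion such as "Sepsis within 24h of admission" might be represented internally along these lines. This is a hypothetical sketch: the class, field names, and patient record layout are illustrative and do not reflect the actual CORR-Vars schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TemporalCriterion:
    """Illustrative stand-in for a visual-builder rule; not the real schema."""
    concept: str       # pre-validated clinical concept, e.g. "sepsis"
    window: timedelta  # time window relative to admission

    def matches(self, patient: dict) -> bool:
        """True if any event of this concept falls inside the window."""
        admission = patient["admission_time"]
        return any(
            ev["concept"] == self.concept
            and timedelta(0) <= ev["time"] - admission <= self.window
            for ev in patient["events"]
        )

# Toy data: patient 1 develops sepsis 12h after admission, patient 2 after 48h.
patients = [
    {"id": 1, "admission_time": datetime(2024, 1, 1, 8, 0),
     "events": [{"concept": "sepsis", "time": datetime(2024, 1, 1, 20, 0)}]},
    {"id": 2, "admission_time": datetime(2024, 1, 2, 9, 0),
     "events": [{"concept": "sepsis", "time": datetime(2024, 1, 4, 9, 0)}]},
]

criterion = TemporalCriterion(concept="sepsis", window=timedelta(hours=24))
cohort = [p["id"] for p in patients if criterion.matches(p)]
print(cohort)  # [1]
```

Applying the criterion over all patients is also what drives the real-time attrition feedback: each added rule shrinks the candidate list, and the remaining count can be reported immediately.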
The application is built on a scalable microservices architecture to separate user interaction from heavy data processing.
| Component | Technology | Description |
|---|---|---|
| Frontend | Streamlit | Reactive web interface for visual configuration. Handles state management and input validation. |
| Backend | FastAPI | Central REST API managing metadata persistence, authentication (JWT), and orchestration. |
| Executor | Python Worker | Async engine that runs heavy data queries. It translates visual definitions into executable CORR-Vars code. |
| Queue | Redis | Manages async job distribution between the Backend and Executor to prevent blocking the UI. |
| Database | SQLite | Stores project metadata, cohort definitions, and user settings. |
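Conceptually, the Backend enqueues extraction jobs and the Executor consumes them asynchronously, so a long-running query never blocks the UI. In the real stack this hand-off goes through Redis; the same decoupling pattern can be sketched in-process with Python's standard `queue.Queue` (job payload fields here are illustrative):

```python
import json
import queue
import threading

# In production this queue is Redis; queue.Queue shows the same pattern.
jobs: queue.Queue = queue.Queue()
results: dict = {}

def executor_worker() -> None:
    """Executor side: pull jobs off the queue and process them."""
    while True:
        raw = jobs.get()
        if raw is None:  # sentinel value signals shutdown
            break
        job = json.loads(raw)
        # ...here the worker would translate the visual definition into
        # executable CORR-Vars code and run the extraction...
        results[job["job_id"]] = f"cohort for project {job['project_id']}"
        jobs.task_done()

worker = threading.Thread(target=executor_worker, daemon=True)
worker.start()

# Backend side: enqueue a job and return to the caller immediately.
jobs.put(json.dumps({"job_id": "j1", "project_id": 42}))
jobs.join()      # block here only for demonstration purposes
jobs.put(None)   # shut the worker down
worker.join()
print(results["j1"])  # cohort for project 42
```

The JSON payload stands in for whatever serialized job description the Backend actually pushes to Redis; the point is that producer and consumer share only the queue, never a direct call.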
The codebase is organized by service, facilitating independent development and deployment.
- `streamlit/`: Frontend application. Contains the UI layout, plotting logic, and session state management.
- `fastapi/`: Backend API. Contains API routes, Pydantic models, and database repositories.
- `cohort_executor/`: Execution engine. Contains the logic for translating configurations to CORR-Vars objects and executing queries.
- `shared/`: Shared library. Common schemas, enums, and utilities used across all services.
- `docker-compose.yml`: Deployment config. Defines the multi-container setup for Docker.
```bash
git clone https://github.com/CUB-CORR/cohort-builder.git
cd cohort-builder
```

Create a `.env` file in the root directory. You can start by copying the example configuration:
```bash
cp .env.example .env
```

Critical environment variables:

- `SECRET_KEY`: Set a secure random string for JWT token generation.
- `DB_CONNECTION_STRING`: (Optional) Connection string for the metadata database. Defaults to a local SQLite file if not set.
- `CORR_DB_USER` / `CORR_DB_PASSWORD`: Credentials for the clinical source database (if running in extraction mode).
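A minimal `.env` might look like the following. All values are placeholders, and the exact connection-string format depends on your metadata database; this is a sketch, not a working configuration.

```ini
# Placeholder values only - replace before deploying.
SECRET_KEY=change-me-to-a-long-random-string
DB_CONNECTION_STRING=sqlite:///./metadata.db
CORR_DB_USER=research_user
CORR_DB_PASSWORD=change-me
```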
Use Docker Compose to build the images and start the services in detached mode:
```bash
docker compose up -d --build
```

Once the containers are running, the services will be available at:

- 🖥️ Frontend UI: http://localhost:5201
- ⚙️ Backend API: http://localhost:5200
- 📄 API Documentation: http://localhost:5200/docs
To view the logs and monitor the startup process:
```bash
docker compose logs -f
```

For production environments, use the production-specific compose file, which may include stricter restart policies and resource limits:

```bash
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```

To manage the application lifecycle in production, you can use the provided helper scripts:
- `./deploy-prod.sh`: Pulls changes, rebuilds images, and restarts the stack.
- `./backup.sh`: Creates backups of the metadata database.
Part of this README was generated with the help of AI tools. It may not be fully accurate or up-to-date.