docker image for generating synthetic clinical data using synthea

mrreband/synthea-docker

Synthea Patient Generator

Docker image for configuring, building, and running Synthea to generate test clinical data

Synthea™ is a Synthetic Patient Population Simulator.
The goal is to output synthetic, realistic (but not real),
patient data and associated health records in a variety of formats.

Synthea Source code: https://github.com/synthetichealth/synthea

Synthea Wiki: https://github.com/synthetichealth/synthea/wiki


Database

The repository also includes SQL definitions for several dialects: ./synthea-database/README.md


Usage

(On Windows, run the .ps1 file with the same name.)

  1. optionally configure environment variables (optional because defaults are set in env-setup.sh)

     # copy .env.sample to .env, then make your changes to .env
     cp .env.sample .env

     # .env.sample
     REPO_URL=https://github.com/synthetichealth/synthea.git
     SYNTHEA_BRANCH=v3.2.0
     IMAGE_NAME=synthea-docker-v3.2.0
     MINAGE=18
     MAXAGE=64

  2. configure include/synthea.config

  3. add custom modules to include/modules and custom resources to include/resources

  4. build image:

     # on linux, you may need to run with sudo if your user is not in the 'docker' group
     # on macos (docker desktop), sudo is typically not required
     sh ./build.sh

  5. run the image to generate patient data (writes to output/ folder):

     sh ./run.sh

  6. for debugging, drop into a shell of the image:

     sh ./shell.sh

  7. optionally remove containers and image:

     sh ./rm-image.sh
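
As a convenience for the first step, the copy can be made safe to repeat. The sketch below is not part of the repository: `ensure_env_file` is a hypothetical helper that creates .env from .env.sample only when .env does not exist yet, so local changes are never overwritten.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not shipped with this repository).
# Creates .env from .env.sample only if .env is missing.
ensure_env_file() {
    dir="${1:-.}"    # directory that holds .env.sample; defaults to the current one
    if [ -f "$dir/.env" ]; then
        echo "kept existing $dir/.env"
    elif [ -f "$dir/.env.sample" ]; then
        cp "$dir/.env.sample" "$dir/.env"
        echo "created $dir/.env from .env.sample"
    else
        echo "no .env.sample found in $dir" >&2
        return 1
    fi
}

# Possible use from the repository root (commented out so that sourcing
# this sketch has no side effects):
#   ensure_env_file . && sh ./build.sh && sh ./run.sh
```

Running the helper a second time leaves an edited .env untouched, which is the only behavior it adds over a plain cp.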

File Structure

root

  • Dockerfile - Compiles synthea java source code into an image named synthea-build

    • copies include/ files to their appropriate location
    • includes a runtime environment with some shell scripts
  • Environment variable management:

    • .env.sample: sample .env file to use as a basis for your own .env
    • env-setup.sh: set environment variables from .env, for use in shell scripts
      • defines fallback values if needed
      • automatically sourced at the top of each shell script
  • Shell scripts for building and running (each has a .ps1 analog):

    • build.sh: create a docker image with everything needed to generate clinical data
      • includes resources and configuration files in the include directory
    • run.sh: run the image, with a volume attached to the local output/ folder
      • docker run -it -v ${PWD}/output:/synthea/output $IMAGE_NAME
      • runs ./generate_data
        • shell script to generate data
        • java -jar synthea-with-dependencies.jar -c synthea.config -a $MINAGE-$MAXAGE
    • shell.sh: for debugging, enter a shell in a running container
    • rm-image.sh: for cleanup, remove existing containers and image
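
The env-setup.sh behavior described above (read .env, then fall back to defaults) could look roughly like the following. This is a minimal sketch, not the repository's actual script: the function name `load_env` is hypothetical, and the fallback values are assumed to match those shown in .env.sample.

```shell
#!/bin/sh
# Sketch only: the real env-setup.sh may be written differently.
load_env() {
    env_file="${1:-./.env}"
    if [ -f "$env_file" ]; then
        set -a               # export every variable assigned while sourcing
        . "$env_file"
        set +a
    fi
    # Fallbacks, applied only to variables that are still unset or empty.
    # Assumption: the defaults equal the values listed in .env.sample.
    : "${REPO_URL:=https://github.com/synthetichealth/synthea.git}"
    : "${SYNTHEA_BRANCH:=v3.2.0}"
    : "${IMAGE_NAME:=synthea-docker-v3.2.0}"
    : "${MINAGE:=18}"
    : "${MAXAGE:=64}"
    export REPO_URL SYNTHEA_BRANCH IMAGE_NAME MINAGE MAXAGE
}
```

A script that starts with `load_env` can then rely on `$IMAGE_NAME`, `$MINAGE`, and `$MAXAGE` being set whether or not a .env file exists. The `:=` expansion keeps any value that .env or the calling environment already provided.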

include

Configuration files and folders, automatically copied into the build

  • include/modules/ - custom modules
    • include/modules/testmodule.json - example of a custom module
  • include/resources/ - custom resources
    • include/resources/names.yml - language and gender-specific lists of names to use as patient given names
  • include/output/csv - example (empty) output csv files with headers
  • include/generate_data - main entry point when you run the image
  • include/synthea.config - configuration items - see synthea's wiki for details: Common Configuration
    • NOTE: age is not a setting that can be set in synthea.config
      • instead, pass -a min-max as a parameter to the java executable
  • include/synthea.properties - all synthea properties
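
Because the age range is passed on the command line rather than read from synthea.config, it can help to validate it before starting Java. The sketch below is an illustration under assumptions: `age_range` and `synthea_command` are hypothetical names, the actual include/generate_data may differ, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical sketch; not the repository's include/generate_data.

# Print "min-max" if both values are whole numbers and min <= max.
age_range() {
    min="$1"
    max="$2"
    for v in "$min" "$max"; do
        case "$v" in
            ''|*[!0-9]*)
                echo "ages must be non-negative whole numbers" >&2
                return 1
                ;;
        esac
    done
    if [ "$min" -gt "$max" ]; then
        echo "minimum age $min is greater than maximum age $max" >&2
        return 1
    fi
    printf '%s-%s\n' "$min" "$max"
}

# Print (dry run) the java command with the validated -a argument.
# The 18 and 64 fallbacks are assumed from .env.sample.
synthea_command() {
    range=$(age_range "${MINAGE:-18}" "${MAXAGE:-64}") || return 1
    echo "java -jar synthea-with-dependencies.jar -c synthea.config -a $range"
}
```

Printing the command first makes it easy to confirm the `-a` value before a long generation run.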

synthea-database

DDL for SQL tables that match the structure of synthea output csvs: ./synthea-database/README.md

  • er-diagrams
  • mssql
  • postgresql
  • scripts

detail

The main entry point is include/generate_data. It is copied to /synthea/synthea, which also contains run_synthea. That script is provided by Synthea and runs the following:

java -jar synthea-with-dependencies.jar [-h]
                                        [-s seed]
                                        [-r referenceDate as YYYYMMDD]
                                        [-cs clinician seed]
                                        [-p populationSize]
                                        [-g gender]
                                        [-a minAge-maxAge]
                                        [-c localConfigFilePath]
                                        [-d localModulesDirPath]
                                        [state [city]]
