Docker image for configuring, building, and running Synthea to generate test clinical data
Synthea™ is a Synthetic Patient Population Simulator.
The goal is to output synthetic, realistic (but not real),
patient data and associated health records in a variety of formats.
Synthea Source code: https://github.com/synthetichealth/synthea
Synthea Wiki: https://github.com/synthetichealth/synthea/wiki
Also includes SQL Definitions for several dialects: ./synthea-database/README.md
(For Windows, run the .ps1 file with the same name.)
- optionally configure environment variables (optional because defaults are set in env-setup.sh)
# copy .env.sample to .env, then make your changes to .env
cp .env.sample .env
# .env.sample
REPO_URL=https://github.com/synthetichealth/synthea.git
SYNTHEA_BRANCH=v3.2.0
IMAGE_NAME=synthea-docker-v3.2.0
MINAGE=18
MAXAGE=64
- configure include/synthea.config
- add custom modules to include/modules and custom resources to include/resources
- build image:
# on linux, you may need to run with sudo if your user is not in the 'docker' group
# on macos (docker desktop), sudo is typically not required
sh ./build.sh
- run the image to generate patient data (writes to the output/ folder):
sh ./run.sh
- for debugging, drop into a shell of the image
sh ./shell.sh
- optionally remove containers and image
sh ./rm-image.sh
- Dockerfile
  - compiles the Synthea Java source code into an image named synthea-build
  - copies include/ files to their appropriate location
  - includes a runtime environment with some shell scripts
- Environment variable management:
  - .env.sample: sample .env file to use as a basis for your own .env
  - env-setup.sh: sets environment variables from .env, for use in shell scripts
    - defines fallback values if needed
    - automatically sourced at the top of each shell script
- Shell scripts for building and running (each has a .ps1 analog):
  - build.sh: create a docker image with everything needed to generate clinical data
    - includes resources and configuration files in the include directory
  - run.sh: run the image, with a volume attached to a local output/ folder
    - docker run -it -v ${PWD}/output:/synthea/output $IMAGE_NAME
    - runs ./generate_data, a shell script that generates data:
      java -jar synthea-with-dependencies.jar -c synthea.config -a $MINAGE-$MAXAGE
  - shell.sh: for debugging, enter a shell in a running container
  - rm-image.sh: for cleanup, remove existing containers and image
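The fallback behavior described for env-setup.sh can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function name load_env is hypothetical, and the defaults are simply the values shown in .env.sample.

```shell
#!/bin/sh
# Hypothetical sketch of an env-setup.sh-style loader; the real script may differ.

load_env() {
    # $1: path to a .env file (defaults to ./.env; skipped when the file is missing)
    env_file="${1:-./.env}"
    # A bare file name would make "." search PATH in POSIX sh, so anchor it to the cwd
    case "$env_file" in
        */*) ;;
        *) env_file="./$env_file" ;;
    esac
    if [ -f "$env_file" ]; then
        set -a              # export every variable assigned while sourcing
        . "$env_file"
        set +a
    fi

    # Fallbacks, taken from the values shown in .env.sample
    : "${REPO_URL:=https://github.com/synthetichealth/synthea.git}"
    : "${SYNTHEA_BRANCH:=v3.2.0}"
    : "${IMAGE_NAME:=synthea-docker-v3.2.0}"
    : "${MINAGE:=18}"
    : "${MAXAGE:=64}"
    export REPO_URL SYNTHEA_BRANCH IMAGE_NAME MINAGE MAXAGE
}
```

A script such as run.sh could then call load_env before using $IMAGE_NAME, so that a missing or partial .env still yields a usable configuration.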
Configuration files and folders, automatically copied into the build
- include/modules/ - custom modules
  - include/modules/testmodule.json - example of a custom module
- include/resources/ - custom resources
  - include/resources/names.yml - language and gender-specific lists of names to use as patient given names
- include/output/csv - example (empty) output csv files with headers
- include/generate_data - main entry point when you run the image
- include/synthea.config - configuration items - see synthea's wiki for details: Common Configuration
  - NOTE: age is not a setting that can be set in synthea.config
    - instead, pass -a min-max as a parameter to the java executable
- include/synthea.properties - all synthea properties
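Because the age range has to be passed on the command line rather than set in synthea.config, it can help to validate MINAGE and MAXAGE before building the -a argument. The helper below is a sketch only: age_range is a hypothetical name, and the 18 and 64 defaults are the values from .env.sample.

```shell
#!/bin/sh
# Sketch: build the value for the -a option from MINAGE and MAXAGE.
# age_range is a hypothetical helper, not part of the repository.

age_range() {
    min="${MINAGE:-18}"
    max="${MAXAGE:-64}"
    # Reject anything that is not a whole number
    case "$min$max" in
        *[!0-9]*)
            echo "MINAGE and MAXAGE must be whole numbers" >&2
            return 1
            ;;
    esac
    if [ "$min" -gt "$max" ]; then
        echo "MINAGE ($min) must not exceed MAXAGE ($max)" >&2
        return 1
    fi
    printf '%s-%s\n' "$min" "$max"
}

# Possible usage (assumes the jar and synthea.config are in the current directory):
# java -jar synthea-with-dependencies.jar -c synthea.config -a "$(age_range)"
```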
DDL for SQL tables that match the structure of synthea output csvs: ./synthea-database/README.md
- er-diagrams
- mssql
- postgresql
- scripts
The main entry point is include/generate_data. It is copied to /synthea/synthea, which also contains run_synthea, a script provided by Synthea that runs the following:
java -jar synthea-with-dependencies.jar [-h]
[-s seed]
[-r referenceDate as YYYYMMDD]
[-cs clinician seed]
[-p populationSize]
[-g gender]
[-a minAge-maxAge]
[-c localConfigFilePath]
[-d localModulesDirPath]
[state [city]]
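As an illustration of how these options combine, the sketch below assembles a command line and prints it for review instead of running it. The helper name synthea_cmd and the seed, population size, and state values are illustrative assumptions; only the flags themselves come from the usage text above.

```shell
#!/bin/sh
# Sketch: assemble a Synthea command line from the options listed above.
# synthea_cmd is a hypothetical helper; the argument values are examples only.

synthea_cmd() {
    seed="$1"
    population="$2"
    ages="$3"
    state="$4"
    printf 'java -jar synthea-with-dependencies.jar -s %s -p %s -a %s -c synthea.config %s\n' \
        "$seed" "$population" "$ages" "$state"
}

# Print the command so it can be checked before it is run by hand
synthea_cmd 12345 100 18-64 Massachusetts
```

Fixing the seed with -s is what makes repeated runs comparable, so it is worth recording alongside any generated data set.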