This project uses machine learning to predict whether a patient has congestive heart failure or myocardial infarction. Although it focuses on these two conditions, the code can be generalized to any number of diseases or conditions modeled in Synthea.
Synthea is a tool for generating synthetic patient records as JSON files in FHIR format. Each patient's medical history is randomly generated by the modules supplied with Synthea. These modules can be viewed and edited with the Synthea Module Builder; however, we left them unchanged when generating our dataset.
synthea_data_pipeline.ipynb is a Jupyter notebook that takes a folder of Synthea-generated JSON files as input. It extracts the relevant medical data for each patient and transforms it into embeddings, a numeric format that machine learning models can read. The embeddings are exported as CSV and NPY files, ready to be passed to Model Testing & Training.
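As a rough illustration of the extraction step, the sketch below pulls condition names out of one FHIR Bundle of the kind Synthea writes. It is not the notebook's actual code: the function name `condition_names` and the tiny hand-written sample bundle are assumptions made for this example, and a real patient file contains many more resource types and fields.

```python
import json


def condition_names(bundle):
    """Return the names of all Condition resources in a FHIR Bundle dict.

    Hypothetical helper for illustration; the notebook may extract
    different fields or use a different approach.
    """
    names = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Condition":
            continue
        code = resource.get("code", {})
        codings = code.get("coding", [])
        # Prefer the human-readable text, else fall back to the first coding.
        name = code.get("text") or (codings[0].get("display") if codings else None)
        if name:
            names.append(name)
    return names


# Hand-written stand-in for a Synthea patient file (illustrative only).
sample_bundle = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "example-patient"}},
    {"resource": {
      "resourceType": "Condition",
      "code": {
        "coding": [{"display": "Myocardial Infarction"}],
        "text": "Myocardial Infarction"
      }
    }}
  ]
}
""")

patient_conditions = condition_names(sample_bundle)
```

For a file on disk, the same function could be applied to the result of `json.load` on each file in the input folder.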
python3 training.py <folder with csvs> <label>
python3 w2vtrain.py <folder containing npy files>
The CSV and NPY output folders are the inputs to training.py and w2vtrain.py, respectively. Both scripts train several machine learning models and report their prediction accuracy.
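To make "prediction accuracy" concrete, here is a minimal sketch that separates a label column from the feature columns of a CSV and scores a set of predictions. It assumes that `<label>` names a column in the CSV, and it uses a majority-class baseline in place of a trained model; the column names, the sample data, and both helper functions are hypothetical and are not taken from training.py.

```python
import csv
import io
from collections import Counter


def split_features_and_labels(rows, label):
    """Separate the label column from the remaining feature columns."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Illustrative data only; real embeddings would have many more columns.
csv_text = "f1,f2,chf\n0.1,0.5,1\n0.3,0.2,0\n0.9,0.4,1\n0.7,0.8,1\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
features, labels = split_features_and_labels(rows, "chf")

# Majority-class baseline: always predict the most common label.
majority = Counter(labels).most_common(1)[0][0]
baseline_accuracy = accuracy([majority] * len(labels), labels)
```

A baseline like this is a useful reference point: a trained model is only informative if its accuracy on held-out data exceeds what always guessing the most common label would achieve.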