
anthem-ai/ucsc-engineering-2020


UCSC Engineering 2020

This project uses machine learning to predict which patients are diagnosed with congestive heart failure or myocardial infarction. Although it focuses on these two conditions, the code can be generalized to any number of diseases or conditions that Synthea models.
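One way the approach generalizes is to keep the target diagnoses in a table that maps each label to a set of condition codes, so adding a disease means adding one entry. The sketch below assumes each patient has already been reduced to a list of SNOMED CT condition codes. `TARGET_CONDITIONS` and `label_patient` are hypothetical names, and the two codes are assumptions to check against your own Synthea output, not values taken from this project.

```python
from __future__ import annotations

# Hypothetical label -> SNOMED CT code mapping. These codes are assumptions;
# verify them against the Synthea modules used to generate your data.
TARGET_CONDITIONS: dict[str, set[str]] = {
    "myocardial_infarction": {"22298006"},
    "congestive_heart_failure": {"88805009"},
}


def label_patient(condition_codes: list[str],
                  targets: dict[str, set[str]] = TARGET_CONDITIONS) -> dict[str, int]:
    """Return 1 for each target condition found in the patient's codes, else 0."""
    codes = set(condition_codes)
    # A label is positive if any of its codes appears in the patient's history.
    return {name: int(bool(codes & wanted)) for name, wanted in targets.items()}
```

Passing a different `targets` mapping (for example one with a third condition) produces labels for those conditions without changing the function.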

Data Generation

Synthea is a tool for generating synthetic patient records as JSON files in the FHIR format. Each patient's medical history is randomly generated by the disease modules supplied with Synthea. These modules can be viewed and edited with the Synthea Module Builder; however, we left them unchanged when generating our dataset.
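Each Synthea FHIR file is a Bundle whose `entry` list wraps individual resources such as Patient, Encounter, and Condition. The sketch below shows one way to pull the SNOMED CT codes of every Condition out of a folder of these files. It is not the project's notebook code; the function names are hypothetical, and skipping bundles that contain no Patient resource is an assumption meant to ignore non-patient files some Synthea versions write to the same folder.

```python
from __future__ import annotations

import json
from pathlib import Path

SNOMED = "http://snomed.info/sct"


def resources(bundle: dict) -> list[dict]:
    """Unwrap the resources stored in a FHIR Bundle's entry list."""
    return [entry.get("resource", {}) for entry in bundle.get("entry", [])]


def condition_codes(bundle: dict) -> list[str]:
    """Collect SNOMED CT codes from every Condition resource in the bundle."""
    codes = []
    for resource in resources(bundle):
        if resource.get("resourceType") != "Condition":
            continue
        for coding in resource.get("code", {}).get("coding", []):
            if coding.get("system") == SNOMED and "code" in coding:
                codes.append(coding["code"])
    return codes


def load_patients(folder: str) -> dict[str, list[str]]:
    """Map each patient bundle's file name (without .json) to its condition codes."""
    patients = {}
    for path in sorted(Path(folder).glob("*.json")):
        bundle = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: files without a Patient resource are not patient records.
        if any(r.get("resourceType") == "Patient" for r in resources(bundle)):
            patients[path.stem] = condition_codes(bundle)
    return patients
```

For example, `load_patients("output/fhir")` would read a Synthea output folder, assuming the files were written to that path.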

Data Pre-processing & Transformation

Files

  • synthea_data_pipeline.ipynb

Brief Summary

synthea_data_pipeline.ipynb is a Jupyter notebook that takes a folder of Synthea-generated JSON files as input. It extracts the relevant medical data for each patient and transforms it into embeddings, a numerical format that machine learning models can read. The embeddings are exported as CSV and .npy files, which serve as the input to Model Testing & Training.
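The notebook's exact embedding method is not described here, so the sketch below substitutes a simpler, plainly different technique: multi-hot encoding, where each patient becomes a row of 0/1 flags, one per distinct condition code in the dataset. It assumes patients are given as a mapping from patient ID to a list of codes; all names are hypothetical.

```python
from __future__ import annotations

import csv


def build_vocabulary(patients: dict[str, list[str]]) -> list[str]:
    """Sorted list of every distinct code seen across all patients."""
    return sorted({code for codes in patients.values() for code in codes})


def multi_hot(codes: list[str], vocab: list[str]) -> list[int]:
    """1 if the patient has the code anywhere in their history, else 0."""
    present = set(codes)
    return [int(code in present) for code in vocab]


def export_csv(patients: dict[str, list[str]], out_path: str) -> list[str]:
    """Write one row per patient (ID followed by features); return the vocabulary."""
    vocab = build_vocabulary(patients)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["patient_id", *vocab])
        for pid, codes in sorted(patients.items()):
            writer.writerow([pid, *multi_hot(codes, vocab)])
    return vocab
```

If NumPy is installed, the same rows can be stacked into an array and written in .npy form with `numpy.save`. A learned embedding (for example word2vec vectors per code) would replace the 0/1 columns with dense values but keep the one-row-per-patient layout.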

Overview

Model Testing & Training

Files

  • training.py

    • python3 training.py <folder with csvs> <label>
  • w2vtrain.py

    • python3 w2vtrain.py <folder containing npy files>
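The training.py usage line takes a folder of CSVs and the name of a label. A minimal sketch of how such a command-line interface and CSV loader could look is below, assuming every CSV shares one header with a numeric 0/1 label column and an optional `patient_id` column. These names and the column layout are assumptions, not the script's actual code.

```python
from __future__ import annotations

import argparse
import csv
from pathlib import Path


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse `<folder with csvs> <label>` positional arguments."""
    parser = argparse.ArgumentParser(description="Train classifiers on pipeline CSV output.")
    parser.add_argument("folder", help="folder containing the pipeline's CSV files")
    parser.add_argument("label", help="name of the CSV column to predict")
    return parser.parse_args(argv)


def load_rows(folder: str, label: str, id_column: str = "patient_id"):
    """Read every CSV in `folder`, splitting each row into features and a label."""
    features: list[list[float]] = []
    labels: list[int] = []
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                labels.append(int(row.pop(label)))
                row.pop(id_column, None)  # identifiers are not features
                features.append([float(value) for value in row.values()])
    return features, labels
```

Calling `load_rows(args.folder, args.label)` after `parse_args()` would give a feature matrix and label list ready for a classifier.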

Brief Summary

The CSV and .npy output folders from the pipeline are the inputs to training.py and w2vtrain.py, respectively. Both scripts train several machine learning models and report each model's prediction accuracy.
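Prediction accuracy is the fraction of held-out patients whose label a model predicts correctly. The sketch below illustrates that evaluation loop in pure Python with a deliberately simple nearest-centroid classifier standing in for whichever models the scripts actually use. The split fraction, seed, and function names are assumptions; libraries such as scikit-learn provide equivalents (`train_test_split`, `accuracy_score`) along with many more classifiers.

```python
from __future__ import annotations

import random


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_centroids(X, y):
    """Mean feature vector for each class label."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Label of the closest centroid by squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A typical run splits the data, fits centroids on the training part, predicts each test row, and passes the true and predicted test labels to `accuracy`. Repeating this for several models gives one accuracy per model.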

Overview

About

Spring 2020 Anthem-sponsored UCSC Engineering student project


Contributors 7