Machine Learning for Cardiology and Critical Care (ml4c3) is a pipeline for
working with complex physiological data. It does several key things:
- Ingest raw data into a pre-processed format.
- Tensorize ingested data into standard hd5 or CSV formats compatible with the pipeline.
- Map desired data from hd5 or CSV files to code via the TensorMap abstraction.
- Visualize ECGs, ICU waveforms, labs, and modeling results.
- Explore summary statistics and trends.
- Train supervised and reinforcement learning models that are constructed from
powerful, simple, and expressive command line arguments and TensorMaps.
- Cluster data to reveal patterns via unsupervised learning.
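As a rough illustration of the TensorMap idea mentioned in the list above, the sketch below pairs a tensor name with a function that knows how to pull that tensor out of a stored record. Everything in it is an assumption made for illustration: the class name `TensorMapSketch`, its fields, the `ecg/lead_I` key, and the use of a plain dict in place of an opened hd5 file are hypothetical and are not the ml4c3 API. See the ml4c3 wiki for the real interface.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Any, Callable, List, Mapping


@dataclass(frozen=True)
class TensorMapSketch:
    """Hypothetical stand-in for a TensorMap: a name plus a loading rule."""

    name: str
    length: int  # expected number of samples in the loaded tensor
    loader: Callable[[Mapping[str, Any]], List[float]]

    def tensor_from_record(self, record: Mapping[str, Any]) -> List[float]:
        """Load the tensor from one record and check it has the expected length."""
        values = self.loader(record)
        if len(values) != self.length:
            raise ValueError(
                f"{self.name}: expected {self.length} values, got {len(values)}"
            )
        return values


def load_standardized_lead(record: Mapping[str, Any]) -> List[float]:
    """Read a waveform under a hypothetical key and standardize it."""
    signal = [float(v) for v in record["ecg/lead_I"]]
    mean = fmean(signal)
    std = pstdev(signal) or 1.0  # fall back to 1.0 for a flat signal
    return [(v - mean) / std for v in signal]


# A plain dict stands in for an opened hd5 file (assumption for this sketch).
record = {"ecg/lead_I": [0.0, 1.0, 2.0, 3.0]}

ecg_lead_i = TensorMapSketch(
    name="ecg_lead_i", length=4, loader=load_standardized_lead
)
tensor = ecg_lead_i.tensor_from_record(record)
```

The point of the design is that downstream code asks for a tensor by name and never needs to know where in the file it lives or how it was normalized.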
- Add your user to the docker group so you can run docker:
    sudo usermod -aG docker $USER
- Build the GPU docker image:
    ./docker/build.sh
  and also the CPU-only docker image:
    ./docker/build.sh -c
- Set up the conda environment for pre-commit:
    make setup
  On macOS, you can install gmake and call:
    gmake setup
- Activate the conda environment so that pre-commit hooks run:
    conda activate ml4c3
When a researcher joins an academic research group, they get access to a workstation or
server and some data. Sometimes a more senior researcher shares some code via email or
Dropbox -- usually a long main.py script with now-outdated dependencies. Nothing is
documented. A manuscript published using the code does not link to an open-source GitHub
repo, nor does it sufficiently document the steps and settings needed to reproduce the
results. The paper merely includes the familiar phrase "code is available upon request".
The researcher spends their first week trying to set up their compute environment so they can simply run their predecessor's pipeline. The packages need updating, and a library was compiled in a special way, but nobody knows exactly how. The next two weeks are spent understanding the code. Eventually, the researcher refactors the entire pipeline. A month later, they start training models.
This story is too common in academia. We think there is a better way, so we built ml4c3 to:
- address limitations of Jupyter notebooks, one-off scripts, and glue code.
- enhance collaborative workflow between researchers, following best practices.
- increase efficiency via excellent documentation and modular code.
Read the documentation at the ml4c3 wiki.
Join the discussions.
Yes, we would love your help! See CONTRIBUTING
to learn how to open an issue, submit a PR, and more.
ml4c3 was built by the Aguirre Lab at the
Center for Systems Biology, Wellman Center for Photomedicine, and Cardiovascular Research
Center at the Massachusetts General Hospital.
We are a team of students, postdoctoral researchers, and engineers who work on overlapping research projects at the intersection of machine learning, computer science, cardiology, and critical care.
Since we share research needs, computational tooling, and data, the benefits of closely collaborating around a shared infrastructure were obvious.