Defrag: Inferring Treatment Pathways from Patient Records

This repository contains the source code for the paper "Inferring Treatment Pathways from Patient Records".

Abstract:

Treatment pathways are step-by-step plans outlining the recommended medical care for specific diseases; they are revised when different treatments are found to improve patient outcomes. Examining health records is an important part of this revision process, but inferring patients’ actual treatments from health data is challenging due to the complex event coding schemes and absence of pathway-related annotations. We introduce Defrag, a method for examining health records to infer the real-world treatment steps for a particular patient group. Defrag learns the semantic and temporal meaning of healthcare event sequences, allowing it to reliably infer treatment steps from complex healthcare data. To our knowledge, Defrag is the first pathway-inference method to utilise a neural network (NN), an approach made possible by a novel self-supervised learning objective. We also introduce a testing and validation framework for pathway inference to characterise and evaluate Defrag's pathway inference ability, and to establish benchmarks. We demonstrate Defrag's effectiveness by identifying best-practice pathway fragments for three cancer types in public healthcare records, and by inferring pathways in synthetic experiments, where it significantly outperforms non-NN-based methods. Open-source code for Defrag and the testing framework is provided to encourage further research in this area.

Resources

What is in this repository:

- The source code for Defrag:
  - The Transformer and associated training code are located in case/.
  - The pathway inference code is located in defrag.py.
- The source code for the testing and validation framework, located in catsyn/.
- Programmatic experiment configurations, located in experiment_configs.py.
- Source code to run experiments, located in conductor.py.

What is not in this repository:

- The MIMIC-IV data, since the license prohibits redistribution.
- Most experiment results, including trained models, loss data, and inferred pathways, since they can be reproduced (as described below).

Setup

There are two main ways to run the code:

- By building a Docker image from the provided Dockerfile, or
- By creating a conda environment with the provided env.yml file.

All package versions should be pinned to the bugfix (patch) version, so if you run the code via one of these two options you shouldn't have any problems with package versioning.

That said, the code has only been tested on Ubuntu 22.04.2 LTS, so behaviour on other systems may vary.

Data

All code is provided for running Defrag on synthetic data. However, if you intend to run the code on MIMIC-IV, you need to:

- Acquire access to and download the MIMIC-IV v1.0 dataset.
- Download the ICD-10-CM to ICD-9-CM and ICD-10-PCS to ICD-9-PCS .csv tables. These tables, from the Centers for Medicare & Medicaid Services, map between ICD-10 and ICD-9 codes.
- Download the Multi-Level CCS zip archive and extract the ccs_multi_dx_tool_2015.csv and ccs_multi_pr_tool_2015.csv tables.
- Run the build_mimic_feature_set.py Python script to generate the mimic_feature_set.parquet file. Note: you will need to modify the script to point to the locations of the resources above.

Once you've completed the above steps, you will be able to run experiments on the MIMIC-IV dataset.
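As a rough illustration of what an ICD-10 to ICD-9 crosswalk table is used for, the sketch below loads a small in-memory .csv and looks up a code. It is not taken from build_mimic_feature_set.py: the column names `icd10cm` and `icd9cm` and the two sample rows are assumptions made for the example, so check them against the tables you actually download.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a crosswalk .csv. The column names and rows are
# illustrative assumptions, not the layout of the real CMS tables.
SAMPLE_CSV = """icd10cm,icd9cm
C50911,1749
C3490,1629
"""


def load_crosswalk(handle, source_col, target_col):
    """Build a mapping from each source code to a list of target codes."""
    mapping = defaultdict(list)
    for row in csv.DictReader(handle):
        # A source code may appear on several rows (one-to-many mapping).
        mapping[row[source_col]].append(row[target_col])
    return dict(mapping)


crosswalk = load_crosswalk(io.StringIO(SAMPLE_CSV), "icd10cm", "icd9cm")
print(crosswalk.get("C50911", []))
```

Keeping a list per source code means that a code with several equivalents does not silently lose any of them; how such cases should be resolved is a modelling decision the sketch leaves open.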

Running Experiments

To run an experiment, first define its configuration in experiment_configs.py.

Once defined, the experiment is referenced by the string used in the name of the function that defines it, and is run as follows:

python3 conductor.py <experiment_name>
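One plausible way to resolve an experiment name to its configuration is a lookup by function name. The sketch below is a minimal illustration, not the code in conductor.py: it assumes the function name equals the experiment name, and `toy_experiment`, `resolve_experiment`, and the config keys are all hypothetical.

```python
from types import SimpleNamespace


def toy_experiment():
    # Hypothetical configuration; real configs in experiment_configs.py
    # will have different contents.
    return {"dataset": "synthetic", "seed": 0}


# Stand-in for the experiment_configs module so the sketch runs on its own.
configs = SimpleNamespace(toy_experiment=toy_experiment)


def resolve_experiment(configs_module, experiment_name):
    """Look up a zero-argument config function by name and call it."""
    factory = getattr(configs_module, experiment_name, None)
    if not callable(factory):
        raise KeyError(f"Unknown experiment: {experiment_name!r}")
    return factory()


config = resolve_experiment(configs, "toy_experiment")
print(config)
```

With this pattern, adding a new experiment only requires adding a new function; a misspelled name fails early with a clear error.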

Conductor orchestrates a single Defrag experiment on a dataset: generating synthetic data or filtering MIMIC, training the Transformer, clustering encodings, and inferring the pathway. Experiments should be deterministic with respect to their configuration and, for the most part, idempotent.
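To make the terms concrete, here is a minimal sketch of one common way to get runs that are deterministic with respect to their configuration and idempotent: derive the output directory from a hash of the configuration, and skip any stage whose output already exists. This is an assumption about the general technique, not a description of how conductor.py is implemented; `config_digest`, `run_stage`, and the config contents are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_digest(config):
    """Return a short, stable hash of a JSON-serialisable config."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_stage(out_dir, stage_name, compute):
    """Run compute() only if this stage has no saved output yet.

    Returns (result, was_computed).
    """
    out_file = out_dir / f"{stage_name}.json"
    if out_file.exists():
        return json.loads(out_file.read_text()), False  # reuse saved output
    result = compute()
    out_file.write_text(json.dumps(result))
    return result, True


config = {"dataset": "synthetic", "seed": 0}  # hypothetical config
out_dir = Path(tempfile.mkdtemp()) / config_digest(config)
out_dir.mkdir(parents=True, exist_ok=True)

first = run_stage(out_dir, "train", lambda: {"loss": 0.5})
second = run_stage(out_dir, "train", lambda: {"loss": 0.5})
print(first, second)
```

Running the same configuration twice then reuses the earlier outputs instead of recomputing them, while any change to the configuration produces a new directory.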

Reproducing results

The MIMIC-IV results presented in the paper are specified with the following configurations:

mimic_experiment_breast_soft_hierarchical_5
mimic_experiment_lung_soft_hierarchical_5
mimic_experiment_melanoma_soft_hierarchical_5

For the synthetic data experiments, first run the following experiments:

synthetic_data_experiment_big
synthetic_data_experiment_big_1000_bins

These configurations run Defrag on the synthetic data. Once they have finished, score them by running postprocess_experiments.py, then run the baselines with the script run_lda.py.
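The order of the synthetic-data steps can be sketched as a small driver script. It assumes that postprocess_experiments.py and run_lda.py are invoked without command-line arguments, which the text above does not state; `reproduction_commands` and `run_all` are hypothetical helpers, and the default is a dry run that only prints the commands.

```python
import subprocess
import sys

SYNTHETIC_EXPERIMENTS = [
    "synthetic_data_experiment_big",
    "synthetic_data_experiment_big_1000_bins",
]


def reproduction_commands(python=sys.executable):
    """List the commands in the order: experiments, scoring, baselines."""
    commands = [[python, "conductor.py", name] for name in SYNTHETIC_EXPERIMENTS]
    # Assumption: both scripts run without command-line arguments.
    commands.append([python, "postprocess_experiments.py"])
    commands.append([python, "run_lda.py"])
    return commands


def run_all(dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for command in reproduction_commands():
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing `dry_run=False` would execute the steps from the repository root; `check=True` makes a failed step abort the sequence rather than scoring incomplete results.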

Finally, the figures from the paper can be reproduced with the code cells in the paper_material.ipynb notebook.

For questions, please raise an issue or contact the authors.

If you use this work, please cite it using the following citation:

Citation to be added after the manuscript has been published.

