Commit 068698d

Merge branch 'master' of https://github.com/sunlabuiuc/PyHealth into pyhealth2bounty/adacare
2 parents 710ebae + 329f70d commit 068698d

39 files changed

+6804
-557
lines changed

docs/api/calib.rst

Lines changed: 17 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -23,6 +23,23 @@ confidence levels:
2323
- :class:`~pyhealth.calib.predictionset.FavMac`: Value-maximizing sets with cost control
2424
- :class:`~pyhealth.calib.predictionset.CovariateLabel`: Covariate shift adaptive conformal
2525

26+
Getting Started
27+
---------------
28+
29+
New to calibration and uncertainty quantification? Check out this complete example:
30+
31+
**Browse all examples online**: https://github.com/sunlabuiuc/PyHealth/tree/master/examples
32+
33+
- **Example**: ``examples/covid19cxr_conformal.py`` - Comprehensive conformal prediction workflow demonstrating:
34+
35+
- Training a ResNet-18 model on COVID-19 chest X-ray classification
36+
- Applying conventional conformal prediction with **LABEL**
37+
- Using covariate shift adaptive conformal prediction with **CovariateLabel**
38+
- Comparing coverage guarantees and efficiency between methods
39+
- Understanding when to use each method based on distribution shift
40+
41+
This example shows the complete pipeline from model training to uncertainty-aware predictions with formal coverage guarantees.
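
As a rough illustration of the idea behind threshold-based prediction sets, the following is a minimal split-conformal sketch in plain Python. It is not the PyHealth ``LABEL`` or ``CovariateLabel`` implementation: the helper names (``calibrate_threshold``, ``prediction_set``) and the toy probabilities are assumptions made purely for illustration.

```python
import math

def calibrate_threshold(cal_probs, cal_labels, alpha):
    """Pick a nonconformity threshold from held-out calibration data.

    The score of each calibration example is 1 - p(true class). The
    threshold is the ceil((n + 1) * (1 - alpha))-th smallest score,
    which is the usual split-conformal quantile.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: include every class.
        return float("inf")
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """Return every class whose score 1 - p(class) is within the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical calibration data: predicted class probabilities and true labels.
cal_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8],
    [0.55, 0.45], [0.1, 0.9], [0.7, 0.3], [0.4, 0.6],
]
cal_labels = [0, 0, 1, 0, 1, 1, 1, 0, 0]

threshold = calibrate_threshold(cal_probs, cal_labels, alpha=0.25)
confident = prediction_set([0.95, 0.05], threshold)  # a single class
ambiguous = prediction_set([0.5, 0.5], threshold)    # both classes
```

A confident prediction yields a small set and an ambiguous one yields a larger set. That is the efficiency side of the coverage/efficiency trade-off the example script compares.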
42+
2643
Quick Links
2744
-----------
2845

docs/api/data.rst

Lines changed: 17 additions & 0 deletions
@@ -3,6 +3,23 @@ Data
33

44
**pyhealth.data** defines the atomic data structures of this package.
55

6+
Getting Started
7+
---------------
8+
9+
New to PyHealth's data structures? Start here:
10+
11+
- **Tutorial**: `Introduction to pyhealth.data <https://colab.research.google.com/drive/1y9PawgSbyMbSSMw1dpfwtooH7qzOEYdN?usp=sharing>`_ | `Video <https://www.youtube.com/watch?v=Nk1itBoLOX8&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=2>`_
12+
13+
This tutorial introduces the core data structures in PyHealth:
14+
15+
- **Event**: Represents individual clinical events (diagnoses, procedures, medications, lab results, etc.)
16+
- **Patient**: Contains all events and visits for a single patient, forming the foundation of healthcare data organization
17+
18+
Understanding these structures is essential for working with PyHealth, as they provide the standardized format for representing electronic health records.
19+
20+
API Reference
21+
-------------
22+
623
.. toctree::
724
:maxdepth: 3
825

docs/api/datasets.rst

Lines changed: 25 additions & 1 deletion
@@ -1,6 +1,30 @@
11
Datasets
22
===============
33

4+
Getting Started
5+
---------------
6+
7+
New to PyHealth datasets? Start here:
8+
9+
- **Tutorial**: `Introduction to pyhealth.datasets <https://colab.research.google.com/drive/1voSx7wEfzXfEf2sIfW6b-8p1KqMyuWxK?usp=sharing>`_ | `Video (PyHealth 1.6) <https://www.youtube.com/watch?v=c1InKqFJbsI&list=PLR3CNIF8DDHJUl8RLhyOVpX_kT4bxulEV&index=3>`_
10+
11+
This tutorial covers:
12+
13+
- How to load and work with different healthcare datasets (MIMIC-III, MIMIC-IV, eICU, etc.)
14+
- Understanding the ``BaseDataset`` structure and patient representation
15+
- Parsing raw EHR data into standardized PyHealth format
16+
- Accessing patient records, visits, and clinical events
17+
- Dataset splitting for train/validation/test sets
18+
19+
**Data Access**: If you're new and need help accessing MIMIC datasets, check the :doc:`../how_to_contribute` guide's "Data Access for Testing" section for information on:
20+
21+
- Getting MIMIC credentialing through PhysioNet
22+
- Using openly available demo datasets (MIMIC-III Demo, MIMIC-IV Demo)
23+
- Working with synthetic data for testing
24+
25+
Available Datasets
26+
------------------
27+
428
.. toctree::
529
:maxdepth: 3
630

@@ -23,4 +47,4 @@ Datasets
2347
datasets/pyhealth.datasets.TUEVDataset
2448
datasets/pyhealth.datasets.splitter
2549
datasets/pyhealth.datasets.utils
26-
50+
datasets/pyhealth.datasets.ChestXray14Dataset
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
pyhealth.datasets.ChestXray14Dataset
2+
====================================
3+
4+
The NIH ChestX-ray14 dataset. For more information, see the `NIH Box folder <https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345>`_. Note that the copy of this dataset on `Kaggle <https://www.kaggle.com/datasets/nih-chest-xrays/data>`_ is stale, because corrections have since been made to the metadata (see `this file <https://nihcc.app.box.com/v/ChestXray-NIHCC/file/249505703122>`_).
5+
6+
.. autoclass:: pyhealth.datasets.ChestXray14Dataset
7+
:members:
8+
:undoc-members:
9+
:show-inheritance:

docs/api/interpret.rst

Lines changed: 36 additions & 2 deletions
@@ -1,12 +1,46 @@
11
Interpretability
22
===============
33

4-
We implement the following interpretability techniques below:
4+
We implement the following interpretability techniques to help you understand model predictions and identify important features in healthcare data.
55

6-
Help always needed in this direction.
6+
7+
Getting Started
8+
---------------
9+
10+
New to interpretability in PyHealth? Check out these complete examples:
11+
12+
**Browse all examples online**: https://github.com/sunlabuiuc/PyHealth/tree/master/examples
13+
14+
**DeepLift Example:**
15+
16+
- ``examples/deeplift_stagenet_mimic4.py`` - Demonstrates DeepLift attributions on StageNet for mortality prediction with MIMIC-IV data. Shows how to:
17+
18+
- Compute feature attributions for discrete (ICD codes) and continuous (lab values) features
19+
- Decode attributions back to human-readable medical codes and descriptions
20+
- Visualize top positive and negative attributions
21+
22+
**Integrated Gradients Examples:**
23+
24+
- ``examples/integrated_gradients_mortality_mimic4_stagenet.py`` - Complete workflow showing:
25+
26+
- How to load pre-trained models and compute attributions
27+
- Comparing attributions for different target classes (mortality vs. survival)
28+
- Interpreting results with medical context (lab categories, diagnosis codes)
29+
30+
- ``examples/interpretability_metrics.py`` - Demonstrates evaluation of attribution methods using:
31+
32+
- **Comprehensiveness**: Measures how much prediction drops when removing important features
33+
- **Sufficiency**: Measures how much prediction is retained when keeping only important features
34+
- Both functional API (``evaluate_attribution``) and class-based API (``Evaluator``)
35+
36+
These examples provide end-to-end workflows from loading data to interpreting and evaluating attributions.
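
To make the two metrics above concrete, here is a small self-contained sketch on a toy logistic model. It does not use PyHealth's ``evaluate_attribution`` or ``Evaluator``. The function names, the zero baseline, and the top-k selection rule are all assumptions chosen for illustration, and sufficiency is written as the prediction drop when only the top-k features are kept (smaller means more is retained).

```python
import math

def predict(x, weights, bias=0.0):
    """Toy logistic model standing in for a trained network."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def top_k_indices(attributions, k):
    """Indices of the k features with the largest absolute attribution."""
    order = sorted(range(len(attributions)),
                   key=lambda i: abs(attributions[i]), reverse=True)
    return set(order[:k])

def comprehensiveness(x, weights, attributions, k, baseline=0.0):
    """Prediction drop after replacing the top-k features with the baseline."""
    top = top_k_indices(attributions, k)
    ablated = [baseline if i in top else v for i, v in enumerate(x)]
    return predict(x, weights) - predict(ablated, weights)

def sufficiency(x, weights, attributions, k, baseline=0.0):
    """Prediction drop when only the top-k features are kept."""
    top = top_k_indices(attributions, k)
    kept = [v if i in top else baseline for i, v in enumerate(x)]
    return predict(x, weights) - predict(kept, weights)

# Hypothetical model and input; weight * input is used as the attribution.
weights = [2.0, 0.1, 1.0]
x = [1.0, 1.0, 1.0]
attributions = [w * v for w, v in zip(weights, x)]

comp = comprehensiveness(x, weights, attributions, k=1)
suff = sufficiency(x, weights, attributions, k=1)
```

For a faithful attribution, removing the top features should hurt the prediction noticeably (large ``comp``), while keeping only those features should preserve most of it (small ``suff``).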
37+
38+
Available Methods
39+
-----------------
740

841
.. toctree::
942
:maxdepth: 3
1043

1144
interpret/pyhealth.interpret.methods.chefer
45+
interpret/pyhealth.interpret.methods.deeplift
1246
interpret/pyhealth.interpret.methods.integrated_gradients

docs/api/tasks.rst

Lines changed: 62 additions & 4 deletions
@@ -1,7 +1,7 @@
11
Tasks
22
===============
33

4-
We support various real-world healthcare predictive tasks defined by **function calls**. The following example tasks are collected from top AI/Medical venues, such as:
4+
We support various real-world healthcare predictive tasks defined by **function calls**. The following example tasks are collected from top AI/Medical venues, such as:
55

66
(i) Drug Recommendation [Yang et al. IJCAI 2021a, Yang et al. IJCAI 2021b, Shang et al. AAAI 2020]
77

@@ -11,11 +11,68 @@ We support various real-world healthcare predictive tasks defined by **function
1111

1212
(iv) Length of Stay Prediction
1313

14-
(v) Sleep Staging [Yang et al. ArXiv 2021]
14+
(v) Sleep Staging [Yang et al. ArXiv 2021]
15+
16+
Getting Started
17+
---------------
18+
19+
New to PyHealth tasks? Start here:
20+
21+
- **Tutorial**: `Introduction to pyhealth.tasks <https://colab.research.google.com/drive/1kKkkBVS_GclHoYTbnOtjyYnSee79hsyT?usp=sharing>`_ - Learn the basics of defining and using tasks
22+
- **Code Examples**: Browse all examples online at https://github.com/sunlabuiuc/PyHealth/tree/master/examples
23+
- **Pipeline Examples**: Check out our :doc:`../tutorials` page for complete end-to-end examples including:
24+
25+
- Mortality Prediction Pipeline
26+
- Readmission Prediction Pipeline
27+
- Medical Coding Pipeline
28+
- Chest X-ray Classification Pipeline
29+
30+
These tutorials demonstrate how to load datasets, apply tasks, train models, and evaluate results.
31+
32+
Understanding Tasks and Processors
33+
-----------------------------------
34+
35+
Tasks define **what** data to extract (via ``input_schema`` and ``output_schema``), while **processors** define **how** to transform that data into tensors for model training.
36+
37+
After you define a task:
38+
39+
1. **Task execution**: The task function extracts relevant features from patient records and generates samples
40+
2. **Processor application**: Processors automatically transform these samples into model-ready tensors based on the schemas
41+
42+
**Example workflow:**
43+
44+
.. code-block:: python
45+
46+
# 1. Define a task with input/output schemas
47+
task = MortalityPredictionMIMIC4()
48+
# input_schema = {"conditions": "sequence", "procedures": "sequence"}
49+
# output_schema = {"mortality": "binary"}
50+
51+
# 2. Apply task to dataset
52+
sample_dataset = base_dataset.set_task(task)
53+
54+
# 3. Processors automatically transform samples:
55+
# - "sequence" -> SequenceProcessor (converts codes to indices)
56+
# - "binary" -> BinaryLabelProcessor (converts labels to tensors)
57+
58+
# 4. Get model-ready tensors
59+
sample = sample_dataset[0]
60+
# sample["conditions"] is now a tensor of token indices
61+
# sample["mortality"] is now a binary tensor [0] or [1]
62+
63+
**Learn more about processors:**
64+
65+
- See the :doc:`processors` documentation for details on all available processors
66+
- Learn about string keys (``"sequence"``, ``"binary"``, etc.) that map to specific processors
67+
- Discover how to customize processor behavior with kwargs tuples
68+
- Understand processor types for different data modalities (text, images, signals, etc.)
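
The mapping from schema string keys to processors can be pictured with a toy stand-in. The sketch below is not PyHealth's ``SequenceProcessor`` or ``BinaryLabelProcessor``. The class names, the ``PROCESSORS`` registry, and ``apply_schema`` are hypothetical, and plain Python lists stand in for tensors.

```python
class ToySequenceProcessor:
    """Maps code strings to integer indices; index 0 is reserved for padding."""

    def __init__(self):
        self.vocab = {"<pad>": 0}

    def fit(self, samples, field):
        # Build the vocabulary from every code seen in this field.
        for sample in samples:
            for code in sample[field]:
                self.vocab.setdefault(code, len(self.vocab))

    def process(self, codes):
        return [self.vocab[code] for code in codes]


class ToyBinaryLabelProcessor:
    """Wraps a 0/1 label in a one-element list of floats."""

    def fit(self, samples, field):
        pass  # nothing to learn for binary labels

    def process(self, label):
        return [float(label)]


# Hypothetical registry: schema string key -> processor class.
PROCESSORS = {"sequence": ToySequenceProcessor, "binary": ToyBinaryLabelProcessor}


def apply_schema(samples, schema):
    """Fit one processor per field, then transform every sample."""
    fitted = {}
    for field, key in schema.items():
        processor = PROCESSORS[key]()
        processor.fit(samples, field)
        fitted[field] = processor
    return [
        {field: fitted[field].process(sample[field]) for field in schema}
        for sample in samples
    ]


samples = [
    {"conditions": ["I10", "E11"], "mortality": 0},
    {"conditions": ["E11", "N18"], "mortality": 1},
]
schema = {"conditions": "sequence", "mortality": "binary"}
processed = apply_schema(samples, schema)
```

The task only declares the schema. The lookup from ``"sequence"`` or ``"binary"`` to a concrete processor happens separately, which is why the same task definition can be reused with different processor settings.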
69+
70+
Available Tasks
71+
---------------
1572

1673
.. toctree::
1774
:maxdepth: 3
18-
75+
1976
Base Task <tasks/pyhealth.tasks.BaseTask>
2077
Readmission (30 Days, MIMIC-IV) <tasks/pyhealth.tasks.Readmission30DaysMIMIC4>
2178
In-Hospital Mortality (MIMIC-IV) <tasks/pyhealth.tasks.InHospitalMortalityMIMIC4>
@@ -36,4 +93,5 @@ We support various real-world healthcare predictive tasks defined by **function
3693
Temple University EEG Tasks <tasks/pyhealth.tasks.temple_university_EEG_tasks>
3794
Sleep Staging v2 <tasks/pyhealth.tasks.sleep_staging_v2>
3895
Benchmark EHRShot <tasks/pyhealth.tasks.benchmark_ehrshot>
39-
96+
ChestX-ray14 Binary Classification <tasks/pyhealth.tasks.ChestXray14BinaryClassification>
97+
ChestX-ray14 Multilabel Classification <tasks/pyhealth.tasks.ChestXray14MultilabelClassification>
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
pyhealth.tasks.ChestXray14BinaryClassification
2+
==============================================
3+
4+
.. autoclass:: pyhealth.tasks.ChestXray14BinaryClassification
5+
:members:
6+
:undoc-members:
7+
:show-inheritance:
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
pyhealth.tasks.ChestXray14MultilabelClassification
2+
==================================================
3+
4+
.. autoclass:: pyhealth.tasks.ChestXray14MultilabelClassification
5+
:members:
6+
:undoc-members:
7+
:show-inheritance:

0 commit comments

Comments
 (0)