Commit 329f70d

ChestX-ray14 Dataset and Classification Tasks (#392)
* Initial version of chestxray14.py
* Fix typo and use relative pyhealth import path
* Add back full import path
* Add "Dataset" to class name
* Initial version of test_chestxray14.py
* Add --no-download flag to chestxray14.py
* Change "name" to "dataset_name"
* Init base class
* Better align to BaseDataset and hide internals
* Align ChestXray14Dataset to BaseDataset
* Add way to set config path
* Init BaseDataset last
* Set path default to working directory
* Fix image path bug
* Debug print
* DataFrame drop bug fix
* Fix DataFrame column name bug
* Make classes a list
* Fix no findings sum
* Fix path prepend
* Set config_path for unit tests
* Remove test_local_dataset
* Fix download path bug
* Fix unit test config path
* Remove path related testing (can't get working with config_path)
* Match unit test path
* Add ChestXray14BinaryClassification task
* Fix test_len unit test
* Add file headers
* Add dataset and task to __init__.py and use relative paths for pyhealth import statements
* Fix circular import and add back task logger
* Add list of classes to task
* Add logging handler
* Add max and percentages to stat
* Override base dataset stats method
* Show stats in floating point not scientific notation
* Remove duplicate classes list
* Multilabel classification task
* Update multilabel task to return a list of diseases present
* Update multilabel task to use class indices
* Fix tensor comparison
* Debug print statement
* Another debug print statement
* Fix label check
* Use disease names for labels
* Fix multilabel test
* Add path to samples
* Fix multilabel test and remove debug print
* Fix another test bug
* Small PR fixes recommended by Copilot
* Remove CSV file download from unit tests
* Add test for invalid binary classification task disease
* Add missing column to unit test input data
* Update unit test data setup and teardown
* Fix multilabel task docstring (labels description)
* Loosen multilabel task unit test to remove need for extra path data in samples
* Protect against zip-slip when extracting image TAR files
* Remove unused imports from unit tests
* Make error message format consistent
* Strip out ChestXray14Dataset methods meant only for testing (use get_patient and get_events instead)
* Add patient ID hierarchy and additional dataset attributes
* Remove unused imports
* Shrink "partial" dataset download size
* Override the BaseDataset set_task method and replace relative import paths
* Add ChestX-ray14 classification task examples
* Add ChestX-ray14 API docs
* Correct comment in ChestX-ray14 multilabel example
* Update datasets.rst and tasks.rst
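
The commit message mentions protecting against zip-slip when extracting the image TAR files. The commit's own implementation is not shown on this page, so the following is only a minimal sketch of the general technique using the Python standard library. The function name ``safe_extract`` and its parameters are hypothetical and are not part of PyHealth; rejecting link members is a conservative choice made for this sketch.

```python
import os
import tarfile


def safe_extract(tar_path: str, dest_dir: str) -> list[str]:
    """Extract tar_path into dest_dir, refusing members that would escape it.

    Hypothetical helper for illustration; not the PyHealth implementation.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(tar_path) as tar:
        members = tar.getmembers()
        for member in members:
            # Resolve where this member would land on disk.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # A member such as "../evil.txt" resolves outside dest_root.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"Unsafe path in archive: {member.name}")
            # Links can also point outside the destination; reject them here.
            if member.issym() or member.islnk():
                raise ValueError(f"Link member not allowed: {member.name}")
        # Use the stdlib "data" extraction filter where this Python has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_root, members=members, **kwargs)
    return [m.name for m in members]
```

All members are validated before anything is written, so a single unsafe entry leaves the destination directory untouched.
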
1 parent 5f5c991 commit 329f70d

14 files changed: +3351 -4 lines changed

docs/api/datasets.rst

Lines changed: 1 addition & 1 deletion
@@ -47,4 +47,4 @@ Available Datasets
     datasets/pyhealth.datasets.TUEVDataset
     datasets/pyhealth.datasets.splitter
     datasets/pyhealth.datasets.utils
-
+    datasets/pyhealth.datasets.ChestXray14Dataset
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+pyhealth.datasets.ChestXray14Dataset
+====================================
+
+The NIH ChestX-ray14 dataset. For more information, see the `NIH Box folder <https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345>`_. Note that the copy of this dataset on `Kaggle <https://www.kaggle.com/datasets/nih-chest-xrays/data>`_ is stale, as corrections have since been made to the metadata (see `this NIH Box file <https://nihcc.app.box.com/v/ChestXray-NIHCC/file/249505703122>`_).
+
+.. autoclass:: pyhealth.datasets.ChestXray14Dataset
+    :members:
+    :undoc-members:
+    :show-inheritance:

docs/api/tasks.rst

Lines changed: 4 additions & 3 deletions
@@ -1,7 +1,7 @@
 Tasks
 ===============
 
-We support various real-world healthcare predictive tasks defined by **function calls**. The following example tasks are collected from top AI/Medical venues, such as:
+We support various real-world healthcare predictive tasks defined by **function calls**. The following example tasks are collected from top AI/Medical venues, such as:
 
 (i) Drug Recommendation [Yang et al. IJCAI 2021a, Yang et al. IJCAI 2021b, Shang et al. AAAI 2020]
 
@@ -72,7 +72,7 @@ Available Tasks
 
 .. toctree::
     :maxdepth: 3
-
+
     Base Task <tasks/pyhealth.tasks.BaseTask>
     Readmission (30 Days, MIMIC-IV) <tasks/pyhealth.tasks.Readmission30DaysMIMIC4>
     In-Hospital Mortality (MIMIC-IV) <tasks/pyhealth.tasks.InHospitalMortalityMIMIC4>
@@ -93,4 +93,5 @@ Available Tasks
     Temple University EEG Tasks <tasks/pyhealth.tasks.temple_university_EEG_tasks>
     Sleep Staging v2 <tasks/pyhealth.tasks.sleep_staging_v2>
     Benchmark EHRShot <tasks/pyhealth.tasks.benchmark_ehrshot>
-
+    ChestX-ray14 Binary Classification <tasks/pyhealth.tasks.ChestXray14BinaryClassification>
+    ChestX-ray14 Multilabel Classification <tasks/pyhealth.tasks.ChestXray14MultilabelClassification>
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pyhealth.tasks.ChestXray14BinaryClassification
+==============================================
+
+.. autoclass:: pyhealth.tasks.ChestXray14BinaryClassification
+    :members:
+    :undoc-members:
+    :show-inheritance:
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+pyhealth.tasks.ChestXray14MultilabelClassification
+==================================================
+
+.. autoclass:: pyhealth.tasks.ChestXray14MultilabelClassification
+    :members:
+    :undoc-members:
+    :show-inheritance:
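
The task classes documented above are not shown in this diff, so their exact label handling is not visible here (the commit message only says the multilabel task uses disease names for labels). As a rough illustration of the underlying idea, the sketch below turns a pipe-separated findings string, the form used in the NIH metadata (for example ``Cardiomegaly|Effusion`` or ``No Finding``), into a multi-hot vector over the 14 classes. ``CLASSES`` and ``encode_findings`` are hypothetical names chosen for this example and are not the PyHealth API.

```python
# The 14 ChestX-ray14 disease classes, in alphabetical order.
CLASSES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Effusion",
    "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass", "Nodule",
    "Pleural_Thickening", "Pneumonia", "Pneumothorax",
]


def encode_findings(finding_labels: str) -> list[int]:
    """Return a 14-element multi-hot vector for a pipe-separated label string.

    Hypothetical helper for illustration; "No Finding" maps to all zeros.
    """
    present = set(finding_labels.split("|"))
    unknown = present - set(CLASSES) - {"No Finding"}
    if unknown:
        raise ValueError(f"Unknown finding label(s): {sorted(unknown)}")
    return [1 if name in present else 0 for name in CLASSES]
```

A binary task for a single disease can be read off the same vector by indexing it with ``CLASSES.index(disease)``.
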