Skip to content

Conversation

@dalloliogm
Copy link
Contributor

This is the PR for the "VAE" bounty.

Summary of changes:

  • some quality of life improvements for COVID19CXRDataset, checking that data exists and expanding the path (e.g. ~/Downloads -> /home/user/Download)
  • Added openpyxl requirements as it is needed to read the COVID-19 metadata file, which is Excel
  • Implemented options for VAE, including image and timeseries
  • Created a new notebook examples/chestXray_image_generation_VAE.ipynb to show example of using image-based VAE. Replaced a previous script that did not work
  • Improved notebook on time series VAE as well examples/timeseries_mimic4.ipynb
  • Converted examples/covid19cxr_conformal.ipynb to a notebook and fixed data loading code in examples/ChestXray-image-generation-GAN.ipynb (these are outside of the bounty, but they were broken and the code was similar enough to the VAE examples, so I fixed it)

- Add VAE model implementation
- Add Chest X-Ray image generation VAE notebook
- Add COVID-19 CXR dataset support
- Add timeseries VAE modeling examples
- Add conformal prediction examples
- Update dataset overview notebook
- Add comprehensive tests for VAE and COVID-19 CXR
- Update project dependencies
@dalloliogm
Copy link
Contributor Author

Should I convert the VAE class to Pytorch Lightining? I understood other classes in PyHealth are based on that, and I like PL syntax better.

Copy link
Collaborator

@jhnwu3 jhnwu3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another nice thing to test out here is if our TaskClass() can support a COVIDCXR19ImageGeneration Class here

Task details here:
https://colab.research.google.com/drive/1kKkkBVS_GclHoYTbnOtjyYnSee79hsyT?usp=sharing

We could probably assume a labelprocessor or multihot schema for an input and its image. This way everything is force-defined.

We can also throw an error if its some modality the VAE doesn't support currently.

Note that there's many ways to do a VAE, with different methods of tokenization, embeddings, etc. (Encoder-Decoder models, etc.)

We can probably stick with our simple assumption here surrounding just image-based variables.

Discrete sequence generation would be a little more complicated (i.e autoregression) https://openreview.net/pdf?id=fYerSwf1Tb

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, we're just assuming absolute paths here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that might be worth adding to the docs.

def __init__(
self,
dataset: BaseSignalDataset,
dataset,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataset should be dataset : SampleDataset here.

input_channel: int,
input_size: int,
mode: str,
input_type: str = "image",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to make some assumptions based on the feature keys' types here based on the sample_dataset.output_schema and input_schema instead of using an input_type argument here.

See:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/datasets/sample_dataset.py

Basically, we can check if an input and output is an "image" here or using an ImageProcessor or

sequence/timeseries processors here.


# Embedding model for conditional features only (if used)
if conditional_feature_keys:
self.embedding_model = EmbeddingModel(dataset, embedding_dim=hidden_dim)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not needed for images in our case here.

super(VAE, self).__init__(dataset=dataset)
self.input_type = input_type
self.hidden_dim = hidden_dim
self.conditional_feature_keys = conditional_feature_keys
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For conditional features, maybe we can simplify our assumption to the basic multiclass representation/assumption here where we have a specific class we're generating for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants