Bounty: VAE #627
- Add VAE model implementation
- Add Chest X-Ray image generation VAE notebook
- Add COVID-19 CXR dataset support
- Add timeseries VAE modeling examples
- Add conformal prediction examples
- Update dataset overview notebook
- Add comprehensive tests for VAE and COVID-19 CXR
- Update project dependencies
Should I convert the VAE class to PyTorch Lightning? I understand other classes in PyHealth are based on it, and I like the PL syntax better.
…ion layers. Improved docs
…nto vae_clean_pr_final
jhnwu3 left a comment
I think another nice thing to test here is whether our TaskClass() can support a COVIDCXR19ImageGeneration class.
Task details here:
https://colab.research.google.com/drive/1kKkkBVS_GclHoYTbnOtjyYnSee79hsyT?usp=sharing
We could probably assume a label processor or multihot schema for an input and its image. This way everything is force-defined.
We can also throw an error if it's a modality the VAE doesn't currently support.
Note that there are many ways to build a VAE, with different methods of tokenization, embeddings, etc. (encoder-decoder models, etc.)
We can probably stick with our simple assumption here surrounding just image-based variables.
Discrete sequence generation would be a little more complicated (i.e., autoregression): https://openreview.net/pdf?id=fYerSwf1Tb
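To make the "throw an error on unsupported modalities" idea concrete, here is a minimal sketch. The names SUPPORTED_INPUTS and check_vae_inputs are hypothetical, and the plain dict only mimics a feature-key-to-processor-name schema; none of this is existing PyHealth API.

```python
# Hypothetical sketch: these names are illustrative, not PyHealth API.
SUPPORTED_INPUTS = {"image"}  # assumption: the VAE handles image inputs only for now


def check_vae_inputs(input_schema: dict) -> None:
    """Raise a ValueError if any input feature uses an unsupported modality."""
    for key, modality in input_schema.items():
        if modality not in SUPPORTED_INPUTS:
            raise ValueError(
                f"Feature '{key}' has modality '{modality}', but the VAE "
                f"currently supports only {sorted(SUPPORTED_INPUTS)}."
            )


# An image-only schema passes silently; a schema such as
# {"codes": "sequence"} would raise ValueError.
check_vae_inputs({"image": "image"})
```

Failing at construction time keeps the error close to the task definition instead of surfacing later inside the encoder.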
Hey, we're just assuming absolute paths here.
I guess that might be worth adding to the docs.
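If the absolute-path assumption stays, a small guard could make it explicit alongside the docs. This is only a sketch; require_absolute is a hypothetical helper, not something in the PR.

```python
from pathlib import Path


def require_absolute(path_str: str) -> Path:
    """Return the path if it is absolute; otherwise raise a clear error."""
    path = Path(path_str)
    if not path.is_absolute():
        raise ValueError(f"Expected an absolute path, got: {path_str!r}")
    return path


# Usage: an absolute path is returned unchanged, a relative one raises.
root = require_absolute(str(Path.cwd()))
```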
def __init__(
    self,
    dataset: BaseSignalDataset,
    dataset,
The annotation here should be dataset: SampleDataset.
    input_channel: int,
    input_size: int,
    mode: str,
    input_type: str = "image",
We might be able to infer this from the feature keys' types in sample_dataset.output_schema and input_schema instead of taking an input_type argument.
See:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/datasets/sample_dataset.py
Basically, we can check whether an input or output is an "image" (or uses an ImageProcessor), or whether it uses a sequence/timeseries processor.
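A rough sketch of that inference, assuming schema entries are either a processor name string or a processor class. The helper infer_input_type and the exact shape of the schema dict are assumptions for illustration; check sample_dataset.py for the real structure.

```python
# Hypothetical sketch: infer the modality from a schema entry instead of
# passing input_type explicitly. Not PyHealth API.
KNOWN_MODALITIES = ("image", "timeseries", "sequence")


def infer_input_type(input_schema: dict, feature_key: str) -> str:
    """Return the modality implied by the schema entry for feature_key."""
    entry = input_schema[feature_key]
    # Assumption: an entry is a processor name ("image") or a processor
    # class (e.g. one named ImageProcessor).
    name = entry if isinstance(entry, str) else entry.__name__
    name = name.lower()
    for modality in KNOWN_MODALITIES:
        if modality in name:
            return modality
    raise ValueError(f"Unsupported processor for '{feature_key}': {entry!r}")


input_type = infer_input_type({"image": "image"}, "image")
```

With something like this, the constructor could drop the input_type argument and raise early for feature types it cannot model.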
# Embedding model for conditional features only (if used)
if conditional_feature_keys:
    self.embedding_model = EmbeddingModel(dataset, embedding_dim=hidden_dim)
Probably not needed for images in our case here.
super(VAE, self).__init__(dataset=dataset)
self.input_type = input_type
self.hidden_dim = hidden_dim
self.conditional_feature_keys = conditional_feature_keys
For conditional features, maybe we can simplify to the basic multiclass assumption, where we generate for one specific class.
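As an illustration of the multiclass assumption, the class can be one-hot encoded and concatenated to the latent vector before decoding. This is a framework-free sketch with hypothetical helper names; the actual model would do the same with tensors.

```python
from typing import List


def one_hot(class_index: int, num_classes: int) -> List[float]:
    """Encode a single class index as a one-hot vector."""
    if not 0 <= class_index < num_classes:
        raise ValueError(f"class_index {class_index} out of range [0, {num_classes})")
    return [1.0 if i == class_index else 0.0 for i in range(num_classes)]


def condition_latent(z: List[float], class_index: int, num_classes: int) -> List[float]:
    """Concatenate a latent vector with the one-hot class it should generate for."""
    return list(z) + one_hot(class_index, num_classes)


# A 2-d latent conditioned on class 2 of 3 becomes a 5-d decoder input.
decoder_input = condition_latent([0.3, -1.2], class_index=2, num_classes=3)
# decoder_input == [0.3, -1.2, 0.0, 0.0, 1.0]
```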
This is the PR for the "VAE" bounty.
Summary of changes:
- examples/chestXray_image_generation_VAE.ipynb to show example of using image-based VAE. Replaced a previous script that did not work
- examples/timeseries_mimic4.ipynb