Bounty: VAE #627

dalloliogm · 2025-11-27T16:55:20Z

This is the PR for the "VAE" bounty.

Summary of changes:

- Quality-of-life improvements for COVID19CXRDataset: check that the data exists and expand the path (e.g. ~/Downloads -> /home/user/Downloads).
- Added openpyxl to the requirements, since it is needed to read the COVID-19 metadata file, which is an Excel file.
- Implemented VAE options for both image and time-series inputs.
- Created a new notebook, examples/chestXray_image_generation_VAE.ipynb, showing how to use the image-based VAE. It replaces a previous script that did not work.
- Improved the time-series VAE notebook as well: examples/timeseries_mimic4.ipynb.
- Converted examples/covid19cxr_conformal.ipynb to a notebook and fixed the data loading code in examples/ChestXray-image-generation-GAN.ipynb (these are outside the bounty, but they were broken and the code was similar enough to the VAE examples, so I fixed them).
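
For reference, the path handling described in the first item can be sketched roughly as below. This is a minimal illustration only, not the code in this PR: the helper name resolve_dataset_root is hypothetical, and the sketch assumes the dataset root is a directory.

```python
import os
import tempfile


def resolve_dataset_root(root: str) -> str:
    """Expand '~' and relative segments, then check that the directory exists.

    Hypothetical helper for illustration; not part of PyHealth.
    """
    expanded = os.path.abspath(os.path.expanduser(root))
    if not os.path.isdir(expanded):
        raise FileNotFoundError(f"Dataset root does not exist: {expanded}")
    return expanded


# Usage: an existing directory resolves to an absolute path,
# while a missing one fails early with a clear message.
with tempfile.TemporaryDirectory() as tmp:
    print(resolve_dataset_root(tmp))
```

Failing at construction time keeps the error close to the user's mistake instead of surfacing later when the metadata file is read.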

- Add VAE model implementation
- Add Chest X-Ray image generation VAE notebook
- Add COVID-19 CXR dataset support
- Add timeseries VAE modeling examples
- Add conformal prediction examples
- Update dataset overview notebook
- Add comprehensive tests for VAE and COVID-19 CXR
- Update project dependencies

dalloliogm · 2025-11-29T16:01:40Z

Should I convert the VAE class to PyTorch Lightning? As I understand it, other classes in PyHealth are based on that, and I like the PL syntax better.

jhnwu3

I think another nice thing to test here is whether our TaskClass() can support a COVIDCXR19ImageGeneration class.

Task details here:
https://colab.research.google.com/drive/1kKkkBVS_GclHoYTbnOtjyYnSee79hsyT?usp=sharing

We could probably assume a labelprocessor or multihot schema for an input and its image. This way everything is force-defined.
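
A rough sketch of what such a task could look like, to make the idea concrete. Everything here is an assumption for illustration: the class is a plain Python stand-in rather than a subclass of PyHealth's task base class, and the schema keys ("image", "label"), the "multiclass" schema value, and the record fields ("path", "label") are hypothetical.

```python
from typing import Any, Dict, List


class COVIDCXR19ImageGeneration:
    """Hypothetical task sketch; does not use PyHealth's actual task API."""

    task_name = "COVIDCXR19ImageGeneration"
    # Assumed schema: every sample carries an image and a class label,
    # so both fields are always defined.
    input_schema = {"image": "image", "label": "multiclass"}
    output_schema = {"image": "image"}

    def __call__(self, record: Dict[str, Any]) -> List[Dict[str, Any]]:
        # Refuse records that lack the fields the schema promises.
        missing = [key for key in ("path", "label") if key not in record]
        if missing:
            raise KeyError(f"Record is missing required fields: {missing}")
        return [{"image": record["path"], "label": record["label"]}]


task = COVIDCXR19ImageGeneration()
samples = task({"path": "/data/cxr/img_0001.png", "label": "COVID"})
print(samples)
```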

We can also throw an error if it's a modality the VAE doesn't currently support.

Note that there are many ways to do a VAE, with different methods of tokenization, embeddings, etc. (Encoder-Decoder models, etc.)

We can probably stick with our simple assumption here surrounding just image-based variables.

Discrete sequence generation would be a little more complicated (i.e., autoregression): https://openreview.net/pdf?id=fYerSwf1Tb

jhnwu3 · 2025-11-30T22:08:31Z

pyhealth/datasets/base_dataset.py

Hey, we're just assuming absolute paths here.

I guess that might be worth adding to the docs.

jhnwu3 · 2025-11-30T22:30:32Z

pyhealth/models/vae.py

    def __init__(
        self,
-        dataset: BaseSignalDataset,
+        dataset,


The annotation here should be dataset: SampleDataset.

jhnwu3 · 2025-11-30T22:32:26Z

pyhealth/models/vae.py

-        input_channel: int,
-        input_size: int,
        mode: str,
+        input_type: str = "image",


We might be able to infer this from the feature keys' types in sample_dataset.input_schema and sample_dataset.output_schema instead of adding an input_type argument.

See:
https://github.com/sunlabuiuc/PyHealth/blob/master/pyhealth/datasets/sample_dataset.py

Basically, we can check whether an input and output is an "image" or uses an ImageProcessor, or whether it uses sequence/timeseries processors.
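
Something along these lines, as a minimal sketch. It assumes the schema is a dict mapping each feature key to either a string name or a processor class; the function name infer_input_type and the stand-in classes are hypothetical, and the name matching is a guess at how the real processors are labelled.

```python
from typing import Dict


def infer_input_type(schema: Dict[str, object], feature_key: str) -> str:
    """Guess the VAE input type from a schema entry (hypothetical helper)."""
    spec = schema[feature_key]
    # A schema entry may be a string such as "image", a processor class,
    # or a processor instance; reduce all three to a lowercase name.
    if isinstance(spec, str):
        name = spec
    else:
        name = getattr(spec, "__name__", type(spec).__name__)
    name = name.lower()
    if "image" in name:
        return "image"
    if "timeseries" in name or "sequence" in name:
        return "timeseries"
    raise ValueError(
        f"Feature '{feature_key}' has unsupported type '{name}' for the VAE"
    )


# Stand-in class for illustration only; not imported from PyHealth.
class ImageProcessor:
    pass


print(infer_input_type({"image": "image"}, "image"))
print(infer_input_type({"image": ImageProcessor}, "image"))
print(infer_input_type({"signal": "timeseries"}, "signal"))
```

The ValueError branch also covers the case of a modality the VAE doesn't support yet.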

jhnwu3 · 2025-11-30T22:32:42Z

pyhealth/models/vae.py

+
+            # Embedding model for conditional features only (if used)
+            if conditional_feature_keys:
+                self.embedding_model = EmbeddingModel(dataset, embedding_dim=hidden_dim)


Probably not needed for images in our case here.

jhnwu3 · 2025-11-30T22:33:17Z

pyhealth/models/vae.py

+        super(VAE, self).__init__(dataset=dataset)
+        self.input_type = input_type
        self.hidden_dim = hidden_dim
+        self.conditional_feature_keys = conditional_feature_keys


For conditional features, maybe we can simplify to a basic multiclass assumption, where we generate for one specific class.
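
As a rough sketch of that simplification, assuming the condition is a single integer class label that is one-hot encoded and concatenated to the latent vector before decoding. The class name ConditionalDecoder and all dimensions are hypothetical and not taken from pyhealth/models/vae.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalDecoder(nn.Module):
    """Hypothetical decoder that conditions on one class label per sample."""

    def __init__(self, latent_dim: int, num_classes: int, out_dim: int):
        super().__init__()
        self.num_classes = num_classes
        self.net = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 64),
            nn.ReLU(),
            nn.Linear(64, out_dim),
            nn.Sigmoid(),  # outputs in [0, 1], e.g. normalized pixel values
        )

    def forward(self, z: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # One-hot encode the class and append it to the latent code.
        onehot = F.one_hot(labels, num_classes=self.num_classes).float()
        return self.net(torch.cat([z, onehot], dim=1))


decoder = ConditionalDecoder(latent_dim=8, num_classes=3, out_dim=16)
z = torch.randn(4, 8)                # latent samples for a batch of 4
labels = torch.tensor([0, 1, 2, 1])  # class to generate for each sample
out = decoder(z, labels)
print(out.shape)
```

With this assumption no separate embedding model is needed for the condition; the class index alone selects what to generate.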

dalloliogm added 2 commits November 27, 2025 16:45

cleanup

33cb3ca

dalloliogm added 5 commits November 29, 2025 16:45

Merge branch 'sunlabuiuc:master' into vae_clean_pr_final

df671f7

Refactor VAE to correctly register conditional and timeseries projection layers. Improved docs

870a3fd

Merge branch 'vae_clean_pr_final' of github.com:dalloliogm/PyHealth into vae_clean_pr_final

3af54b3

improved description of Time Series VAE

f49ed2a

fixing typo - probably pressed key by mistake

752547e

jhnwu3 requested changes Nov 30, 2025
