Optimal Transport and Log-Likelihood Losses for Expression #26

Open
bio-info-guy wants to merge 4 commits into davidliwei:main from bio-info-guy:dev

@bio-info-guy
Contributor

Major change:

Added log-likelihood losses and sampling for expression prediction

  • required changes to MVCDecoder and model.encode_batch_with_perturb
  • config.distribution accepts the following options:
    • None
    • negative binomial (counts): nb
    • zero-inflated negative binomial (counts): zinb
    • hurdle truncated negative binomial (counts): hnb
    • Poisson (counts): pois
    • zero-inflated Poisson (counts): zipois
    • zero-inflated Gaussian (lognormalized expression): zig
  • counts are recovered in the dataloader by computing a size factor from the lognormalized expression via the _get_sf function
  • for the discrete distributions, a size factor (sf) is needed so that the MVCDecoder learns a size-factor-invariant expression mean; this mean is then multiplied by the corresponding target cell's size factor to get the actual distribution mean. During inference the target cell is unknown, so the input cell's size factor can be used for scaling instead.
  • eval_testdata adds two new options: sample=True or False controls whether predictions are sampled from the distribution during inference, and sizefactor=True or False controls whether the input data's size factors are used to scale the final results (for the reason given above).
  • to use the log-likelihood losses, the input MUST be lognormalized expression, not bins or raw counts. (NOTE: find a way to enforce this in future commits)
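
As a rough illustration of the size-factor logic above, here is a minimal NumPy sketch. It is not the PR's code: `estimate_size_factor` is a hypothetical stand-in for `_get_sf` that assumes the input is `log1p(count / sf)` and that the smallest nonzero value in a cell corresponds to a raw count of 1, and `nb_log_prob` uses the common mean / inverse-dispersion (`theta`) parameterization of the negative binomial, which may differ from what MVCDecoder actually outputs.

```python
import math
import numpy as np

# Elementwise log-gamma without needing SciPy.
_lgamma = np.vectorize(math.lgamma, otypes=[float])


def estimate_size_factor(lognorm_row):
    """Hypothetical stand-in for _get_sf (an assumption, not the PR's code).

    Assumes lognorm_row = log1p(count / sf) and that the smallest nonzero
    entry comes from a raw count of exactly 1.
    """
    nonzero = lognorm_row[lognorm_row > 0]
    if nonzero.size == 0:
        return 1.0  # empty cell: fall back to a neutral size factor
    return float(1.0 / np.expm1(nonzero.min()))


def recover_counts(lognorm_row, sf):
    """Invert log1p(count / sf) and round back to integer counts."""
    return np.rint(np.expm1(lognorm_row) * sf)


def nb_log_prob(counts, base_mean, sf, theta, eps=1e-8):
    """Negative binomial log-probability with mean mu = base_mean * sf.

    base_mean plays the role of the size-factor-invariant mean; theta is
    the inverse dispersion.
    """
    mu = base_mean * sf
    log_norm = np.log(theta + mu + eps)
    return (
        _lgamma(counts + theta) - _lgamma(theta) - _lgamma(counts + 1.0)
        + theta * (np.log(theta + eps) - log_norm)
        + counts * (np.log(mu + eps) - log_norm)
    )


def nb_nll(counts, base_mean, sf, theta):
    """Mean negative log-likelihood, usable as a training loss."""
    return float(-np.mean(nb_log_prob(counts, base_mean, sf, theta)))
```

During training `sf` would be the target cell's size factor; at inference, following the note above, the input cell's size factor would be passed instead.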

Optimal transport:

  • optimal transport pairing is implemented via jax (requires installing the jax package) and runs when creating a dataloader
  • under the current setup, for each perturbation, pairing only occurs between non-perturbed cells and perturbed cells of the same cell type
  • optimal transport is disabled by default (config.use_ot is false); parameters can be provided as a dictionary via config.ot_params (the default is set in the dataloader and matches roughly one target cell, with ~0.99 probability, to each non-perturbed cell)
  • optimal transport REQUIRES running PCA on the anndata first (at least 100 components recommended)
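
For readers unfamiliar with OT pairing, the idea can be sketched with entropic (Sinkhorn) optimal transport on PCA embeddings. The PR does this in jax; the version below uses plain NumPy to stay dependency-free, and the function names, the cost normalization, and the `epsilon` / `n_iters` defaults are illustrative assumptions rather than the contents of config.ot_params. It would be called once per (perturbation, cell type) group.

```python
import numpy as np


def sinkhorn_plan(x_ctrl, x_pert, epsilon=0.05, n_iters=200):
    """Entropic OT plan between control and perturbed cells (illustrative).

    x_ctrl: (n, d) PCA embedding of non-perturbed cells.
    x_pert: (m, d) PCA embedding of perturbed cells of the same cell type.
    """
    # Squared Euclidean cost, rescaled to [0, 1] so epsilon is comparable
    # across datasets.
    cost = ((x_ctrl[:, None, :] - x_pert[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    kernel = np.exp(-cost / epsilon)

    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on control cells
    b = np.full(m, 1.0 / m)  # uniform mass on perturbed cells
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def sample_pairs(plan, rng):
    """For each control cell, sample one perturbed-cell index from its row."""
    probs = plan / plan.sum(axis=1, keepdims=True)
    return np.array([rng.choice(probs.shape[1], p=row) for row in probs])
```

A smaller `epsilon` concentrates each row of the plan on a single target cell, which is the regime the default parameters are described as aiming for; this plain-kernel version can underflow for very small `epsilon`, where a log-domain Sinkhorn would be safer.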

- added nb, zinb, hnb, zig (zero-inflated Gaussian), pois and zipois distributions
- calculate and use the size factor based on the model's distribution
- custom sampling of all distributions
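
To give a sense of what sampling from one of these distributions involves, here is a minimal sketch for zinb using the standard Gamma-Poisson mixture plus a dropout mask. This is an assumption about one reasonable approach, not the PR's sampler; `sample_zinb` and its parameter names (`mu`, `theta`, `pi`) are hypothetical.

```python
import numpy as np


def sample_zinb(mu, theta, pi, rng):
    """Draw zero-inflated negative binomial counts (illustrative sketch).

    mu:    NB mean (already scaled by the size factor), array-like.
    theta: inverse dispersion, scalar or broadcastable to mu.
    pi:    zero-inflation (dropout) probability, scalar or broadcastable.
    """
    mu = np.asarray(mu, dtype=float)
    # NB as a Gamma-Poisson mixture: rate ~ Gamma(theta, mu / theta).
    rate = rng.gamma(shape=theta, scale=mu / theta)
    counts = rng.poisson(rate)
    # Zero out each entry independently with probability pi.
    dropout = rng.random(counts.shape) < pi
    return np.where(dropout, 0, counts)
```

Under this parameterization the expected value is `(1 - pi) * mu`, which gives a quick sanity check on sampled output.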