Optimal Transport and Loglikihood Losses for Expression by bio-info-guy · Pull Request #26 · davidliwei/pertTF

bio-info-guy · 2026-02-13T21:45:09Z

Major change:

Added log-likelihood losses and sampling for expression prediction

This required changes to MVCDecoder and model.encode_batch_with_perturb.
config.distribution accepts the following options:
- None
- negative binomial (counts): nb
- zero-inflated negative binomial (counts): zinb
- hurdle truncated negative binomial (counts): hnb
- Poisson (counts): pois
- zero-inflated Poisson (counts): zipois
- zero-inflated Gaussian (lognormalized expression): zig
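As a rough illustration of what the count-based options compute, here is a minimal sketch of the negative binomial and zero-inflated negative binomial log-probabilities for a single count. The function names and the parameterization (mean `mu`, inverse dispersion `theta`, zero-inflation probability `pi`) are assumptions made for this example; the loss code this PR adds to MVCDecoder is not shown here and may differ.

```python
import math


def nb_log_prob(y, mu, theta):
    """Log-probability of count y under a negative binomial.

    Assumed parameterization: mean mu > 0 and inverse dispersion theta > 0,
    so the variance is mu + mu**2 / theta.
    """
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * (math.log(theta) - log_theta_mu)
        + y * (math.log(mu) - log_theta_mu)
    )


def zinb_log_prob(y, mu, theta, pi):
    """Log-probability of count y under a zero-inflated negative binomial.

    pi is the probability of a structural zero (0 < pi < 1).
    """
    nb = nb_log_prob(y, mu, theta)
    if y == 0:
        # A zero can come from the inflation component or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log1p(-pi) + nb
```

A training loss would typically be the negative of these values averaged over genes and cells.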
Counts are recovered in the dataloader by the _get_sf function, which calculates a size factor from the lognormalized expression.
For the discrete distributions, a size factor (sf) is needed so that the MVCDecoder learns a size-factor-invariant expression mean, which is then multiplied by the corresponding target cell's size factor to get the actual distribution mean. During inference, the input cell's size factor can be used for scaling instead, since the target cell is unknown.
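One plausible way to recover a size factor from lognormalized expression is sketched below. It assumes the values were produced as log1p(count / total * target_sum) and that every cell has at least one gene with a raw count of exactly 1, so the smallest nonzero de-logged value corresponds to one count. The helper names are hypothetical and this is not the PR's `_get_sf` implementation, which is not shown here.

```python
import math


def estimate_size_factor(lognorm_row):
    """Estimate a per-cell size factor from lognormalized expression.

    Assumption: the smallest nonzero value of expm1(x) corresponds to a raw
    count of 1, so its reciprocal rescales normalized values back to counts.
    """
    nonzero = [math.expm1(x) for x in lognorm_row if x > 0]
    if not nonzero:
        return 1.0  # empty cell: nothing to rescale
    return 1.0 / min(nonzero)


def recover_counts(lognorm_row, size_factor):
    """Undo log1p and rescale to (rounded) counts."""
    return [round(math.expm1(x) * size_factor) for x in lognorm_row]


def scale_mean(invariant_mean, size_factor):
    """Turn a size-factor-invariant mean into a mean on the count scale."""
    return [m * size_factor for m in invariant_mean]
```

Under these assumptions, the decoder output plays the role of `invariant_mean`, and the target cell's size factor (or the input cell's, at inference) is passed to `scale_mean`.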
eval_testdata adds two new options for inference: sample (True or False) controls whether predictions are sampled from the distribution, and sizefactor (True or False) controls whether the input data's size factors are used to scale the final results (for the reason given above).
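To make the difference between sampling and not sampling concrete, here is a minimal sketch for a single gene under a zero-inflated negative binomial. The function `predict_count` and its arguments are hypothetical and only mirror the idea behind the sample and sizefactor options; they are not the signature of eval_testdata. The negative binomial draw uses the standard gamma-Poisson mixture.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; only suitable for moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def predict_count(invariant_mean, theta, pi, size_factor=1.0, sample=False, rng=None):
    """Return the expected value or one random draw from a ZINB.

    invariant_mean: size-factor-invariant mean (> 0)
    theta: inverse dispersion (> 0); pi: zero-inflation probability
    """
    mu = invariant_mean * size_factor  # mean on the count scale
    if not sample:
        return (1.0 - pi) * mu  # expectation of the ZINB
    rng = rng or random.Random()
    if rng.random() < pi:
        return 0  # structural zero
    # NB as a gamma-Poisson mixture: rate ~ Gamma(shape=theta, scale=mu/theta)
    lam = rng.gammavariate(theta, mu / theta)
    return sample_poisson(lam, rng)
```

With sample=False the result is a smooth expected value; with sample=True the output is an integer count that includes dropout-like zeros.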
To use the log-likelihood losses, the input MUST be lognormalized expression, not bins or raw counts. (NOTE: find a way to enforce this in future commits.)

Optimal transport:

Optimal transport pairing is implemented via jax (requires installing the jax package) and runs when a dataloader is created.
Under the current setup, for each perturbation, pairing only occurs between non-perturbed cells and perturbed cells of the same cell type.
Optimal transport is off by default (config.use_ot defaults to False). Parameters can be provided via config.ot_params as a dictionary; the default is set in the dataloader and roughly matches one target cell (with ~0.99 probability) to each non-perturbed cell.
Optimal transport REQUIRES running PCA on the anndata object first (at least 100 components recommended).
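The pairing idea can be sketched with a small entropic optimal transport (Sinkhorn) solver over PCA embeddings. This is a plain-Python stand-in written for illustration, not the jax implementation in this PR; the function names, the max-cost normalization, and the `epsilon` and `n_iters` defaults are assumptions and do not correspond to the actual config.ot_params defaults.

```python
import math


def sinkhorn_plan(source, target, epsilon=0.01, n_iters=200):
    """Entropic OT plan between two point sets with uniform marginals.

    source, target: lists of embedding vectors (e.g. PCA coordinates).
    Costs are squared Euclidean distances divided by the maximum cost.
    """
    n, m = len(source), len(target)
    cost = [[sum((a - b) ** 2 for a, b in zip(s, t)) for t in target] for s in source]
    max_cost = max(max(row) for row in cost) or 1.0
    kernel = [[math.exp(-c / max_cost / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate scaling so rows sum to 1/n and columns sum to 1/m.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def pair_cells(plan):
    """For each source cell, return (best target index, matching probability)."""
    pairs = []
    for row in plan:
        total = sum(row)
        probs = [p / total for p in row]
        best = max(range(len(probs)), key=probs.__getitem__)
        pairs.append((best, probs[best]))
    return pairs
```

A small `epsilon` concentrates each row of the plan on a single target cell, which is the regime the description refers to when it says each non-perturbed cell is matched to roughly one target cell. In the setup described above, `source` would hold non-perturbed cells and `target` the perturbed cells of the same cell type for one perturbation.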

bio-info-guy added 4 commits February 12, 2026 16:21

- optimal transport pairing in dataloader

3af97df

- first expression distribution loss version

c936d7d

- added nb, zinb, hnb, zig (zero inflate gaussian), pois and zpois distributions - calculate and use sizefactor based on distribution of model - custom sampling of all distributions

- sampling arguments for evaluation

0b73b1f

- clean up comments

b31db6d

bio-info-guy wants to merge 4 commits into davidliwei:main from bio-info-guy:dev
