Releases: ArcInstitute/cell-load
v0.10.3
Adds a consecutive data loading option for training on very large datasets. It packs cell sets so that, within a condition, the cells are consecutive on disk. This gives roughly a 3x improvement for HVG training (output space = gene) and closer to a 12-15x improvement for full-transcriptome training (output space = all). The code raises an error if the underlying data is not sorted by condition.
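A quick pre-check for the sort requirement can be sketched as below. This is a minimal illustration, not cell-load's own validation: the function name is hypothetical, and it assumes "sorted by condition" means every condition's cells form one unbroken run of rows.

```python
from itertools import groupby


def conditions_are_contiguous(conditions):
    """Return True if each condition label occupies a single unbroken run.

    `conditions` is any iterable of per-cell condition labels in on-disk
    order (hypothetical input; cell-load's internal check may differ).
    """
    seen = set()
    # groupby collapses consecutive equal labels into one run each.
    for label, _run in groupby(conditions):
        if label in seen:
            # The label already appeared in an earlier run, so its cells
            # are split across the file.
            return False
        seen.add(label)
    return True
```

For example, the labels ["ctrl", "ctrl", "KO1", "KO1"] pass, while ["ctrl", "KO1", "ctrl"] do not, because the "ctrl" cells are interrupted by another condition.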
v0.8.7
stores batch codes as categoricals in the metadata cache, if they are not already, so that codes are handled correctly across multiple datasets
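Why shared categories matter across datasets can be illustrated with a small sketch. It is an assumption about the motivation, not the metadata cache implementation: if each dataset numbered its batches independently, code 0 could refer to a different batch in each file. The function and dataset names below are hypothetical.

```python
def encode_with_shared_categories(batches_per_dataset):
    """Encode batch labels against one category list shared by all datasets.

    `batches_per_dataset` maps a dataset name to its per-cell batch labels.
    Returns the shared category list and the integer codes per dataset.
    """
    # One sorted category list built from the labels of every dataset.
    categories = sorted(
        {label for labels in batches_per_dataset.values() for label in labels}
    )
    code_of = {label: code for code, label in enumerate(categories)}
    # The same label now maps to the same code in every dataset.
    codes = {
        name: [code_of[label] for label in labels]
        for name, labels in batches_per_dataset.items()
    }
    return categories, codes
```

With {"ds1": ["p1", "p2"], "ds2": ["p3", "p1"]}, batch "p1" is encoded as 0 in both datasets, so the codes can be compared safely after the datasets are combined.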
v0.8.6
falls back to /var/feature when /var/_index is not available in AnnData files
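The lookup order can be sketched as follows. The helper name is hypothetical and this is not the library's code; it only assumes the /var group behaves like a mapping that supports `in` (as an h5py group does), so a plain dict stands in for it here.

```python
def resolve_gene_name_key(var_group):
    """Pick the entry under /var that holds gene names.

    Prefers '_index' and uses 'feature' only when '_index' is absent.
    `var_group` is any mapping-like object keyed by entry name.
    """
    for key in ("_index", "feature"):
        if key in var_group:
            return key
    raise KeyError("neither var/_index nor var/feature is present")
```

Calling it on a group that has only a "feature" entry returns "feature"; a group with both entries still returns "_index".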
v0.8.5
see v0.8.4. hotfix for a small bug
v0.8.4
enables training on all data (no test subset required)
leave toml arrays empty for this functionality
fix reversion in vcc
allow embedding as output space
this adds an 'embedding' option for data.kwargs.output_space. if set, the getitem call yields only embeddings, not counts. this pairs with a new option in state that lets users train only on embedding spaces (no need to force a decoder to counts)
also fixes a bug with filter_on_target_knockdown
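The effect of the 'embedding' output space on a getitem call can be pictured with a toy dataset. Everything here is an illustrative assumption: the class, the dictionary keys, and the constructor arguments are hypothetical and do not mirror cell-load's actual dataset class.

```python
class ToyCellDataset:
    """Toy stand-in showing how output_space can gate what an item contains."""

    def __init__(self, counts, embeddings, output_space="gene"):
        self.counts = counts            # per-cell count vectors
        self.embeddings = embeddings    # per-cell embedding vectors
        self.output_space = output_space

    def __len__(self):
        return len(self.embeddings)

    def __getitem__(self, idx):
        item = {"embedding": self.embeddings[idx]}
        # With output_space == "embedding", counts are left out entirely,
        # so nothing downstream needs to decode back to counts.
        if self.output_space != "embedding":
            item["counts"] = self.counts[idx]
        return item
```

With output_space="embedding", each item carries only the "embedding" entry; with "gene" or "all", the toy class also returns "counts".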
Enabling training on observational data
enables padded argument for cell-load train dataloader
this can be used to easily generate predictions on the training data with the predict binary
add cli
1/ exposes filter_on_target_knockdown as a command that can be run as:
uvx -q --from git+https://github.com/ArcInstitute/cell-load.git@cli_run_uv filter_on_target_knockdown --help