
[WIP] Hdf5 io #4

Open
alexey0308 wants to merge 51 commits into master from hdf5

@alexey0308 (Collaborator) commented Aug 2, 2021

@simon-anders asked for an HDF5 interface to the matrices saved in the intermediate steps.

In this PR I'm switching the I/O from npz to HDF5.
Currently everything is saved in a single file, where HDF5 groups correspond to chromosomes.
I do not change the function interfaces for now, i.e. they still take a directory as input, hence the output file name
is hardcoded.
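
A minimal sketch of the single-file layout described above, assuming h5py and storing each chromosome's COO matrix as `row`/`col`/`data` datasets plus a `shape` attribute inside its own group. The helper names `save_matrices`/`load_matrix`, the dataset names and the file name `matrices.h5` are hypothetical; the PR only says the name is hardcoded.

```python
import os
import tempfile

import h5py
import numpy as np


def save_matrices(directory, matrices, filename="matrices.h5"):
    """Hypothetical helper: write one COO matrix per chromosome into a single
    HDF5 file. `matrices` maps chromosome -> (row, col, data, shape)."""
    path = os.path.join(directory, filename)  # hypothetical hardcoded name
    with h5py.File(path, "w") as f:
        for chrom, (row, col, data, shape) in matrices.items():
            grp = f.create_group(chrom)  # one HDF5 group per chromosome
            grp.attrs["shape"] = shape
            grp.create_dataset("row", data=np.asarray(row))
            grp.create_dataset("col", data=np.asarray(col))
            grp.create_dataset("data", data=np.asarray(data))
    return path


def load_matrix(directory, chrom, filename="matrices.h5"):
    """Hypothetical helper: read the COO arrays of one chromosome back."""
    with h5py.File(os.path.join(directory, filename), "r") as f:
        grp = f[chrom]
        shape = tuple(int(n) for n in grp.attrs["shape"])
        return grp["row"][:], grp["col"][:], grp["data"][:], shape


# Usage: two chromosomes end up as two groups in the same file.
with tempfile.TemporaryDirectory() as tmp:
    save_matrices(tmp, {"chr1": ([0, 1], [1, 2], [1.0, 2.0], (3, 3)),
                        "chr2": ([2], [0], [5.0], (4, 4))})
    with h5py.File(os.path.join(tmp, "matrices.h5"), "r") as f:
        groups = sorted(f.keys())
    row, col, data, shape = load_matrix(tmp, "chr2")
```

Keeping the directory argument and deriving the file path from it mirrors the unchanged function interfaces mentioned above.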

@simon-anders @LKremer please comment here if you have better alternatives, since so far this has only been discussed between Simon and me.

  • ADD `ExitStack` in the COO dump function so that no files are left open when an exception is raised.
  • ADD nnz counting in the COO dump function; the number of nonzero elements is returned.
  • ADD streamed saving and reading for large data sets:
    a duck-typed object represents the sparse matrix stored in the HDF5 file, and
    the `prepare` function got an additional argument to choose between an in-memory and a streamed COO->CSR conversion in HDF5.
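
The first two items can be illustrated with a stdlib-only sketch. It assumes, purely for illustration, that the dump function writes `(chrom, row, col, value)` entries as text triplets into one file per chromosome; `dump_coo` and the `.coo` file naming are hypothetical. Every file is registered on a `contextlib.ExitStack`, so all of them are closed even if the loop fails midway, and the nonzero count is returned.

```python
import contextlib
import os
import tempfile


def dump_coo(directory, entries):
    """Hypothetical sketch: write (chrom, row, col, value) entries as COO
    triplets, one text file per chromosome, and return the nnz count."""
    nnz = 0
    # Files opened via enter_context() are all closed when the with-block
    # exits, whether it exits normally or through an exception.
    with contextlib.ExitStack() as stack:
        handles = {}
        for chrom, row, col, value in entries:
            if value == 0:
                continue  # only nonzero elements are stored and counted
            if chrom not in handles:
                path = os.path.join(directory, f"{chrom}.coo")
                handles[chrom] = stack.enter_context(open(path, "w"))
            handles[chrom].write(f"{row}\t{col}\t{value}\n")
            nnz += 1
    return nnz


# Usage: the zero entry is skipped, so two nonzeros end up in two files.
with tempfile.TemporaryDirectory() as tmp:
    n = dump_coo(tmp, [("chr1", 0, 1, 2.0),
                       ("chr2", 3, 4, 0.0),
                       ("chr2", 1, 1, 5.0)])
    files = sorted(os.listdir(tmp))
```

`ExitStack` is handy here because the number of files is only known while iterating, which a fixed nest of `with open(...)` statements cannot express.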
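
For the streamed COO->CSR conversion and the duck-typed matrix, here is a minimal NumPy sketch under one loud assumption: the COO entries are already sorted by row. Then the CSR column indices and values are plain chunked copies and only the row pointer has to be computed, using O(n_rows) memory instead of O(nnz). Sources and targets only need slicing, so h5py datasets (e.g. `f["chr1/row"]`) or NumPy arrays can be passed. `coo_to_csr_streamed`, `StreamedCSR` and `dense_row` are hypothetical names, not the PR's API.

```python
import numpy as np


def coo_to_csr_streamed(row, col, data, n_rows, indptr, indices, values,
                        chunk=1_000_000):
    """Hypothetical chunked COO -> CSR conversion. Sources and the
    pre-allocated targets (indptr: n_rows + 1, indices/values: nnz) only
    need slicing. Assumes the COO entries are sorted by row."""
    nnz = len(row)
    counts = np.zeros(n_rows, dtype=np.int64)
    last_row = -1
    for start in range(0, nnz, chunk):
        stop = min(start + chunk, nnz)
        r = np.asarray(row[start:stop])
        if r[0] < last_row or np.any(np.diff(r) < 0):
            raise ValueError("COO entries must be sorted by row")
        last_row = r[-1]
        counts += np.bincount(r, minlength=n_rows)  # entries per row
        indices[start:stop] = col[start:stop]  # sorted input: direct copy
        values[start:stop] = data[start:stop]
    ptr = np.zeros(n_rows + 1, dtype=np.int64)
    np.cumsum(counts, out=ptr[1:])  # row pointer = prefix sum of counts
    indptr[:] = ptr


class StreamedCSR:
    """Hypothetical duck-typed CSR matrix whose arrays may live on disk;
    only the slices needed for a requested row are read."""

    def __init__(self, indptr, indices, data, shape):
        self.indptr, self.indices, self.data = indptr, indices, data
        self.shape = tuple(shape)

    @property
    def nnz(self):
        return int(self.indptr[self.shape[0]])

    def dense_row(self, i):
        start, stop = int(self.indptr[i]), int(self.indptr[i + 1])
        vals = np.asarray(self.data[start:stop])
        out = np.zeros(self.shape[1], dtype=vals.dtype)
        # add.at sums duplicate (row, col) entries instead of overwriting
        np.add.at(out, np.asarray(self.indices[start:stop]), vals)
        return out


# Usage with in-memory arrays standing in for HDF5 datasets.
row = np.array([0, 0, 2, 3])
col = np.array([1, 3, 0, 2])
data = np.array([1.0, 2.0, 3.0, 4.0])
indptr = np.empty(5, dtype=np.int64)
indices = np.empty(4, dtype=np.int64)
values = np.empty(4)
coo_to_csr_streamed(row, col, data, 4, indptr, indices, values, chunk=3)
m = StreamedCSR(indptr, indices, values, (4, 4))
```

Unsorted input would need an external sort or a scatter pass, which is why the sketch rejects it rather than silently producing a wrong matrix.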
