
Switch to HDF5 based storage of intermediate data types. #34

@cerebis

Description


Currently, intermediate data are stored simply by compressing pickled Python class instances.

This approach was chosen over other serialisation methods because it was quick and good enough. However, as time passes and the codebase evolves, the dependency of existing serialised instances on specific class versions becomes increasingly problematic. It can prevent users from going back to old data and reanalysing it with a newer version of the software, because the stored objects can no longer be deserialised.
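To illustrate the problem, here is a minimal sketch using only the standard library. The module name `hypothetical_bin3c_mod`, the class `ContactMap` and the `register` helper are hypothetical stand-ins, not bin3C code; the helper only makes the class importable by name, as a real module would. Pickle stores an instance as a reference to its module and class name plus its attribute dictionary, so once a later release renames or moves the class, the old blob no longer loads, whereas the same data pickled as plain built-in types still does.

```python
import pickle
import sys
import types


def register(module_name, cls, name):
    """Hypothetical helper: make `cls` importable as module_name.name, as a real module would."""
    mod = sys.modules.setdefault(module_name, types.ModuleType(module_name))
    cls.__module__, cls.__name__, cls.__qualname__ = module_name, name, name
    setattr(mod, name, cls)


class _ContactMapV1:
    """Stand-in for an analysis class from an older release."""

    def __init__(self, names, counts):
        self.names = names
        self.counts = counts


register("hypothetical_bin3c_mod", _ContactMapV1, "ContactMap")

# Pickle records the instance as (module, class name) plus its attribute dict.
class_blob = pickle.dumps(_ContactMapV1(["contig_1", "contig_2"], [[0, 3], [3, 0]]))
# The same data held in built-in types carries no reference to a project class.
plain_blob = pickle.dumps({"names": ["contig_1", "contig_2"], "counts": [[0, 3], [3, 0]]})

# A later release refactors the module, and ContactMap no longer exists under that name.
del sys.modules["hypothetical_bin3c_mod"].ContactMap

try:
    pickle.loads(class_blob)
    class_load_ok = True
except (AttributeError, ImportError) as err:
    class_load_ok = False
    print("cannot deserialise:", err)

plain = pickle.loads(plain_blob)
print("plain data still loads:", plain["names"])
```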

Either we must provide conversions for every class change or, better, avoid the problem entirely.

Therefore, bin3C should switch to a class-agnostic and efficient means of storing intermediate analysis results (contact maps, clusterings). One option is to pickle only plain data types; an obvious alternative is HDF5, although it would itself introduce a chunk of dependencies. Another is to adopt an existing Hi-C HDF5 format, so long as it does not itself include external class implementation details or extraneous fields irrelevant to metagenomics.
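A minimal sketch of what HDF5 storage could look like, assuming h5py and NumPy as the added dependencies. The file layout (`sequences/`, `contacts/` and `clustering/` groups, a `format_version` attribute) and the function names are hypothetical, not an existing bin3C or Hi-C format. The contact map is stored as sparse COO triplets and the clustering as one label per sequence, so reading the file back needs only arrays and strings, never a bin3C class.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical layout version, bumped only when the on-disk layout changes.
FORMAT_VERSION = 1


def save_results(path, seq_names, seq_lengths, rows, cols, counts, labels=None):
    """Write a sparse contact map (COO triplets) and optional per-sequence cluster labels."""
    with h5py.File(path, "w") as f:
        f.attrs["format_version"] = FORMAT_VERSION
        seqs = f.create_group("sequences")
        seqs.create_dataset("names", data=np.asarray(seq_names, dtype=object),
                            dtype=h5py.string_dtype(encoding="utf-8"))
        seqs.create_dataset("lengths", data=np.asarray(seq_lengths, dtype=np.int64))
        contacts = f.create_group("contacts")
        # Only non-zero pairs are stored; gzip keeps large maps compact.
        contacts.create_dataset("row", data=np.asarray(rows, dtype=np.int64), compression="gzip")
        contacts.create_dataset("col", data=np.asarray(cols, dtype=np.int64), compression="gzip")
        contacts.create_dataset("count", data=np.asarray(counts, dtype=np.uint32),
                                compression="gzip")
        if labels is not None:
            # One cluster label per sequence; -1 marks unclustered sequences.
            f.create_group("clustering").create_dataset(
                "labels", data=np.asarray(labels, dtype=np.int32))


def load_results(path):
    """Read the file back into plain Python/NumPy objects."""
    with h5py.File(path, "r") as f:
        # Variable-length strings may come back as bytes depending on the h5py version.
        names = [n.decode("utf-8") if isinstance(n, bytes) else n
                 for n in f["sequences/names"][()]]
        return {
            "format_version": int(f.attrs["format_version"]),
            "names": names,
            "lengths": f["sequences/lengths"][()],
            "rows": f["contacts/row"][()],
            "cols": f["contacts/col"][()],
            "counts": f["contacts/count"][()],
            "labels": f["clustering/labels"][()] if "clustering" in f else None,
        }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_results.h5")
    save_results(path, ["contig_1", "contig_2", "contig_3"], [5000, 12000, 800],
                 rows=[0, 0, 1], cols=[1, 2, 2], counts=[17, 2, 9], labels=[0, 0, -1])
    loaded = load_results(path)

print(loaded["names"], loaded["counts"].tolist(), loaded["labels"].tolist())
```

Because the file holds only arrays, strings and a version number, a newer release can read old files (or migrate them explicitly by checking `format_version`) regardless of how its Python classes have changed.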

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
