HDF5 Data loader #272

@theRealSuperMario

Description

When working with HDF5 files, one especially has to pay attention to closing the file after every access, because otherwise the iterator stalls.

The right way to do it is as follows:

import h5py
import torch
import torch.utils.data as data

class H52Dataset(data.Dataset):

    def __init__(self, file_path):
        super().__init__()
        self.file_path = file_path
        self.keys = ["transfers", "c_transfers", "reconstructions", "c_reconstructions"]

    def __getitem__(self, index):
        # Open (and close) the file on every access so that DataLoader
        # workers never share a stale file handle.
        with h5py.File(self.file_path, 'r', swmr=True) as h5_file:
            dsets = {k: h5_file[k] for k in self.keys}
            # Scale uint8 images from [0, 255] to [-1, 1].
            out = tuple(torch.from_numpy(dsets[k][index, :, :, :]).float() / 127.5 - 1.0 for k in self.keys)
        return out

    def __len__(self):
        with h5py.File(self.file_path, 'r', swmr=True) as h5_file:
            _len = h5_file['transfers'].shape[0]
        return _len

See also here

It would be great to abstract this away and simply have an HDF5 loader that returns the datasets for a specified list of keys. Otherwise everyone keeps implementing their own version, which will most likely fail in subtle ways.
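Such an abstraction might look like the following sketch. The class name `HDF5Dataset` and the `keys`/`transform` parameters are hypothetical, not an existing torch or h5py API; it just generalizes the open-per-access pattern above (the `swmr=True` flag is omitted here since it only applies to files written in SWMR mode):

```python
import h5py
import torch
import torch.utils.data as data


class HDF5Dataset(data.Dataset):
    """Generic HDF5-backed dataset returning one tensor per requested key.

    The file is opened (and closed) on every access, so the dataset stays
    safe to use with multi-process DataLoader workers.
    """

    def __init__(self, file_path, keys, transform=None):
        super().__init__()
        self.file_path = file_path
        self.keys = list(keys)
        self.transform = transform
        # Cache the length once; all keys are assumed to share the first axis.
        with h5py.File(self.file_path, "r") as h5_file:
            self._len = h5_file[self.keys[0]].shape[0]

    def __getitem__(self, index):
        with h5py.File(self.file_path, "r") as h5_file:
            out = tuple(torch.from_numpy(h5_file[k][index]) for k in self.keys)
        if self.transform is not None:
            out = tuple(self.transform(t) for t in out)
        return out

    def __len__(self):
        return self._len
```

The per-item normalization from the snippet above (`x.float() / 127.5 - 1.0`) would then be passed in as the `transform`, instead of being hard-coded into `__getitem__`.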

Metadata

Labels

enhancement (New feature or request)
