Could you please share how to obtain the processed dataset (`Encoder.h5`)? If the original data is publicly available, a pointer to it and to the processing steps would also help.