[PERFORMANCES] Pre-allocate arrays and tensors to use less RAM #36
Hello,
First, thanks for providing the code associated with your paper, which I read after seeing your work at PAISS. It has proved valuable in my work estimating image generation quality on specific domains that are not well described by ImageNet-based scores.
This MR intends to fix memory issues that arise when the dataset is too large:

1. The `packet_tensor` buffer in `compute_packet_statistics` eats up all the RAM. For example, for 10,000 RGB 256x256 images it uses 30 GiB. Allocating the tensor directly up front avoids over-using memory in the call to `th.stack`. For even bigger datasets the problem will persist; a solution could be to split the computation by averaging `mu` and `sigma` across multiple packets, but that would yield a biased estimate of the true quantities...
2. `sigma = th.stack([gpu_cov(packet_tensor[p, :, :].to(device)) for p in range(P)], dim=0).numpy()` creates a list of P tensors before stacking them. It is faster to pre-allocate the corresponding `np.array` and fill it in. This saves only a tiny bit of memory, as the overhead still comes from `packet_tensor`. On a dataset of 20k (3, 256, 256) images, it saves up to 30% of peak RAM.
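To illustrate point 2, here is a minimal numpy sketch of the pre-allocation pattern (the function name `cov_stack_prealloc` and the shapes are illustrative, not the repo's actual code):

```python
import numpy as np

def cov_stack_prealloc(packet_tensor):
    """Per-packet covariance matrices without first building a
    Python list of P results and stacking them.

    packet_tensor: array of shape (P, N, D) -- P packets,
    N samples per packet, D features (illustrative shapes).
    """
    P, N, D = packet_tensor.shape
    # Pre-allocate the output once; each iteration writes into it
    # directly, so peak memory is the output plus one packet's result.
    sigma = np.empty((P, D, D), dtype=packet_tensor.dtype)
    for p in range(P):
        # rowvar=False: rows are observations, columns are variables.
        sigma[p] = np.cov(packet_tensor[p], rowvar=False)
    return sigma
```

Compared with `np.stack([np.cov(x[p], rowvar=False) for p in range(P)])`, this avoids holding P intermediate arrays plus their stacked copy alive at the same time.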
The MR also removes the exception raised when the imaginary component in the Fréchet distance calculation is too big. I'm not so sure about this one...
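For context, the imaginary component comes from `sqrtm` on a near-singular covariance product. A common middle ground (sketched below with an illustrative `frechet_distance` helper, not the repo's code) is to discard a numerically negligible imaginary part but still raise when it is genuinely large:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians (mu, sigma).

    sqrtm can return a result with a small imaginary component for
    near-singular inputs; tolerate it below a threshold instead of
    always raising.
    """
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
            # Only a large imaginary part signals a real numerical problem.
            raise ValueError("sqrtm returned a large imaginary component")
        covmean = covmean.real
    return (diff @ diff + np.trace(sigma1) + np.trace(sigma2)
            - 2 * np.trace(covmean))
```

Dropping the check entirely would silently hide genuinely ill-conditioned covariance estimates, which may be why the original code raised.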
I'm interested in hearing your thoughts about these issues.