Options to read & decompress data in parallel #340

Draft

Conversation

JamesWrigley (Member) requested changes on Dec 7, 2022:
Some benchmarks 🐎 I selected AGIPD runs with ~200 cells and loaded the first 1000 trains of a single module into memory.

This is loading a compressed uint16 AGIPD module from p3025 with 200 cells:

[benchmark plot, log scale]

[same plot but with a linear scale]

And loading an uncompressed float32 AGIPD module from p3046 with 202 cells:

[benchmark plot, log scale]

[same plot but with a linear scale]

Amusingly, it's faster to load the uncompressed data despite it being ~80.5 GB on disk compared to ~2.34 GB of compressed data 🙃 (both proc/ directories still being on GPFS).

But still a huge improvement 🎉
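
As a rough illustration, a benchmark like this could be reproduced along the following lines with EXtra-data. The run number and source name are placeholders (the comment doesn't specify them), and `read_procs`/`decomp_threads` are the parameters this PR proposes:

```python
import time
from extra_data import open_run, by_index

# Placeholder run/source -- the comment doesn't say which run or module was used
run = open_run(proposal=3025, run=1, data='proc')
kd = run.select_trains(by_index[:1000])['SPB_DET_AGIPD1M-1/DET/0CH0:xtdf', 'image.data']

t0 = time.perf_counter()
data = kd.ndarray(read_procs=10, decomp_threads=-1)
print(f'Loaded {data.nbytes / 1e9:.2f} GB in {time.perf_counter() - t0:.1f} s')
```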
JamesWrigley (Member) commented on the diff:

```python
# Based on _alloc function in pasha
def zeros_shared(shape, dtype):
    ...
    return out
```

Could we expose this? Or add something like `KeyData.allocate_out(shared=True)`?
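
For reference, here is a minimal sketch of what a helper along these lines might look like, assuming the pasha-style approach of backing the array with an anonymous mmap so that forked worker processes share the same buffer. This is an assumption about the approach, not the PR's actual implementation:

```python
import mmap
import numpy as np

def zeros_shared(shape, dtype):
    """Sketch: allocate a zero-filled array in shared memory.

    An anonymous mmap is inherited by child processes created via fork,
    so reader processes can write directly into the same buffer.
    """
    dtype = np.dtype(dtype)
    count = int(np.prod(shape))
    nbytes = max(1, count * dtype.itemsize)  # mmap length must be > 0
    buf = mmap.mmap(-1, nbytes)  # anonymous mapping, shared on fork
    # Freshly mapped pages are zero-filled, so no explicit zeroing is needed
    return np.frombuffer(buf, dtype=dtype, count=count).reshape(shape)
```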
Further down the diff, the updated signature:

```python
def ndarray(self, roi=(), out=None, read_procs=1, decomp_threads=1):
```
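
A hedged sketch of how these parameters might be used, including the shared-memory `out` case discussed in the description below (`kd` stands for any `KeyData` object; `zeros_shared` is the helper sketched above):

```python
# kd: a KeyData object, e.g. run[source, 'image.data']

# Parallel reads into a freshly allocated array:
arr = kd.ndarray(read_procs=10)

# Parallel decompression, one thread per core:
arr = kd.ndarray(decomp_threads=-1)

# Passing `out` together with read_procs means the array must live in
# shared memory, e.g. allocated with the zeros_shared helper:
out = zeros_shared(kd.shape, kd.dtype)
kd.ndarray(out=out, read_procs=10)
```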

This adds `read_procs` and `decomp_threads` parameters to `KeyData.ndarray()` and `.xarray()`. They control the number of processes used to read data from HDF5 files, and the number of threads used to decompress data in the specific pattern we use for gain/mask datasets in 2D data. They both default to 1, i.e. the status quo, and we avoid launching separate processes/threads when they're 1.

Testing with ~55 GB of JUNGFRAU data, I got a better-than-2x speedup reading uncompressed data with 10 processes (~1 minute -> ~24 seconds), and something like a 10x speedup reading compressed data with `decomp_threads=-1`, i.e. 1 thread per core, on a 72-core node (~1 min 40 s -> ~10 s). The timings are pretty variable - AFAICT, filesystem access always is.

The `read_procs` option is kind of incompatible with passing in an `out` array, because the array needs to be in shared memory. I'm not sure how to deal with that in the API - we could reject using `out` and `read_procs` together, but you could also pass in an array in shared memory, and I don't know of any way to check for that.

Future work:

Closes #49.