Parallel decompression for detector data #593
Conversation
The test failures are because the simplest way to write …

Here's a log-log plot (the same test setup as the plot above, but a different run on a different node):

[log-log plot]
> I haven't yet implemented non-zero fill values for gaps. Is that worth doing, or shall we fall back to HDF5's code when we do that?

Which gaps do you mean?

> Make zlib_into (tiny package I made for this) a regular dependency or an optional one?

A regular one, so that we can use it by default ⏩

> For now the parallel decompression is opt-in, by passing e.g. `.ndarray(decompress_threads=16)`. Do we want to turn it on by default for compressed data? Or use some heuristics to determine when it's most likely to be useful?

Absolutely turn it on by default 😛 Which reminds me that we should advertise the multi-module keydata API, since that's the only one getting the improvement.

> Filling the output array one frame at a time opens up the possibility of making an array shaped like (frames, modules, slow_scan, fast_scan), instead of (modules, frames, ...), i.e. putting the different modules for one frame together in memory, which the current reading code does not allow. You could do this by making a (frames, modules, ...) array and then rearranging the axes to pass it as (modules, frames, ...) in the `out=` parameter. Do we want to do something to make that easier?

Could be useful, but I'd say let's cross that bridge when we come to it.
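A minimal numpy sketch of the axis-rearranging trick from that last question (the shapes are illustrative, and `reader.ndarray` is a hypothetical call, not the actual API):

```python
import numpy as np

frames, modules, ss, fs = 10, 16, 512, 128

# Allocate frames-first, so all modules for one frame sit together in memory.
arr = np.empty((frames, modules, ss, fs), dtype=np.float32)

# Transposing gives a (modules, frames, ...) view of the same memory,
# which existing modules-first reading code could fill via out=.
out_view = arr.transpose(1, 0, 2, 3)
assert out_view.shape == (modules, frames, ss, fs)
assert out_view.base is arr  # a view, not a copy

# reader.ndarray(out=out_view)  # hypothetical: fills arr frame-major
```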
Thanks!
Good point, that's also a question - it wouldn't be particularly difficult to allow this for generic KeyData objects too. Or we could use it as a selling point for the components interface (and maybe add it to generic KeyData later on).
I actually just realised that filling the gaps already works with the new code, because we set the fill value when creating the output array and then overwrite it. For completeness, there are two kinds of gaps this applies to.
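A toy illustration of why that works (shapes and values are made up): the output array starts out full of the fill value, and only positions with real data are overwritten, so gaps need no separate filling pass.

```python
import numpy as np

# Illustrative shapes: 2 modules, 4 frames of 8x8 pixels.
out = np.full((2, 4, 8, 8), np.nan, dtype=np.float32)

for mod in range(2):
    for frm in range(4):
        if (mod, frm) == (1, 2):
            continue  # a missing chunk: nothing is written here
        out[mod, frm] = 42.0  # stand-in for a decompressed chunk

# The gap still holds the fill value set at allocation time.
assert np.isnan(out[1, 2]).all()
```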
That would actually be very nice, because for some experiments we do indeed just need a single AGIPD module. But I think it can be left for later.
I've enabled decompression with 16 threads by default for now. But I guess we should probably have some heuristics based on the number of available CPUs, and maybe also some way of imposing a limit externally, like an environment variable. 🤔
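One possible shape for that heuristic, as a sketch only (the environment variable name and the cap of 16 are made up for illustration, not an existing EXtra-data setting):

```python
import os

def default_decompress_threads(cap=16):
    """Pick a thread count from the CPUs actually available to this process.

    EXTRA_DATA_DECOMPRESS_THREADS is a hypothetical external override.
    """
    env = os.environ.get('EXTRA_DATA_DECOMPRESS_THREADS')
    if env is not None:
        return max(1, int(env))
    try:
        # Respects cgroups/taskset restrictions on Linux
        ncpu = len(os.sched_getaffinity(0))
    except AttributeError:
        ncpu = os.cpu_count() or 1  # fallback for macOS/Windows
    return min(cap, ncpu)
```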
Thanks James! 😀

Where detector data is stored compressed, HDF5 normally decompresses it serially. This PR uses HDF5 to get the compressed chunk data, and then decompresses it in several worker threads. As I mentioned on Zulip, I experimented with a few ways of doing this, but the differences between them were minor.
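The rough shape of that approach, sketched with h5py's direct chunk reads and stdlib zlib. This is not the PR's actual code: the PR uses the zlib_into package to decompress straight into the output array, while this sketch takes an extra copy, and it assumes the dataset uses only the deflate filter (no shuffle):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

import h5py
import numpy as np

def read_compressed(path, dset_name, threads=16):
    """Read a gzip-compressed chunked dataset, decompressing in threads."""
    with h5py.File(path, 'r') as f:
        dset = f[dset_name]
        out = np.empty(dset.shape, dtype=dset.dtype)
        chunk_shape = dset.chunks

        def load_chunk(slices):
            offset = tuple(s.start for s in slices)
            # Raw, still-compressed bytes: this bypasses HDF5's own filter
            # pipeline (h5py serialises the actual HDF5 call internally).
            _mask, raw = dset.id.read_direct_chunk(offset)
            chunk = np.frombuffer(zlib.decompress(raw), dtype=dset.dtype)
            chunk = chunk.reshape(chunk_shape)
            # Edge chunks may extend past the dataset, so clip to the slices.
            sizes = tuple(s.stop - s.start for s in slices)
            out[slices] = chunk[tuple(slice(0, n) for n in sizes)]

        # zlib.decompress releases the GIL, so threads give real parallelism.
        with ThreadPoolExecutor(threads) as pool:
            for _ in pool.map(load_chunk, dset.iter_chunks()):
                pass  # consume the iterator to propagate any exceptions
    return out
```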
I found some compressed (photonised) AGIPD data in MID proposal 6578. The plot below shows loading 1000 frames across 16 modules. The value for 1 thread is the existing code path, with HDF5 doing the decompression. The others are all the new code path.

[plot: loading time vs. number of decompression threads]
Questions:

- I haven't yet implemented non-zero fill values for gaps. Is that worth doing, or shall we fall back to HDF5's code when we do that?
- Make zlib_into (tiny package I made for this) a regular dependency or an optional one?
- For now the parallel decompression is opt-in, by passing e.g. `.ndarray(decompress_threads=16)`. Do we want to turn it on by default for compressed data? Or use some heuristics to determine when it's most likely to be useful?
- Filling the output array one frame at a time opens up the possibility of making an array shaped like (frames, modules, slow_scan, fast_scan), instead of (modules, frames, ...), i.e. putting the different modules for one frame together in memory, which the current reading code does not allow. You could do this by making a (frames, modules, ...) array and then rearranging the axes to pass it as (modules, frames, ...) in the `out=` parameter. Do we want to do something to make that easier?

I have some ideas about speeding this up further, and I'd also like to get back to parallel reading of uncompressed data (as explored in #340). But I wanted to make some concrete progress on this.
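For reference, a usage sketch of the opt-in call, assuming the multi-module KeyData interface from extra_data.components (the proposal and run numbers are placeholders):

```python
from extra_data import open_run
from extra_data.components import AGIPD1M

run = open_run(proposal=6578, run=1)  # placeholder run number

# Multi-module KeyData: all 16 AGIPD modules stacked on the first axis,
# with chunk decompression spread over 16 worker threads.
agipd = AGIPD1M(run)
data = agipd['image.data'].ndarray(decompress_threads=16)
```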