
Parallel decompression for detector data #593

Merged
takluyver merged 15 commits into master from parallel-decompress-cleanup
Feb 27, 2025

Conversation

@takluyver
Member

Where detector data is stored compressed, HDF5 normally decompresses it in serial. This uses HDF5 to get the compressed chunk data, and then decompresses it in several worker threads. As I mentioned in Zulip, I experimented with a few ways of doing this, but the differences between them were minor.
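
For anyone following along, here's a minimal sketch of the idea (not the code in this PR) using plain h5py + zlib, assuming a deflate-compressed dataset whose chunk grid divides the dataset shape evenly:

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

import h5py
import numpy as np


def parallel_read(dset: h5py.Dataset, n_threads: int = 16) -> np.ndarray:
    """Read a deflate-compressed dataset, inflating chunks in threads."""
    out = np.empty(dset.shape, dtype=dset.dtype)

    def load_chunk(chunk_sel):
        offset = tuple(s.start for s in chunk_sel)
        # read_direct_chunk() hands back the stored bytes with filters
        # still applied, i.e. the compressed chunk.
        _filter_mask, raw = dset.id.read_direct_chunk(offset)
        # zlib.decompress releases the GIL, so worker threads run in parallel
        arr = np.frombuffer(zlib.decompress(raw), dtype=dset.dtype)
        out[chunk_sel] = arr.reshape(dset.chunks)

    with ThreadPoolExecutor(n_threads) as pool:
        # iter_chunks() yields a tuple of slices for each stored chunk
        list(pool.map(load_chunk, dset.iter_chunks()))
    return out
```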

I found some compressed (photonised) AGIPD data in MID proposal 6578. This shows loading 1000 frames across 16 modules. The value for 1 thread is the existing code path, with HDF5 doing the decompression. The others are all the new code path.

[Plot: time to load 1000 frames across 16 modules vs. number of decompression threads]

Questions:

  1. I haven't yet implemented non-zero fill values for gaps. Is that worth doing, or shall we fall back to HDF5's code when we do that?
  2. Make zlib_into (tiny package I made for this) a regular dependency or an optional one?
  3. For now the parallel decompression is opt-in, by passing e.g. .ndarray(decompress_threads=16). Do we want to turn it on by default for compressed data? Or use some heuristics to determine when it's most likely to be useful?
  4. Filling the output array one frame at a time opens up the possibility of making an array shaped like (frames, modules, slow_scan, fast_scan), instead of (modules, frames, ...), i.e. putting the different modules for one frame together in memory, which the current reading code does not allow. You could do this by making a (frames, modules, ...) array and then rearranging the axes to pass it as (modules, frames, ...) in the out= parameter. Do we want to do something to make that easier?
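
To make question 4 concrete, here's a hypothetical sketch of the out= trick, assuming `agipd` is a multi-module detector component and using the decompress_threads keyword from question 3:

```python
import numpy as np

# Illustrative AGIPD-like shape; `agipd` is assumed to be a multi-module
# detector component from extra_data.components.
frames, modules, ss, fs = 1000, 16, 512, 128
arr = np.empty((frames, modules, ss, fs), dtype=np.float32)

# transpose() returns a view onto the same memory, so when the reader fills
# the (modules, frames, ...) view frame by frame, each frame's 16 modules
# end up contiguous in `arr`.
agipd['image.data'].ndarray(out=arr.transpose(1, 0, 2, 3), decompress_threads=16)
```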

I have some ideas about speeding this up further, and I'd also like to get back to parallel reading of uncompressed data (as explored in #340). But I wanted to make some concrete progress on this.

@takluyver
Member Author

takluyver commented Feb 7, 2025

The test failures are because the simplest way to write zlib_into for now only supports Python 3.11 and above. I can figure out how to extend support backwards if needed, but we might instead make it an optional dependency, or even bump our minimum Python version.

Here's a log-log plot (the same test setup as the plot above, but a different run on a different node):

[Log-log plot: same benchmark as above, from a separate run on a different node]

Member

@JamesWrigley left a comment


  1. I haven't yet implemented non-zero fill values for gaps. Is that worth doing, or shall we fall back to HDF5's code when we do that?

Which gaps do you mean?

  2. Make zlib_into (tiny package I made for this) a regular dependency or an optional one?

A regular one so that we can use it by default ⏩

  3. For now the parallel decompression is opt-in, by passing e.g. .ndarray(decompress_threads=16). Do we want to turn it on by default for compressed data? Or use some heuristics to determine when it's most likely to be useful?

Absolutely turn it on by default 😛 Which reminds me that we should advertise the multi-module keydata API since that's the only one getting the improvement.

  4. Filling the output array one frame at a time opens up the possibility of making an array shaped like (frames, modules, slow_scan, fast_scan), instead of (modules, frames, ...), i.e. putting the different modules for one frame together in memory, which the current reading code does not allow. You could do this by making a (frames, modules, ...) array and then rearranging the axes to pass it as (modules, frames, ...) in the out= parameter. Do we want to do something to make that easier?

Could be useful, but I'd say let's cross that bridge when we come to it.

@takluyver
Member Author

Thanks!

we should advertise the multi-module keydata API since that's the only one getting the improvement.

Good point, that's also a question - it wouldn't be particularly difficult to allow this for generic KeyData objects too. Or we could use it as a selling point for the components interface (and maybe add it to generic KeyData later on).

@takluyver
Member Author

Which gaps do you mean?

I actually just realised that filling the gaps already works with the new code, because we set the fill value when creating the output array and then overwrite it.

For completeness, there are two kinds of gaps this applies to.

  • If some modules are missing entirely and you pass module_gaps=True, space is left for the missing modules, so the array will still have length 16 (say) on the module dimension.
  • Where a train is recorded by some modules and not others, there are gaps for the missing modules by default. You can avoid this by setting min_modules=16 (or however many modules there are) when constructing the component.
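
For illustration, roughly how the two options look with the components API (the run number is a placeholder, and the exact keyword placement may differ):

```python
import numpy as np
from extra_data import open_run
from extra_data.components import AGIPD1M

run = open_run(proposal=6578, run=1)  # run number is a placeholder

# Option 1: only assemble trains where all 16 modules recorded data,
# so there are no per-train gaps to fill.
agipd = AGIPD1M(run, min_modules=16)

# Option 2: keep partial trains and leave space for missing modules, so the
# module axis always has length 16; gaps get the fill value.
agipd = AGIPD1M(run, min_modules=1)
data = agipd['image.data'].ndarray(module_gaps=True, fill_value=np.nan)
# (np.nan needs a float dtype; integer data needs an integer fill value)
```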

@JamesWrigley
Member

Good point, that's also a question - it wouldn't be particularly difficult to allow this for generic KeyData objects too. Or we could use it as a selling point for the components interface (and maybe add it to generic KeyData later on).

That would actually be very nice because for some experiments we do indeed just need a single AGIPD module. But I think it can be left for later.

@takluyver force-pushed the parallel-decompress-cleanup branch from 2034734 to 7014d6e on February 10, 2025
@takluyver
Member Author

I've enabled decompression with 16 threads by default for now. But I guess we should probably have some heuristics based on the number of available CPUs, and maybe also some way of imposing a limit externally, like an environment variable. 🤔
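
Something like this, perhaps (a hypothetical sketch; the environment variable name is made up for illustration, not an existing EXtra-data setting):

```python
import os


def default_decompress_threads(cap: int = 16) -> int:
    override = os.environ.get("EXTRA_DATA_DECOMPRESS_THREADS")  # hypothetical
    if override:
        return max(1, int(override))
    # Never spawn more workers than available CPUs, capped to avoid
    # oversubscription / diminishing returns on big nodes.
    return min(cap, os.cpu_count() or 1)
```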

@takluyver added the enhancement label on Feb 12, 2025
@takluyver force-pushed the parallel-decompress-cleanup branch from 7316368 to 2167dcd on February 27, 2025
Member

@JamesWrigley left a comment


LGTM!

@takluyver
Member Author

Thanks James! 😀

@takluyver merged commit 2f1123b into master on Feb 27, 2025
10 checks passed
@takluyver deleted the parallel-decompress-cleanup branch on February 27, 2025
@takluyver added this to the 1.21 milestone on Mar 4, 2025