Skip to content

spatial query rechunk#989

Merged
melonora merged 3 commits intoscverse:mainfrom
ArneDefauw:spatial_query_chunks
Sep 29, 2025
Merged

spatial query rechunk#989
melonora merged 3 commits intoscverse:mainfrom
ArneDefauw:spatial_query_chunks

Conversation

@ArneDefauw
Copy link
Contributor

Fix for #988.

@codecov
Copy link

codecov bot commented Sep 23, 2025

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.11%. Comparing base (b5239b4) to head (d7c16b3).
⚠️ Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_core/query/_utils.py 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #989      +/-   ##
==========================================
+ Coverage   92.07%   92.11%   +0.03%     
==========================================
  Files          48       48              
  Lines        7445     7440       -5     
==========================================
- Hits         6855     6853       -2     
+ Misses        590      587       -3     
Files with missing lines Coverage Δ
src/spatialdata/_core/query/_utils.py 83.49% <77.77%> (+2.01%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@melonora melonora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @ArneDefauw, thanks for the PR.

I see that rechunking is currently always performed, which could be potentially expensive.
Would it perhaps be better to import _check_regular_chunks before the for loop that does the rechunking to see whether rechunking is required in the first place? That way you can also just avoid the second for loop:

from dask.array.core import _check_regular_chunks
for scale in result:
    data = result[scale]["image"].data
    chunks = data.chunks
    if not _check_regular_chunks(data.chunks):
        data = data.rechunk(result[scale]["image"].data.chunksize)
        if not _check_regular_chunks(data.chunks):
            raise ValueError(f"Chunks are not regular for {scale} of the queried data: {chunks} and could also not be rechunked regularly. Please report this bug.")
         result[scale]["image"].data = data   

I would also avoid asserts. These are only for development and when testing. We should start clearing the remaining asserts in src, and I do so whenever I touch code that contains them.

@ArneDefauw
Copy link
Contributor Author

ArneDefauw commented Sep 29, 2025

Hi @melonora , thanks for the review!
I incorporated your comments in the PR.

I think data=data.rechunk(data.chunksize) is a no op if the chunks are not irregular, so we could even remove the _check_regular_chunks calls from the code.
But to be on the safe side, we can let them in.

@melonora
Copy link
Collaborator

Hi @melonora , thanks for the review! I incorporated your comments in the PR.

I think data=data.rechunk(data.chunksize) is a no op if the chunks are not irregular, so we could even remove the _check_regular_chunks calls from the code. But to be on the safe side, we can let them in.

from what I see in the dask codebase it is not necessarily a no op. It does not check first whether the chunking stays the same and then return early if it is the case.

@melonora
Copy link
Collaborator

Will let @LucaMarconato have a small look, but I think this is good for merge, thanks!

@melonora melonora merged commit aeef0cc into scverse:main Sep 29, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants