Skip to content

Conversation

d-v-b
Copy link
Contributor

@d-v-b d-v-b commented Sep 1, 2025

Completes the work of #3299 by replacing a second invocation of _iter_chunk_coords with _iter_shard_coords.

In a separate PR, we need to refactor this code block:

if write_data:
if isinstance(data, Array):
async def _copy_array_region(
chunk_coords: tuple[int, ...] | slice, _data: Array
) -> None:
arr = await _data._async_array.getitem(chunk_coords)
await result.setitem(chunk_coords, arr)
# Stream data from the source array to the new array
await concurrent_map(
[(region, data) for region in result._iter_shard_regions()],
_copy_array_region,
zarr.core.config.config.get("async.concurrency"),
)
else:
async def _copy_arraylike_region(chunk_coords: slice, _data: NDArrayLike) -> None:
await result.setitem(chunk_coords, _data[chunk_coords])
# Stream data from the source array to the new array
await concurrent_map(
[(region, data) for region in result._iter_chunk_regions()],
_copy_arraylike_region,
zarr.core.config.config.get("async.concurrency"),
)
return result
. It's massive code smell to have special "write one array to another" logic defined in the tail end of an array creation function.

I add a test that checks how many get requests we make when calling create_array(data=..). In main, it's 1 get per chunk (bad). In this PR, it's 1 get per shard (better). But we can also get to 0 gets per shard by introducing some special logic for full shard writes. expect this in a later PR.

edit: closes #3169 and #3421

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Sep 1, 2025
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Sep 1, 2025
@d-v-b d-v-b requested a review from a team September 1, 2025 18:51
Copy link

codecov bot commented Sep 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.92%. Comparing base (3d0e40e) to head (f2c4fac).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3422   +/-   ##
=======================================
  Coverage   94.92%   94.92%           
=======================================
  Files          79       79           
  Lines        9491     9491           
=======================================
  Hits         9009     9009           
  Misses        482      482           
Files with missing lines Coverage Δ
src/zarr/core/array.py 97.44% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@maxrjones maxrjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we don't actually need one get per shard, but this is the current behavior

Could you open an issue to track this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"Stored and computed checksum do not match" in case when shards large than chunks
2 participants