
implement coordinate hash #651

Open
d-chambers wants to merge 2 commits into dev from coord_hash

Conversation

@d-chambers
Contributor

@d-chambers d-chambers commented Apr 4, 2026

Summary

This PR does two things:

  1. Implements hash for coords, normalizing unit-bearing values to base SI units so equivalent coords hash the same.
  2. Fixes an intermittent hang in remote HTTP HDF5 reads caused by the no-range fallback path.

Coord hashing

  • Implement BaseCoord.hash
  • Normalize unit-bearing coords to base SI before hashing
  • Use range metadata directly for CoordRange
  • Hash full arrays for non-range coords
  • Rename hash_numpy_array to canonical hash_array
  • Keep a compatibility alias for hash_numpy_array

Remote HTTP HDF5 fix
Root cause was in the remote HDF5 fallback path:

  • when an HTTP server did not support range requests, fallback could re-enter fsspec HTTP while already inside an active fsspec HTTP read
  • that could block in the async/sync bridge and hang the test run
  • after metadata fallback had already materialized a local cached file, later HDF5 reads still reopened the remote HTTP path instead of reusing the cached artifact

Fixes:

  • HTTP cache downloads now use direct blocking urlopen(...) instead of resource.open(...) in the fallback materialization path
  • H5Reader now prefers an already-cached local file for remote HDF5 resources before attempting the remote-first fallback path
  • H5Reader also explicitly owns and closes wrapped file objects
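The first fix, a direct blocking download, might look roughly like this. The function name and defaults are hypothetical; only the urlopen-instead-of-resource.open approach comes from the PR:

```python
import shutil
from urllib.request import Request, urlopen


def download_to_file(url, dest, headers=None, timeout=60.0, block_size=2**20):
    """Materialize a remote file with a plain blocking urlopen instead of
    resource.open(), so the fallback path never re-enters an async
    filesystem layer mid-read (hypothetical sketch, not the PR's code)."""
    request = Request(str(url), headers=dict(headers or {}))
    with urlopen(request, timeout=timeout) as remote, open(dest, "wb") as local:
        # Stream in chunks so large files never sit fully in memory.
        shutil.copyfileobj(remote, local, length=block_size)
```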

Summary by CodeRabbit

  • New Features

    • Coordinates gain stable, deterministic hashing for use as dict/set keys.
    • Array hashing improved (including datetime handling) with a legacy alias preserved.
    • HTTP(S) downloads now honor custom headers and a configurable timeout; remote caching helper added.
  • Bug Fixes

    • Improved HDF5 reader handle ownership to ensure proper resource cleanup.
  • Tests

    • Added extensive tests for coordinate hashing, array hashing, HTTP downloads, HDF5 handles, and remote caching.

@coderabbitai
Contributor

coderabbitai bot commented Apr 4, 2026

📝 Walkthrough

Walkthrough

Adds deterministic hashing to coordinate types via BaseCoord.__hash__() with unit- and dtype-aware hashing; renames hash_numpy_array to hash_array (keeps an alias) and adjusts the fast path for datelike dtypes; introduces a managed HDF5 handle proxy and HTTP streaming download with cached-local-file resolution; adds tests covering these behaviors.

Changes

Cohort / File(s) Summary
Coordinate hashing & tests
dascore/core/coords.py, tests/test_core/test_coords.py
Implements BaseCoord._get_hashable_coord, _hash_scalar, _hash_array_token, and overrides __hash__() to produce deterministic hashes for CoordRange, CoordPartial, CoordArray/others with unit normalization and dtype-aware rules. Changes CoordPartial.convert_units from alias-to-update to an explicit method. Adds tests validating hash equality/inequality, unit-normalization, null-scalar handling, and round-trip/hash contract.
Array hashing util & tests
dascore/utils/array.py, tests/test_utils/test_array_utils.py
Renames public function to hash_array(arr) and retains hash_numpy_array as a compatibility alias. Updates fast-path logic to avoid zero-copy handling for datetime/timedelta dtypes and adds tests covering datetime64[ns] behavior and alias parity.
HDF5 handle lifecycle & remote I/O
dascore/utils/hdf5.py, dascore/utils/remote_io.py, dascore/config.py, tests/test_utils/test_io_utils.py, tests/test_io/test_remote_http.py
Adds _ManagedH5pyFile proxy to ensure paired closure of h5py.File and underlying file-like objects; centralizes opening h5py with the fileobj driver. Refactors _download_remote_file to use urllib.request/urlopen with headers and timeout from config; adds get_cached_local_file() helper. Adds remote_download_timeout config field and tests for handle lifecycle, caching behavior, and HTTP download semantics.
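The paired-closure idea behind the `_ManagedH5pyFile` proxy named in the walkthrough can be sketched without h5py as a generic wrapper; all names here are illustrative, not dascore's implementation:

```python
class ManagedFilePair:
    """Sketch of paired handle closure: closing the proxy closes both the
    outer handle (e.g. an h5py.File opened with the fileobj driver) and
    the wrapped file-like object it reads from."""

    def __init__(self, outer, inner):
        self._outer = outer
        self._inner = inner

    def __getattr__(self, name):
        # Delegate everything except lifecycle methods to the outer handle.
        return getattr(self._outer, name)

    def close(self):
        try:
            self._outer.close()
        finally:
            # Always release the wrapped object, even if the outer close fails.
            self._inner.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

Without such pairing, closing only the h5py handle can leave the underlying HTTP or cached-file object open, which is the resource-cleanup gap the PR addresses.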

Possibly related PRs

  • add array hash function #629: Introduced the original hash_numpy_array(...) utility; directly related because this change renames it to hash_array(...) and uses it in coordinate hashing.

Suggested labels

proc, IO, patch

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 70.13%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title 'implement coordinate hash' is concise and clearly summarizes the primary feature being added.
  • Description check — ✅ Passed: the description is well-structured and comprehensive, covering both major features (coordinate hashing and the remote HTTP HDF5 fixes) with clear explanations of the problems, solutions, and implementation details.


@coderabbitai coderabbitai bot added bug Something isn't working ready_for_review PR is ready for review labels Apr 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1cfaa34167

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +494 to +495
payload = (coord.__class__, unit_str, hash_array(coord.values))
return hash(payload)


P1: Align CoordArray hashing with tolerant equality

BaseCoord.__hash__ hashes raw coord.values bytes, but BaseCoord.__eq__ (via sensible_model_equals/all_close) treats float arrays as equal within np.isclose tolerances. Two CoordArray instances that compare equal (e.g., tiny floating-point differences) can therefore produce different hashes, which breaks Python's hash contract and causes incorrect behavior in sets/dicts keyed by coordinates.

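The reviewer's point can be demonstrated directly: two float arrays within `np.isclose` tolerance compare equal under tolerant equality, yet their raw bytes differ, so any byte-based hash of the values differs too:

```python
import numpy as np

# Two float arrays the tolerant equality path would call equal:
a = np.array([1.0, 2.0, 3.0])
b = a + 1e-12  # tiny floating-point perturbation

close = np.allclose(a, b)                # True: equal within isclose tolerances
same_bytes = a.tobytes() == b.tobytes()  # False: the raw bytes differ
# A hash built on raw bytes therefore differs for "equal" coords,
# violating Python's rule that equal objects must hash equal.
```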

Comment on lines +139 to +140
headers = dict(getattr(resource, "storage_options", {}) or {})
request = Request(str(resource), headers=headers)


P1: Preserve HTTP options instead of mapping all to headers

The HTTP download path now converts resource.storage_options directly into request headers, which misinterprets common UPath/fsspec options (for example headers={...} becomes a literal headers: header, and non-header options are also injected). In authenticated or customized HTTP setups this drops the intended auth/config and can make fallback materialization fail (e.g., 401/403) when ensure_local_file is triggered.


@codecov

codecov bot commented Apr 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (e656799) to head (df55ff8).
⚠️ Report is 2 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##               dev      #651    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files          134       137     +3     
  Lines        12042     12736   +694     
==========================================
+ Hits         12042     12736   +694     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/core/coords.py`:
- Around line 457-461: The hashing path currently leaves scalar magnitudes in
their original units because CoordPartial.convert_units just updates unit
metadata; modify the normalization so start/stop/step magnitudes are converted
to the target units before hashing. Concretely, update
CoordPartial.convert_units (or inside _get_hashable_coord) to actually convert
numeric magnitudes (start, stop, step) using the units conversion (e.g., via the
units library's to/convert method) rather than only updating the units
attribute, or ensure _get_hashable_coord calls simplify_units on a copy whose
start/stop/step have been converted to the simplified units; reference the
methods _get_hashable_coord, simplify_units, and CoordPartial.convert_units and
convert those scalar fields to the target units so equivalent values in cm vs m
hash identically.
- Around line 470-495: The __hash__ implementation currently returns Python's
built-in hash(payload) which is non-deterministic across interpreter runs;
change __hash__ to produce a deterministic digest (e.g. SHA-256) by serializing
the same payload components used now (coord.__class__ identity string, unit_str,
coord.shape, dtype string for CoordPartial, and the outputs of _hash_scalar and
hash_array) into a stable byte sequence and returning a stable integer derived
from that digest; update the branches for CoordRange, CoordPartial, and the
default (array values) in the __hash__ method to build the same payload items
but pass them through a deterministic hasher (hashlib) rather than hash(),
keeping references to _get_hashable_coord, _hash_scalar, CoordRange,
CoordPartial, and hash_array so the payload composition remains unchanged.

In `@dascore/utils/remote_io.py`:
- Around line 138-147: The urlopen call in the HTTP branch uses no timeout and
can hang; update the logic that handles HTTP downloads (check _HTTP_PROTOCOLS,
Request and urlopen usage in this function) to pass a timeout value (either a
new DascoreConfig field like remote_download_timeout on get_config() or a
sensible default such as 60 seconds) when calling urlopen(Request(...),
timeout=...). Keep the existing chunked read using
get_config().remote_download_block_size and ensure the new config field is read
from get_config() so tests can override it.

In `@tests/test_core/test_coords.py`:
- Around line 535-539: The test test_equal_coords_share_hash currently passes
the same instance because get_coord(data=coord) returns coord unchanged; change
it to construct a distinct but equal instance (e.g., call
get_coord(data=coord.model_dump()) or build from copied values/range metadata)
so other != coord (identity) but other == coord (equality) and then assert
hash(other) == hash(coord); ensure you reference the get_coord call and
coord.model_dump() (or an explicit deepcopy of coord's fields) to rebuild the
second instance.
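The test fix described above — rebuild a distinct-but-equal instance instead of reusing the same object — can be sketched with a stand-in frozen dataclass; the real test would use `get_coord` and `coord.model_dump()` as the prompt says:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coord:  # hypothetical stand-in for a coordinate model
    start: float
    stop: float
    step: float


coord = Coord(0.0, 10.0, 1.0)
# Rebuild from dumped fields so the second instance is distinct but equal,
# rather than the very same object (the flaw the comment points out):
other = Coord(**vars(coord))
```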

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bcf61b71-3ead-4137-b7b1-e1cec3b3594f

📥 Commits

Reviewing files that changed from the base of the PR and between 35d6b82 and 1cfaa34.

📒 Files selected for processing (8)
  • dascore/core/coords.py
  • dascore/utils/array.py
  • dascore/utils/hdf5.py
  • dascore/utils/remote_io.py
  • tests/test_core/test_coords.py
  • tests/test_io/test_remote_http.py
  • tests/test_utils/test_array_utils.py
  • tests/test_utils/test_io_utils.py

Comment on lines +457 to +461
def _get_hashable_coord(self) -> Self:
    """Return a coordinate normalized for stable hashing."""
    if self.units is None or dtype_time_like(self.dtype):
        return self
    return self.simplify_units()
Contributor


⚠️ Potential issue | 🟡 Minor

CoordPartial unit normalization is a no-op for the hashed scalars.

This path assumes simplify_units() also normalizes start/stop/step, but CoordPartial.convert_units is just update, so those magnitudes stay in their original units. A partial coord in centimeters and the equivalent one in meters will still hash differently.
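The fix this comment asks for — convert magnitudes to base units before hashing, not merely relabel them — can be illustrated with a toy conversion table standing in for the real units library (a real implementation would use something like pint's `Quantity.to_base_units()`):

```python
# Toy stand-in for a units library's conversion to base SI units.
_UNITS_PER_METER = {"m": 1.0, "cm": 100.0, "mm": 1000.0}


def normalize_scalar(value: float, unit: str) -> tuple[float, str]:
    """Convert a scalar magnitude (e.g. a start/stop/step field) to base
    units before hashing, so equivalent values expressed in different
    units produce identical hash tokens."""
    return value / _UNITS_PER_METER[unit], "m"
```

Under this scheme a partial coord starting at 100 cm and one starting at 1 m normalize to the same scalar token, which is exactly what the no-op unit update fails to achieve.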

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dascore/core/coords.py` around lines 457 - 461, The hashing path currently
leaves scalar magnitudes in their original units because
CoordPartial.convert_units just updates unit metadata; modify the normalization
so start/stop/step magnitudes are converted to the target units before hashing.
Concretely, update CoordPartial.convert_units (or inside _get_hashable_coord) to
actually convert numeric magnitudes (start, stop, step) using the units
conversion (e.g., via the units library's to/convert method) rather than only
updating the units attribute, or ensure _get_hashable_coord calls simplify_units
on a copy whose start/stop/step have been converted to the simplified units;
reference the methods _get_hashable_coord, simplify_units, and
CoordPartial.convert_units and convert those scalar fields to the target units
so equivalent values in cm vs m hash identically.

@coderabbitai coderabbitai bot added IO Work for reading/writing different formats proc Related to processing module patch related to Patch class labels Apr 4, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
dascore/core/coords.py (1)

493-506: ⚠️ Potential issue | 🟠 Major

hash(payload) still makes the result process-specific.

The helpers build stable tokens, but the final hash(payload) reintroduces PYTHONHASHSEED and object-identity variability, so the new coordinate hash still changes across interpreter runs. If this is intended for stable cache keys or cross-process comparisons, finish with a deterministic digest instead of Python’s built-in hash().

#!/bin/bash
set -euo pipefail

for seed in 1 2; do
  printf 'PYTHONHASHSEED=%s -> ' "$seed"
  PYTHONHASHSEED=$seed python - <<'PY'
payload = ("CoordRange", "m", (10,), ("scalar", "0123456789abcdef"))
print(hash(payload))
PY
done
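One way to finish with a deterministic digest, as this comment suggests: feed the same payload tokens through `hashlib` and fold the digest to a fixed-width integer. This is a sketch, not the PR's implementation:

```python
import hashlib


def stable_hash(*tokens: str) -> int:
    """Deterministic stand-in for hash(payload): digest the tokens with
    SHA-256 in a fixed order and fold to a 64-bit integer, so the result
    is identical across interpreter runs regardless of PYTHONHASHSEED."""
    digest = hashlib.sha256()
    for token in tokens:
        digest.update(token.encode("utf-8"))
        digest.update(b"\x00")  # separator: ("ab", "c") must differ from ("a", "bc")
    return int.from_bytes(digest.digest()[:8], "big")
```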
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dascore/core/coords.py` around lines 493 - 506, The code currently returns
Python's built-in hash(payload) (in the CoordPartial branch and the fallback),
which is process-dependent; instead compute and return a deterministic digest
(e.g., SHA-256 hex) over the stable tokens you already build (class name,
unit_str, coord.shape, str(np.dtype(...)), and the outputs of
_hash_scalar/_hash_array_token). Replace the final hash(payload) calls with a
deterministic digest computed via hashlib (serialize the payload parts in a
stable order/encoding—e.g., join or JSON with sort_keys disabled since you
control ordering—and encode to bytes before hashing) and return that fixed hex
string (or a stable int derived from the hex) so both branches (CoordPartial and
non-partial) produce cross-process stable keys.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dascore/utils/remote_io.py`:
- Around line 138-143: The code treats storage_options as flat HTTP headers and
trusts the resource protocol; update the HTTP branch handling around
_HTTP_PROTOCOLS, Request, and urlopen to (1) extract headers compatibly from
getattr(resource, "storage_options", {}) by checking for a nested "headers" key
first and falling back to the top-level mapping (ensuring a dict of header
strings), and (2) explicitly parse and validate the URL scheme of str(resource)
(e.g., via urllib.parse.urlparse) to allow only "http" or "https" before calling
urlopen (use get_config().remote_download_timeout as the timeout). Ensure
Request receives the validated URL and the constructed headers.

---

Duplicate comments:
In `@dascore/core/coords.py`:
- Around line 493-506: The code currently returns Python's built-in
hash(payload) (in the CoordPartial branch and the fallback), which is
process-dependent; instead compute and return a deterministic digest (e.g.,
SHA-256 hex) over the stable tokens you already build (class name, unit_str,
coord.shape, str(np.dtype(...)), and the outputs of
_hash_scalar/_hash_array_token). Replace the final hash(payload) calls with a
deterministic digest computed via hashlib (serialize the payload parts in a
stable order/encoding—e.g., join or JSON with sort_keys disabled since you
control ordering—and encode to bytes before hashing) and return that fixed hex
string (or a stable int derived from the hex) so both branches (CoordPartial and
non-partial) produce cross-process stable keys.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 262318e8-9658-434d-b8cd-e486140ca018

📥 Commits

Reviewing files that changed from the base of the PR and between 1cfaa34 and df55ff8.

📒 Files selected for processing (5)
  • dascore/config.py
  • dascore/core/coords.py
  • dascore/utils/remote_io.py
  • tests/test_core/test_coords.py
  • tests/test_utils/test_io_utils.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_core/test_coords.py

Comment on lines +481 to +506
def __hash__(self):
    coord = self._get_hashable_coord()
    unit_str = coord.unit_str
    if isinstance(coord, CoordRange):
        payload = (
            coord.__class__,
            unit_str,
            coord.shape,
            self._hash_scalar(coord.start),
            self._hash_scalar(coord.stop),
            self._hash_scalar(coord.step),
        )
        return hash(payload)
    if isinstance(coord, CoordPartial):
        payload = (
            coord.__class__,
            unit_str,
            coord.shape,
            str(np.dtype(coord.dtype)),
            self._hash_scalar(coord.start),
            self._hash_scalar(coord.stop),
            self._hash_scalar(coord.step),
        )
        return hash(payload)
    payload = (coord.__class__, unit_str, self._hash_array_token(coord.values))
    return hash(payload)
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Documented cross-family equality example:"
sed -n '1888,1895p' dascore/core/coords.py

echo
echo "Current __hash__ implementation:"
sed -n '481,506p' dascore/core/coords.py

echo
echo "Shared model equality implementation:"
fd 'models.py' dascore | xargs -r rg -n -C3 'class DascoreBaseModel|def __eq__'

Repository: DASDAE/dascore

Length of output: 1624


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Finding _get_hashable_coord implementation ==="
rg -n -A10 '_get_hashable_coord' dascore/core/coords.py | head -30

echo
echo "=== Finding __eq__ implementation in BaseCoord ==="
rg -n -B2 -A15 'def __eq__' dascore/core/coords.py | head -50

echo
echo "=== Checking _hash_scalar and _hash_array_token implementations ==="
rg -n -A8 'def _hash_scalar|def _hash_array_token' dascore/core/coords.py

echo
echo "=== Checking if CoordRange is converted to CoordArray anywhere ==="
rg -n 'to_array|convert.*array' dascore/core/coords.py | head -20

Repository: DASDAE/dascore

Length of output: 1057


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Class definitions and inheritance ==="
rg -n 'class BaseCoord|class CoordRange|class CoordArray' dascore/core/coords.py | head -10

echo
echo "=== Search for __eq__ in entire coords.py file ==="
rg -n '__eq__' dascore/core/coords.py

echo
echo "=== Check if BaseModel or parent provides __eq__ ==="
rg -n 'class DascoreBaseModel' dascore/models.py -A15 | head -30

echo
echo "=== Check simplify_units implementation ==="
rg -n -A20 'def simplify_units' dascore/core/coords.py

echo
echo "=== Look at entire BaseCoord class definition start ==="
sed -n '80,130p' dascore/core/coords.py

Repository: DASDAE/dascore

Length of output: 267


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== DascoreBaseModel full implementation ==="
rg -n -A50 'class DascoreBaseModel' dascore/models.py | head -70

echo
echo "=== Check if Pydantic BaseModel is used (implicit __eq__) ==="
rg -n 'from pydantic import BaseModel' dascore/models.py

echo
echo "=== Check simplify_units to understand what it does ==="
rg -n -A30 'def simplify_units' dascore/core/coords.py | head -50

echo
echo "=== Look for any coord value/data comparison methods ==="
rg -n 'values.*==|data.*==|array.*==' dascore/core/coords.py | head -20

Repository: DASDAE/dascore

Length of output: 177


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find where DascoreBaseModel is defined ==="
rg -n 'class DascoreBaseModel' dascore/

echo
echo "=== Check imports in coords.py ==="
head -50 dascore/core/coords.py | rg -n 'DascoreBaseModel|import'

echo
echo "=== Try to find models.py in different locations ==="
fd -t f 'models.py' dascore/

echo
echo "=== Check entire dascore structure ==="
ls -la dascore/ | head -20

Repository: DASDAE/dascore

Length of output: 2247


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full DascoreBaseModel implementation ==="
rg -n -A60 'class DascoreBaseModel' dascore/utils/models.py

echo
echo "=== Check if __eq__ is explicitly defined in DascoreBaseModel ==="
rg -n '__eq__' dascore/utils/models.py

echo
echo "=== Check full import list in coords.py ==="
head -60 dascore/core/coords.py | grep -n 'from\|import'

Repository: DASDAE/dascore

Length of output: 2255


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find sensible_model_equals implementation ==="
rg -n -B2 -A20 'def sensible_model_equals' dascore/utils/models.py

echo
echo "=== Check what is imported from dascore.utils.models in coords.py ==="
rg -n -A10 'from dascore.utils.models import' dascore/core/coords.py

echo
echo "=== Look at CoordRange and CoordArray definitions to understand their structure ==="
rg -n -A5 'class CoordRange|class CoordArray' dascore/core/coords.py | head -30

Repository: DASDAE/dascore

Length of output: 1782


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Full CoordRange class definition ==="
sed -n '1164,1250p' dascore/core/coords.py

echo
echo "=== Full CoordArray class definition ==="
sed -n '1422,1500p' dascore/core/coords.py

echo
echo "=== Check model_fields for CoordRange ==="
rg -n 'model_fields|__init__' dascore/core/coords.py | grep -A5 -B5 'CoordRange'

Repository: DASDAE/dascore

Length of output: 6601


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Search for any override of __eq__ in CoordRange or CoordArray ==="
rg -n 'class CoordRange|class CoordArray' dascore/core/coords.py -A100 | grep -E 'def __eq__|class CoordRange|class CoordArray'

echo
echo "=== Check if there's any conversion from CoordRange to CoordArray before comparison ==="
rg -n 'to_array|values' dascore/core/coords.py | grep -i 'coordrange\|coordarray' | head -20

echo
echo "=== Look at the exact test case from the docstring to understand the claim ==="
sed -n '1888,1905p' dascore/core/coords.py

echo
echo "=== Check if there's a conversion method on CoordRange ==="
rg -n 'def values|def to_array' dascore/core/coords.py | grep -B5 -A5 'CoordRange\|1[2-3][0-9][0-9]'

Repository: DASDAE/dascore

Length of output: 1392


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Check BaseCoord.values property (line 1097) ==="
sed -n '1097,1110p' dascore/core/coords.py

echo
echo "=== Check CoordArray.values property (line 1370) ==="
sed -n '1370,1380p' dascore/core/coords.py

echo
echo "=== Check all fields defined in BaseCoord and subclasses ==="
sed -n '211,280p' dascore/core/coords.py

echo
echo "=== Now verify the actual test - run a minimal Python test to see if they compare equal ==="
python3 << 'EOF'
import sys
sys.path.insert(0, '/root/code')

import numpy as np
from dascore.core.coords import get_coord

# Create test coordinates
range_coord = get_coord(start=1, stop=12, step=1)
array_coord = get_coord(data=np.arange(1, 12, 1))

print(f"Range coord type: {type(range_coord)}")
print(f"Array coord type: {type(array_coord)}")
print(f"Range coord fields: {range_coord.model_dump().keys()}")
print(f"Array coord fields: {array_coord.model_dump().keys()}")
print(f"Are they equal? {range_coord == array_coord}")
print(f"Range coord dump: {range_coord.model_dump()}")
print(f"Array coord dump: {array_coord.model_dump()}")
EOF

Repository: DASDAE/dascore

Length of output: 5129


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find get_coord function to see if it normalizes types ==="
rg -n -A50 'def get_coord' dascore/core/coords.py | head -80

echo
echo "=== Check what line 1370 actually is - full CoordRange.values property ==="
sed -n '1370,1400p' dascore/core/coords.py

echo
echo "=== Look for any __eq__ override in CoordRange or subclasses ==="
sed -n '1164,1420p' dascore/core/coords.py | grep -n '__eq__\|def values'

Repository: DASDAE/dascore

Length of output: 3370


Fix hash contract violation and document cross-family equality mismatch.

The documentation at line 1894 claims CoordRange and CoordArray instances should compare equal when they represent the same data, but sensible_model_equals will return False because they have different field sets—CoordRange has (start, stop, step) while CoordArray has (values). Additionally, the __hash__ implementation includes coord.__class__ in each payload and uses separate field combinations for CoordRange vs CoordArray, which would violate Python's hash contract if the equality claim were true (equal objects must have equal hash values). Either implement the cross-family equality properly by normalizing coords to a canonical representation before comparison and hashing, or update the documentation to reflect that different coord families are not equal.
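One way to satisfy the contract this comment describes is to canonicalize before hashing and comparing: materialize both range-backed and array-backed coords to their values and token those, instead of including the class and per-family fields in the payload. A minimal illustration (not dascore code):

```python
import numpy as np


def canonical_token(coord_like) -> bytes:
    """Token a coord by its materialized values rather than by class or
    field layout, so a range-style and an array-style coord describing
    the same samples get the same token."""
    values = np.ascontiguousarray(coord_like)
    return str(values.dtype).encode() + values.tobytes()


# A "range" coord and an explicit-array coord with the same samples:
range_values = np.arange(1.0, 12.0, 1.0)
array_values = np.array([float(i) for i in range(1, 12)])
```

The alternative resolution the comment offers — documenting that different coord families never compare equal — avoids this cost but requires fixing the docstring's cross-family equality claim instead.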

Comment on lines +138 to +143
if protocol in _HTTP_PROTOCOLS:
headers = dict(getattr(resource, "storage_options", {}) or {})
request = Request(str(resource), headers=headers)
timeout = get_config().remote_download_timeout
with (
urlopen(request, timeout=timeout) as remote_fi,
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify how HTTP headers are represented in UPath usage across this repo.
rg -nP 'UPath\([^)]*headers\s*=' --type=py -C2
rg -nP 'storage_options.*headers|["'\'']headers["'\'']' --type=py -C2
rg -nP '\bRequest\(|\burlopen\(' dascore/utils/remote_io.py -C2

Repository: DASDAE/dascore

Length of output: 1334


🏁 Script executed:

# 1. Examine the test case to understand how storage_options are passed
sed -n '670,700p' tests/test_utils/test_io_utils.py

# 2. Find _HTTP_PROTOCOLS definition
rg -n '_HTTP_PROTOCOLS' dascore/utils/remote_io.py -B2 -A2

# 3. Look at the full context of _download_remote_file function
sed -n '120,160p' dascore/utils/remote_io.py

# 4. Check for any other storage_options usage in HTTP context
rg -n 'storage_options' dascore/utils/remote_io.py -B2 -A2

Repository: DASDAE/dascore

Length of output: 3962


🏁 Script executed:

# Examine the _HTTPResource test class definition
sed -n '660,680p' tests/test_utils/test_io_utils.py

# Check for any other HTTP storage_options examples in tests
rg -n 'storage_options.*=' tests/test_utils/test_io_utils.py -B1 -A1 | grep -A5 -B5 'http'

# Check if there's any fsspec/upath dependency that defines headers structure
rg -n 'fsspec|upath' dascore/utils/remote_io.py

# Look for any imports or references to understand the HTTP backend
head -30 dascore/utils/remote_io.py

Repository: DASDAE/dascore

Length of output: 2178


🏁 Script executed:

# Check test setup for HTTP resource to see the actual storage_options structure
sed -n '645,650p' tests/test_utils/test_io_utils.py

# Look for all storage_options structures in tests
rg -n 'storage_options\s*=' tests/test_utils/test_io_utils.py -A1

# Check if there's any documentation about storage_options format in comments/docstrings
rg -n 'storage_options' dascore/utils/remote_io.py -B3 -A3 | head -40

# Check what UPath (coerce_to_upath) is and if it has documentation about storage_options
rg -n 'class UPath|def coerce_to_upath' dascore/compat.py -A5

Repository: DASDAE/dascore

Length of output: 1266


🏁 Script executed:

# Check fsspec and upath imports to understand storage_options structure
rg -n 'from dascore.compat import' dascore/utils/remote_io.py

# Look at the compat module to see what UPath is
cat dascore/compat.py | head -50

# Search for any storage_options nesting patterns in actual usage (not just tests)
rg -n 'storage_options' dascore/ --type=py -B2 -A2 | grep -i "headers" | head -20

# Check if there are any real-world (non-test) HTTP examples with storage_options
rg -rn '"headers".*storage_options|storage_options.*"headers"' dascore/ --type=py

Repository: DASDAE/dascore

Length of output: 1488


🏁 Script executed:

# Search for any usage of UPath with HTTP in the actual codebase (not just tests)
rg -n 'UPath.*http|http.*UPath' dascore/ --type=py

# Check if there are any real-world examples of storage_options being passed
rg -n '_download_remote_file' dascore/ --type=py -B2 -A2

# Look for any integration tests or examples that use HTTP downloading
find dascore/ -name '*.py' -type f -exec grep -l 'http.*download\|download.*http' {} \;

# Check if upath/fsspec HTTP backend documentation is referenced anywhere
rg -n 'fsspec|fsspec.implementations.http' dascore/ --type=py

Repository: DASDAE/dascore

Length of output: 695


🏁 Script executed:

# Check fsspec documentation or any comments about storage_options structure
# (delegated to a web search: fsspec HTTP storage_options structure headers)

# Also check if there's any version pinning or requirements file
cat dascore/pyproject.toml 2>/dev/null | grep -i 'upath\|fsspec\|urllib' || cat setup.py 2>/dev/null | grep -i 'upath\|fsspec'

# Check the actual test to see what storage_options are expected to work
sed -n '640,695p' tests/test_utils/test_io_utils.py

Repository: DASDAE/dascore

Length of output: 2115


🌐 Web query:

fsspec HTTP backend storage_options structure headers

💡 Result:

The fsspec HTTP backend (`HTTPFileSystem`) accepts `storage_options` as keyword arguments to its constructor; these are stored in `self.kwargs` and passed to aiohttp session requests (e.g., `session.get(url, **self.kwargs)`). Key options:

- `headers`: dict of HTTP headers applied to all requests (e.g., `{"Authorization": "Bearer token"}`). Headers can also be overridden per operation via kwargs in methods like `open`.
- `client_kwargs`: dict passed to the `aiohttp.ClientSession` constructor (e.g., for auth: `{"auth": aiohttp.BasicAuth("user", "pass")}`).
- Other common options: `block_size` (read block size, default 5 MB), `simple_links` (for `ls`), `cache_type`, `cache_options`.

Example usage: `fs = fsspec.filesystem("http", headers={"User-Agent": "my-app"}, client_kwargs={"timeout": aiohttp.ClientTimeout(total=30)})`, then `with fs.open("https://example.com/file.txt") as f: data = f.read()`. For high-level functions, use `fsspec.open("https://...", storage_options={"headers": {...}})`. The source code confirms that `storage_options` become `self.kwargs` for requests, explicit params like `client_kwargs` are handled separately, and headers in `storage_options` go to per-request kwargs.
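Given the behavior summarized above, both accepted `storage_options` shapes can be normalized to a plain header dict with a small helper. This is an illustrative sketch only; `normalize_headers` is a hypothetical name, not a dascore or fsspec API:

```python
def normalize_headers(storage_options):
    """Return an HTTP header dict from either storage_options shape.

    Accepts both the flat form ({"User-Agent": "..."}) and the nested
    form ({"headers": {"User-Agent": "..."}}). If a flat mapping happens
    to contain a "headers" key, the nested interpretation wins.
    """
    opts = dict(storage_options or {})
    raw = opts.get("headers", opts)
    return {str(k): str(v) for k, v in (raw or {}).items()}


flat = {"User-Agent": "dascore"}
nested = {"headers": {"User-Agent": "dascore"}}
print(normalize_headers(flat) == normalize_headers(nested))  # -> True
```

Both shapes collapse to the same header mapping, so downstream code can build a `Request` without caring which convention the caller used.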


🏁 Script executed:

```shell
# Verify if the protocol field from resource always matches the actual URL scheme
rg -n 'protocol.*=' tests/test_utils/test_io_utils.py | grep -i http

# Check if there are any cases where protocol and URL scheme could mismatch
rg -n 'coerce_to_upath' tests/test_utils/test_io_utils.py -B2 -A2 | head -30

# Check if there are any edge cases in URL construction
sed -n '125,150p' dascore/utils/remote_io.py

# Verify RemoteCacheError exists
rg -n 'class RemoteCacheError' dascore/
```

Repository: DASDAE/dascore

Length of output: 2146


🏁 Script executed:

```shell
# Final verification: check if there are any real-world calls to _download_remote_file with nested headers
rg -n '_download_remote_file\|materialize_remote_file' dascore/ --type=py -B3 -A3 | head -50
```

Repository: DASDAE/dascore

Length of output: 40


Add explicit URL scheme validation and support nested header extraction for fsspec compatibility.

Line 139 treats `storage_options` directly as HTTP headers. While the current test passes with flat headers (e.g., `{"User-Agent": "..."}`), fsspec's HTTP backend also supports headers nested under a `"headers"` key for consistency with other storage backends. Additionally, line 143 should validate the actual URL scheme before passing it to `urlopen` (S310 defense in depth), rather than relying solely on the resource's declared protocol.

Proposed fix

```diff
+from urllib.parse import urlsplit
 from urllib.request import Request, urlopen
@@
         if protocol in _HTTP_PROTOCOLS:
-            headers = dict(getattr(resource, "storage_options", {}) or {})
-            request = Request(str(resource), headers=headers)
+            storage_options = dict(getattr(resource, "storage_options", {}) or {})
+            raw_headers = storage_options.get("headers", storage_options)
+            headers = {str(k): str(v) for k, v in (raw_headers or {}).items()}
+            url = str(resource)
+            scheme = urlsplit(url).scheme.lower()
+            if scheme not in _HTTP_PROTOCOLS:
+                msg = f"Refusing non-HTTP URL for remote download: {url!r}"
+                raise RemoteCacheError(msg)
+            request = Request(url, headers=headers)
             timeout = get_config().remote_download_timeout
             with (
-                urlopen(request, timeout=timeout) as remote_fi,
+                urlopen(request, timeout=timeout) as remote_fi,  # noqa: S310
                 tmp_path.open("wb") as local_fi,
             ):
```
🧰 Tools
🪛 Ruff (0.15.9)

[error] 140-140: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


[error] 143-143: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dascore/utils/remote_io.py` around lines 138 - 143, The code treats
storage_options as flat HTTP headers and trusts the resource protocol; update
the HTTP branch handling around _HTTP_PROTOCOLS, Request, and urlopen to (1)
extract headers compatibly from getattr(resource, "storage_options", {}) by
checking for a nested "headers" key first and falling back to the top-level
mapping (ensuring a dict of header strings), and (2) explicitly parse and
validate the URL scheme of str(resource) (e.g., via urllib.parse.urlparse) to
allow only "http" or "https" before calling urlopen (use
get_config().remote_download_timeout as the timeout). Ensure Request receives
the validated URL and the constructed headers.

@d-chambers d-chambers removed the ready_for_review PR is ready for review label Apr 4, 2026

Labels

bug Something isn't working IO Work for reading/writing different formats patch related to Patch class proc Related to processing module
