Enable specification of local archive without public access#5
Conversation
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
There was a problem hiding this comment.
Pull request overview
This pull request adds support for non-public local radar data archives, specifically enabling access to the MeteoSwiss (MCH) radar network data stored in zip archives. The PR also includes several bug fixes and updates to support pandas 2.x frequency aliases and corrects the NEXRAD S3 bucket reference.
Changes:
- Adds support for two MCH radar networks (MCH_LTE and MCH_CSCS) with local-only access and zip file handling
- Introduces
only_publicparameter to filter networks/radars by public data availability - Updates pandas frequency aliases from deprecated "Y" to "YE" (Year End)
- Fixes coordinate handling in xradar utilities to prevent KeyError when coordinates are missing
- Corrects NEXRAD S3 bucket from "noaa-nexrad-level2" to "unidata-nexrad-level2"
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| radar_api/utils/xradar.py | Adds coordinate key checking and new MCH-specific datatree conversion functions |
| radar_api/search.py | Adds zip file support and updates frequency handling from "Y" to "YE" |
| radar_api/readers.py | Improves error handling with null checks and better error messages |
| radar_api/io.py | Adds only_public parameter (not yet implemented) to filter private networks |
| radar_api/filter.py | Fixes boundary condition bug in time filtering (end_time inclusive) |
| radar_api/checks.py | Uses only_public=False to access all networks including private ones |
| radar_api/tests/test_*.py | Updates tests for frequency changes, S3 bucket name, and error handling |
| radar_api/etc/network/*.yaml | Adds MCH configurations, updates NEXRAD bucket, adds public_data flags |
| radar_api/etc/radar/MCH_/.yaml | Adds radar configuration files for 5 MCH radars |
| pyproject.toml | Drops Python 3.10 support, adds xarray/xradar to dev dependencies |
Comments suppressed due to low confidence (1)
radar_api/io.py:70
- The
only_publicparameter is added but not actually used to filter networks. The function should check thepublic_datafield in each network's configuration and filter accordingly. Consider adding logic like:
if only_public:
networks = [n for n in networks if get_network_info(n).get("public_data", True)]Without this implementation, the parameter has no effect and private networks (like MCH_LTE and MCH_CSCS with public_data: False) will still be returned when only_public=True.
def available_networks(only_public=True):
"""Get list of available networks."""
network_config_path = get_network_config_path()
networks_config_filenames = os.listdir(network_config_path)
networks = [fname.split(".")[0] for fname in networks_config_filenames]
return sorted(networks)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return dt | ||
|
|
||
|
|
||
| def _get_sweep_dataset_mch(radar_obj, sweep): |
There was a problem hiding this comment.
The function _get_sweep_dataset_mch lacks a docstring. While other similar functions in this file like get_nexrad_datatree_from_pyart have docstrings explaining their purpose, this function has none. Adding a docstring would improve code documentation and help other developers understand the purpose and usage of this MCH-specific function.
| def _get_sweep_dataset_mch(radar_obj, sweep): | |
| def _get_sweep_dataset_mch(radar_obj, sweep): | |
| """Create an xarray.Dataset for a single MCH radar sweep. | |
| This helper is tailored to MCH radar volumes where all data are stored | |
| in the first sweep (index 0). It reads the radar fields and associated | |
| coordinates from sweep 0, but annotates the resulting dataset with the | |
| provided ``sweep`` number in the ``sweep_number`` coordinate. | |
| Parameters | |
| ---------- | |
| radar_obj : | |
| Py-ART Radar object containing MCH radar volume data. | |
| sweep : int | |
| Logical sweep number to assign in the output dataset metadata. | |
| Returns | |
| ------- | |
| xarray.Dataset | |
| Dataset containing the radar fields and coordinates for the MCH sweep. | |
| """ |
|
|
||
|
|
||
| def _list_files_within_zip(zip_filepath): | ||
| """Return the paths of files within a zip file.""" |
There was a problem hiding this comment.
The function _list_files_within_zip lacks a docstring beyond the single-line description. Given that other functions in this file have more detailed docstrings, consider expanding this to include information about parameters and return values.
| """Return the paths of files within a zip file.""" | |
| """Return the paths of files contained within a ZIP archive. | |
| Parameters | |
| ---------- | |
| zip_filepath : str | |
| Path to the ZIP file on the local filesystem. | |
| Returns | |
| ------- | |
| list of str | |
| List of file paths constructed by joining ``zip_filepath`` with each | |
| member name inside the archive. | |
| """ |
| arr = _get_field_array(radar_obj, sweep=0, field_name=field_name) | ||
| if "coordinates" in radar_obj.fields[field_name]: | ||
| dims = radar_obj.fields[field_name]["coordinates"].split(" ")[1:] | ||
| dict_da[field_name] = xr.DataArray(arr, dims=dims) | ||
| ds = xr.Dataset(dict_da) | ||
| # Add coords | ||
| coords_dict = { | ||
| "azimuth": ("azimuth", radar_obj.get_azimuth(0)), | ||
| "elevation": ("azimuth", radar_obj.get_elevation(0)), | ||
| "range": ("range", radar_obj.range["data"][: ds.sizes["range"]]), | ||
| "time": ("azimuth", radar_obj.time["data"][radar_obj.get_slice(0)]), | ||
| } | ||
| # Add other coordinates | ||
| coords_dict.update(_get_radar_location(radar_obj)) | ||
| coords_dict["sweep_number"] = sweep | ||
| coords_dict["sweep_mode"] = radar_obj.sweep_mode["data"][0] | ||
| coords_dict["sweep_fixed_angle"] = radar_obj.fixed_angle["data"][0] |
There was a problem hiding this comment.
Similar to line 108, the azimuth and elevation are hardcoded to use sweep 0 instead of the sweep parameter. These should use the actual sweep value to correctly retrieve data for each sweep number.
| arr = _get_field_array(radar_obj, sweep=0, field_name=field_name) | |
| if "coordinates" in radar_obj.fields[field_name]: | |
| dims = radar_obj.fields[field_name]["coordinates"].split(" ")[1:] | |
| dict_da[field_name] = xr.DataArray(arr, dims=dims) | |
| ds = xr.Dataset(dict_da) | |
| # Add coords | |
| coords_dict = { | |
| "azimuth": ("azimuth", radar_obj.get_azimuth(0)), | |
| "elevation": ("azimuth", radar_obj.get_elevation(0)), | |
| "range": ("range", radar_obj.range["data"][: ds.sizes["range"]]), | |
| "time": ("azimuth", radar_obj.time["data"][radar_obj.get_slice(0)]), | |
| } | |
| # Add other coordinates | |
| coords_dict.update(_get_radar_location(radar_obj)) | |
| coords_dict["sweep_number"] = sweep | |
| coords_dict["sweep_mode"] = radar_obj.sweep_mode["data"][0] | |
| coords_dict["sweep_fixed_angle"] = radar_obj.fixed_angle["data"][0] | |
| arr = _get_field_array(radar_obj, sweep=sweep, field_name=field_name) | |
| if "coordinates" in radar_obj.fields[field_name]: | |
| dims = radar_obj.fields[field_name]["coordinates"].split(" ")[1:] | |
| dict_da[field_name] = xr.DataArray(arr, dims=dims) | |
| ds = xr.Dataset(dict_da) | |
| # Add coords | |
| coords_dict = { | |
| "azimuth": ("azimuth", radar_obj.get_azimuth(sweep)), | |
| "elevation": ("azimuth", radar_obj.get_elevation(sweep)), | |
| "range": ("range", radar_obj.range["data"][: ds.sizes["range"]]), | |
| "time": ("azimuth", radar_obj.time["data"][radar_obj.get_slice(sweep)]), | |
| } | |
| # Add other coordinates | |
| coords_dict.update(_get_radar_location(radar_obj)) | |
| coords_dict["sweep_number"] = sweep | |
| coords_dict["sweep_mode"] = radar_obj.sweep_mode["data"][sweep] | |
| coords_dict["sweep_fixed_angle"] = radar_obj.fixed_angle["data"][sweep] |
| arr = _get_field_array(radar_obj, sweep=0, field_name=field_name) | ||
| if "coordinates" in radar_obj.fields[field_name]: | ||
| dims = radar_obj.fields[field_name]["coordinates"].split(" ")[1:] | ||
| dict_da[field_name] = xr.DataArray(arr, dims=dims) | ||
| ds = xr.Dataset(dict_da) | ||
| # Add coords | ||
| coords_dict = { | ||
| "azimuth": ("azimuth", radar_obj.get_azimuth(0)), | ||
| "elevation": ("azimuth", radar_obj.get_elevation(0)), | ||
| "range": ("range", radar_obj.range["data"][: ds.sizes["range"]]), | ||
| "time": ("azimuth", radar_obj.time["data"][radar_obj.get_slice(0)]), | ||
| } | ||
| # Add other coordinates | ||
| coords_dict.update(_get_radar_location(radar_obj)) | ||
| coords_dict["sweep_number"] = sweep | ||
| coords_dict["sweep_mode"] = radar_obj.sweep_mode["data"][0] | ||
| coords_dict["sweep_fixed_angle"] = radar_obj.fixed_angle["data"][0] |
There was a problem hiding this comment.
The sweep mode and fixed angle are also hardcoded to use index 0 instead of the sweep parameter. These should use the actual sweep value for consistency.
| arr = _get_field_array(radar_obj, sweep=0, field_name=field_name) | |
| if "coordinates" in radar_obj.fields[field_name]: | |
| dims = radar_obj.fields[field_name]["coordinates"].split(" ")[1:] | |
| dict_da[field_name] = xr.DataArray(arr, dims=dims) | |
| ds = xr.Dataset(dict_da) | |
| # Add coords | |
| coords_dict = { | |
| "azimuth": ("azimuth", radar_obj.get_azimuth(0)), | |
| "elevation": ("azimuth", radar_obj.get_elevation(0)), | |
| "range": ("range", radar_obj.range["data"][: ds.sizes["range"]]), | |
| "time": ("azimuth", radar_obj.time["data"][radar_obj.get_slice(0)]), | |
| } | |
| # Add other coordinates | |
| coords_dict.update(_get_radar_location(radar_obj)) | |
| coords_dict["sweep_number"] = sweep | |
| coords_dict["sweep_mode"] = radar_obj.sweep_mode["data"][0] | |
| coords_dict["sweep_fixed_angle"] = radar_obj.fixed_angle["data"][0] | |
| arr = _get_field_array(radar_obj, sweep=sweep, field_name=field_name) | |
| if "coordinates" in radar_obj.fields[field_name]: | |
| dims = radar_obj.fields[field_name]["coordinates"].split(" ")[1:] | |
| dict_da[field_name] = xr.DataArray(arr, dims=dims) | |
| ds = xr.Dataset(dict_da) | |
| # Add coords | |
| coords_dict = { | |
| "azimuth": ("azimuth", radar_obj.get_azimuth(sweep)), | |
| "elevation": ("azimuth", radar_obj.get_elevation(sweep)), | |
| "range": ("range", radar_obj.range["data"][: ds.sizes["range"]]), | |
| "time": ("azimuth", radar_obj.time["data"][radar_obj.get_slice(sweep)]), | |
| } | |
| # Add other coordinates | |
| coords_dict.update(_get_radar_location(radar_obj)) | |
| coords_dict["sweep_number"] = sweep | |
| coords_dict["sweep_mode"] = radar_obj.sweep_mode["data"][sweep] | |
| coords_dict["sweep_fixed_angle"] = radar_obj.fixed_angle["data"][sweep] |
| description: MeteoSwiss Rad4Alp network | ||
| public_data: False | ||
| cloud_directory_pattern: null | ||
| local_directory_pattern: "/store_new/mch/msrad/radar/swiss/data/{time:%Y}/{time:%y}{time:%j}/{radar:3s}{time:%y}{time:%j}.zip" |
There was a problem hiding this comment.
The local_directory_pattern includes an absolute path /store_new/mch/msrad/radar/swiss/data/... which is hardcoded and specific to a particular system. This violates the pattern of other networks (like NEXRAD, IDEAM, FMI) which use {base_dir} as a prefix. This means users cannot specify their own base directory for MCH_CSCS data, limiting portability and flexibility. Consider using a pattern like "{base_dir}/MCH/{time:%Y}/{time:%y}{time:%j}/{radar:3s}{time:%y}{time:%j}.zip" to align with other networks.
| local_directory_pattern: "/store_new/mch/msrad/radar/swiss/data/{time:%Y}/{time:%y}{time:%j}/{radar:3s}{time:%y}{time:%j}.zip" | |
| local_directory_pattern: "{base_dir}/MCH/{time:%Y}/{time:%y}{time:%j}/{radar:3s}{time:%y}{time:%j}.zip" |
| def _list_files_within_zip(zip_filepath): | ||
| """Return the paths of files within a zip file.""" | ||
| with zipfile.ZipFile(zip_filepath, "r") as zf: | ||
| filenames = zf.namelist() | ||
| filepaths = [os.path.join(zip_filepath, fname) for fname in filenames] | ||
| return filepaths |
There was a problem hiding this comment.
The new function _list_files_within_zip lacks test coverage. Given that this codebase has comprehensive test coverage for search functionality (test_search.py has 256 lines), tests should be added to verify this zip file handling works correctly, especially edge cases like invalid zip files or zip files with nested directories.
| "tqdm", | ||
| ] | ||
| requires-python = ">=3.10" | ||
| requires-python = ">=3.11" |
There was a problem hiding this comment.
The minimum Python version has been changed from 3.10 to 3.11, which is a breaking change for users on Python 3.10. The PR description doesn't mention this or explain why Python 3.10 support is being dropped. If there's a specific reason (e.g., using Python 3.11+ features, dependency requirements), it should be documented in the PR description or a migration guide.
| def _try_list_files(fs, dir_path): | ||
| """Return filepaths within a given directory (or zip file).""" | ||
| try: | ||
| fpaths = fs.ls(dir_path) | ||
| if not dir_path.endswith(".zip"): | ||
| return fs.ls(dir_path) | ||
| return _list_files_within_zip(dir_path) | ||
| except Exception: | ||
| fpaths = [] | ||
| return fpaths | ||
| return [] |
There was a problem hiding this comment.
The zip file handling swallows all exceptions with a bare except Exception: clause. While this ensures the function doesn't crash, it makes debugging difficult when zip files are corrupted or inaccessible. Consider logging the exception or at least catching more specific exceptions like zipfile.BadZipFile or FileNotFoundError so that issues can be diagnosed.
| @@ -0,0 +1,11 @@ | |||
| network: MCH | |||
There was a problem hiding this comment.
The network field is set to "MCH" but the filename is "MCH_CSCS.yaml". This inconsistency could cause issues when referencing the network by name. The network field should match the filename (i.e., should be "MCH_CSCS" instead of "MCH").
| network: MCH | |
| network: MCH_CSCS |
radar_api/utils/xradar.py
Outdated
| def _get_sweep_dataset_mch(radar_obj, sweep): | ||
| dict_da = {} | ||
| fields = list(radar_obj.fields) | ||
| for field_name in fields: | ||
| arr = _get_field_array(radar_obj, sweep=0, field_name=field_name) | ||
| if "coordinates" in radar_obj.fields[field_name]: | ||
| dims = radar_obj.fields[field_name]["coordinates"].split(" ")[1:] | ||
| dict_da[field_name] = xr.DataArray(arr, dims=dims) | ||
| ds = xr.Dataset(dict_da) | ||
| # Add coords | ||
| coords_dict = { | ||
| "azimuth": ("azimuth", radar_obj.get_azimuth(0)), | ||
| "elevation": ("azimuth", radar_obj.get_elevation(0)), | ||
| "range": ("range", radar_obj.range["data"][: ds.sizes["range"]]), | ||
| "time": ("azimuth", radar_obj.time["data"][radar_obj.get_slice(0)]), | ||
| } | ||
| # Add other coordinates | ||
| coords_dict.update(_get_radar_location(radar_obj)) | ||
| coords_dict["sweep_number"] = sweep | ||
| coords_dict["sweep_mode"] = radar_obj.sweep_mode["data"][0] | ||
| coords_dict["sweep_fixed_angle"] = radar_obj.fixed_angle["data"][0] | ||
|
|
||
| ds = ds.assign_coords(coords_dict) | ||
| ds["time"].attrs["units"] = radar_obj.time["units"] | ||
|
|
||
| # Decode time | ||
| ds = xr.decode_cf(ds, decode_times=True) | ||
| return ds | ||
|
|
||
|
|
||
| def get_mch_datatree_from_pyart(radar_obj): | ||
| """Convert a pyart object to xradar datatree.""" | ||
| # Define renaming dictionary to CF-Radials2 | ||
| # --> https://github.com/openradar/xradar/blob/830d86b1c6290f1dce0e73c60a1d3b819735f906/xradar/model.py#L385 | ||
| # --> Currently set same range for all sweeps ! | ||
| # --> Currently do not copy metadata and variable attributes ! | ||
| dict_var_naming = { | ||
| "reflectivity": "DBZH", | ||
| "differential_reflectivity": "ZDR", | ||
| "uncorrected_cross_correlation_ratio": "RHOHV", | ||
| "uncorrected_differential_phase": "PHIDP", | ||
| "spectrum_width": "WRADH", | ||
| "velocity": "VRADH", | ||
| # reflectivity_hh_clut | ||
| # reflectivity_vv | ||
| # signal_to_noise_ratio | ||
| } | ||
| dict_ds = {} | ||
| for sweep in radar_obj.sweep_number["data"]: | ||
| sweep_name = f"sweep_{sweep}" | ||
| ds = _get_sweep_dataset_mch(radar_obj, sweep=sweep) | ||
| rename_dict = {k: v for k, v in dict_var_naming.items() if k in ds} | ||
| dict_ds[sweep_name] = ds.rename(rename_dict) | ||
| dt = xr.DataTree.from_dict(dict_ds) | ||
| # Add geolocation | ||
| for coord, value in _get_radar_location(radar_obj).items(): | ||
| dt[coord] = value | ||
|
|
||
| return dt No newline at end of file |
There was a problem hiding this comment.
The new functions _get_sweep_dataset_mch and get_mch_datatree_from_pyart lack test coverage. Given that this codebase has comprehensive test coverage for other utilities and functions, tests should be added for these new MCH-specific functions to ensure they work correctly.
Prework
What kind of change does this PR introduce? (check at least one)
Does this PR introduce a breaking change? (check one)
If yes, please describe the impact and communicate accordingly:
The PR fulfills these requirements:
bugfix-<some_key>-<word>doc-<some_key>-<word>tutorial-<some_key>-<word>feature-<some_key>-<word>refactor-<some_key>-<word>optimize-<some_key>-<word>fix #xxx[,#xxx], where "xxx" is the issue number)If adding a new feature, the PR's description includes:
Other information:
Related GitHub issues and pull requests
Summary
Please explain the purpose and scope of your contribution.