Draft
97 commits
9230643
CF Validator: add new module to find all current standard names
sadielbartholomew Jul 30, 2025
22783a8
Tidy CF Validation logic for retrieving standard names
sadielbartholomew Aug 4, 2025
0f2cd31
Add new function in netcdfread to validate standard names
sadielbartholomew Aug 4, 2025
d62f6c0
Cache retrieval of standard names from XML at URL
sadielbartholomew Aug 4, 2025
ba92613
Add detection of invalid standard names on coordinates upon read
sadielbartholomew Aug 4, 2025
ccdd8fc
Update standard names check for distinct conformance doc steps
sadielbartholomew Aug 5, 2025
634277e
Update netcdfread to validate standard names on bounds
sadielbartholomew Aug 5, 2025
6552be9
Update & document return status for _check_standard_name
sadielbartholomew Aug 5, 2025
ba30330
Pluralise function name to _check_standard_names
sadielbartholomew Aug 5, 2025
ce7b9d1
Prevent redundant argument specification in _check_standard_names
sadielbartholomew Aug 6, 2025
7ecdd04
Validation for standard names on geometry node coords
sadielbartholomew Aug 6, 2025
d4a6dde
Validation for standard names: geometry attrs & ancil vars
sadielbartholomew Aug 6, 2025
ae3aa48
Improve compliance check messages for _check_standard_names
sadielbartholomew Aug 6, 2025
42f83e3
Validation for standard names: aux, tie point & scalar coords
sadielbartholomew Aug 6, 2025
b897fa8
Validation for standard names: node coordinates
sadielbartholomew Aug 6, 2025
dc38eea
Update comments in netcdfread RE compliance checking
sadielbartholomew Aug 7, 2025
e007f1f
Validation for standard names: geometry-related variables
sadielbartholomew Aug 7, 2025
bc8d655
Generalise variable names in _check_standard_names
sadielbartholomew Aug 7, 2025
fc77de5
Validation for standard names: for coordinate interpolation
sadielbartholomew Aug 7, 2025
4ef5a5e
Clarify _check_standard_names dict querying for external variables
sadielbartholomew Aug 8, 2025
9cfb487
Validation for standard names: for quantization container vars
sadielbartholomew Aug 8, 2025
8f84e72
Validation for standard names: for UGRID mesh topology vars
sadielbartholomew Aug 8, 2025
0cf5b40
Validation for standard names: for UGRID location index set
sadielbartholomew Aug 8, 2025
0939181
netcdfread: fix to ref appropriate var in existing message
sadielbartholomew Aug 8, 2025
71e8a54
Validation for standard names: for UGRID mesh & connectivity
sadielbartholomew Aug 8, 2025
95791c5
Flesh out formal docstring for _check_standard_names
sadielbartholomew Aug 8, 2025
2ce6ada
Set up skeleton for new test for compliance checking
sadielbartholomew Aug 28, 2025
bccee9a
Update to finalise methods in test for compliance checking
sadielbartholomew Aug 28, 2025
7cd569e
Rename 'extract_names_from_xml' to mark as internal-use only
sadielbartholomew Aug 28, 2025
b60af35
Test 'get_all_current_standard_names' & add to cfdm namespace
sadielbartholomew Aug 28, 2025
ce556f0
Test '_extract_names_from_xml' & add to cfdm namespace
sadielbartholomew Aug 28, 2025
df94d93
Add alias name inclusion flag to 'get_all_current_standard_names'
sadielbartholomew Aug 28, 2025
3c2d74f
Update testing for 'get_all_current_standard_names' w/ URL access check
sadielbartholomew Aug 28, 2025
6ac0274
Set up helper func. for creating bad fields in test_compliance_checking
sadielbartholomew Aug 28, 2025
82764cf
Write tests for testing compliance of good/compliant fields
sadielbartholomew Aug 28, 2025
7568e30
Populate docstring of both functions in cfvalidation
sadielbartholomew Aug 29, 2025
4647d31
Test compliance checking: add function to create file w/ bad names
sadielbartholomew Sep 1, 2025
be342fc
Test compliance checking: test non-compliant non-UGRID field
sadielbartholomew Sep 1, 2025
79dfcbb
Add message in test assertion to clarify failure case
sadielbartholomew Sep 1, 2025
9c952eb
Mark standard name compliance tests which are currently failing
sadielbartholomew Sep 1, 2025
686d1fb
Update create_test_files to write UGRID file w/ invalid names
sadielbartholomew Sep 2, 2025
0fd56ad
Add notes to testing from SLB-DCH catchup
sadielbartholomew Sep 2, 2025
c31b4d4
Update attribute key in dataset_compliance for bad standard names
sadielbartholomew Sep 2, 2025
5711f89
Create test for compliance checking on UGRID field
sadielbartholomew Sep 3, 2025
eaf0e11
Update compliance checking test to use gen'd UGRID bad name fields
sadielbartholomew Sep 4, 2025
330846e
Make var names consistent in test_compliance_checking on UGRID
sadielbartholomew Sep 4, 2025
d9f00bd
Update compliance checking test to mark UGRID present failures
sadielbartholomew Sep 5, 2025
74a01ca
Tidy including removing deprecated TODOs
sadielbartholomew Oct 3, 2025
44af20c
Make string-type checking include NumPy string types
sadielbartholomew Oct 6, 2025
fb6c66c
Prevent duplicate dict in dataset_compliance for UGRID fields
sadielbartholomew Oct 6, 2025
4b7d61a
Fix bug in netcdfread._include_component_report causing bad entry
sadielbartholomew Oct 7, 2025
11de207
Improve naming in _add_message to clarify netCDF variable parentage
sadielbartholomew Oct 8, 2025
dbbd929
Update arg. naming in _check_standard_names to mirror _add_message
sadielbartholomew Oct 10, 2025
cc04b7d
Update structure of dataset_compliance to include attrs as keys
sadielbartholomew Oct 30, 2025
3d1a2fc
Update further data compliance structure to add mesh level
sadielbartholomew Oct 31, 2025
3afa2a8
Change data compliance structure to have attributes as keys
sadielbartholomew Nov 3, 2025
218235a
Change data compliance structure to store nested code & values
sadielbartholomew Nov 3, 2025
7e3e70a
Update further data compliance structure to nest UGRID mesh info
sadielbartholomew Nov 3, 2025
60f9084
Allow passing of attrs & dims into _include_component_report
sadielbartholomew Nov 3, 2025
a4ebf66
Get UGRID mesh checking non-compliance output as desired
sadielbartholomew Nov 3, 2025
7fdfd80
Update further data compliance structure to store multiple reasons
sadielbartholomew Nov 4, 2025
03559fc
Change key name in dataset_compliance to 'attributes' for clarity
sadielbartholomew Nov 4, 2025
5ca8a8b
Updates to get attributes as list in dataset_compliance output
sadielbartholomew Nov 10, 2025
f23734a
Add basis of new-form dataset_compliance test structures
sadielbartholomew Nov 26, 2025
dd68b58
Simplify by removing some investigate/dev lines for old structure output
sadielbartholomew Dec 17, 2025
43afa11
Update dicts for forming expected outputs in compliance-checking test
sadielbartholomew Dec 17, 2025
48bf0d1
Prevent rogue 'None' key from emerging in dataset_compliance output
sadielbartholomew Dec 17, 2025
b4985d5
Register attribute names in dataset_compliance output
sadielbartholomew Dec 17, 2025
2efde4e
Update dataset_compliance output for nested netCDF component form
sadielbartholomew Dec 17, 2025
b2ce824
Update dataset_compliance output to have dict of dims
sadielbartholomew Dec 17, 2025
bc05df8
Update dataset_compliance output to register dimension sizes
sadielbartholomew Dec 17, 2025
ffccd71
Update dataset_compliance to add child attributes to output structure
sadielbartholomew Dec 18, 2025
66f8ca3
Update dataset_compliance to end with list of code, reason & value
sadielbartholomew Dec 18, 2025
5fc0f97
Update dataset_compliance to improve & tidy parent compliance processing
sadielbartholomew Dec 18, 2025
6baaa38
Investigate/dev logic towards final new structure output
sadielbartholomew Dec 18, 2025
c2a35db
Update dataset_compliance to cater for parent-less ncvar case
sadielbartholomew Dec 18, 2025
794d810
Fix top-level attribute issue emerging in dataset_compliance output
sadielbartholomew Dec 18, 2025
36845c4
Tidying of netcdfread after dataset_compliance update work
sadielbartholomew Dec 19, 2025
52ad505
Formatting & tidying of netcdfread module
sadielbartholomew Dec 19, 2025
f7cda1d
Fix for dataset_compliance output recording of dimension size
sadielbartholomew Dec 19, 2025
ce1b88a
Tidying of PR before logic consolidation
sadielbartholomew Dec 19, 2025
f3ebf46
Begin consolidating _add_message & _include_component_report
sadielbartholomew Dec 19, 2025
c741b2b
Further consolidation of _add_message & _include_component_report
sadielbartholomew Dec 19, 2025
e8dba0a
Consolidation: include dims processing in _update_noncompliance_dict
sadielbartholomew Dec 19, 2025
df9b971
Remove now-unnecessary conditional with top_ancestor_ncvar
sadielbartholomew Dec 19, 2025
34d724c
Final tidy of netcdfread module, prepare test_compliance_checking
sadielbartholomew Dec 19, 2025
22b8ac5
Document new keyword noncompliance_report for cfdm.read
sadielbartholomew Dec 19, 2025
f24e7bc
Update docs summary of new keyword noncompliance_report for cfdm.read
sadielbartholomew Dec 19, 2025
07db6bf
Implement new keyword noncompliance_report for cfdm.read
sadielbartholomew Dec 19, 2025
59cbbf4
Compliance checking: update UGRID unit test for new output structure
sadielbartholomew Dec 19, 2025
88b46e9
Compliance checking: cover second field in UGRID unit test
sadielbartholomew Dec 19, 2025
d55e70e
Compliance checking: cover third & final field in UGRID unit test
sadielbartholomew Dec 19, 2025
8e4ae98
Remove now-redundant tests in test_compliance_checking
sadielbartholomew Dec 19, 2025
37eb665
Compliance checking: update tests by defining new class constants
sadielbartholomew Dec 19, 2025
73e8d2c
Include missing component inclusion & tidy placeholder/dev comments
sadielbartholomew Dec 19, 2025
1467e6c
Include missing component report to fix non-UGRID output + tidy
sadielbartholomew Jan 5, 2026
100846d
Add TODO note to netcdfread module for investigation
sadielbartholomew Jan 5, 2026
8 changes: 7 additions & 1 deletion cfdm/__init__.py
@@ -210,7 +210,7 @@

from .constants import masked

-# Internal ones passed on so they can be used in cf-python (see
+# Note internal ones here are passed on so they can be used in cf-python (see
# comment below)
from .functions import (
    ATOL,
@@ -249,6 +249,12 @@
    _display_or_return,
)

from .cfvalidation import (
    get_all_current_standard_names,
    _extract_names_from_xml,
    _STD_NAME_CURRENT_XML_URL
)

from .constructs import Constructs

from .data import (
114 changes: 114 additions & 0 deletions cfdm/cfvalidation.py
@@ -0,0 +1,114 @@
import logging
import os
import pprint
import re

from functools import lru_cache

# Prefer the built-in urllib for fetching the XML from the
# cf-convention.github.io repo over using the 'github' module to call the
# GitHub API directly, because it avoids adding another dependency to the
# CF Data Tools.
from urllib import request

# To parse the XML - better than using manual regex parsing!
import xml.etree.ElementTree as ET

logger = logging.getLogger(__name__)


# The table lives in the repository at:
#   github.com/cf-convention/cf-convention.github.io/blob/main/Data/
#   cf-standard-names/current/src/cf-standard-name-table.xml
# The URL below uses the 'https://raw.githubusercontent.com/' form so that
# only the raw XML content is fetched.
# Note: the raw XML is also available at:
#   cfconventions.org/Data/cf-standard-names/current/src/
#   cf-standard-name-table.xml
# Is that a better location to fetch from (it may be more stable)?
_STD_NAME_CURRENT_XML_URL = (
    "https://raw.githubusercontent.com/"
    "cf-convention/cf-convention.github.io/refs/heads/main/Data/"
    "cf-standard-names/current/src/cf-standard-name-table.xml"
)


def _extract_names_from_xml(snames_xml, include_aliases):
    """Extract standard names from a valid Standard Name Table XML document.

    Whether or not registered aliases are included depends on the value
    of the `include_aliases` flag.

    .. versionadded:: NEXTVERSION

    :Parameters:

        snames_xml: `bytes`
            Bytes representing any valid Standard Name Table XML
            document, or a mocked-up equivalent. The 'entry id'
            items are extracted, along with the 'alias id' items
            if requested.

        include_aliases: `bool`
            If `True`, include standard names that are aliases
            rather than strict entries of the input table. If
            `False`, aliases are excluded.

    :Returns:

        `list`
            A list of all CF Conventions standard names in the
            given version of the table, including aliases if
            requested.

    """
    root = ET.fromstring(snames_xml)
    # Want all <entry id="..."> elements. Note the regex this corresponds
    # to, from SLB older code, is 're.compile(r"<entry id=\"(.+)\">")', but
    # using ElementTree is a much more robust means of extraction.
    all_standard_names = [
        entry.attrib["id"] for entry in root.findall(".//entry")
    ]
    if include_aliases:
        all_standard_names += [
            entry.attrib["id"] for entry in root.findall(".//alias")
        ]

    return all_standard_names


@lru_cache
def get_all_current_standard_names(include_aliases=False):
    """Get a list of all CF Standard Names from the current version table.

    Entries are always returned from the current table. By default aliases
    are not included in the output, but they can be included by setting
    the `include_aliases` flag to `True`.

    .. versionadded:: NEXTVERSION

    :Parameters:

        include_aliases: `bool`, optional
            If `True`, include standard names that are aliases
            rather than strict entries of the current table. By
            default this is `False` so that aliases are excluded.

    :Returns:

        `list`
            A list of all CF Conventions standard names in the
            current version of the table, including aliases if
            requested.

    """
    # Pass the URL as a logging argument: without the '%s' placeholder
    # the logging module reports a string formatting error.
    logger.info(
        "Retrieving XML for set of current standard names from: %s",
        _STD_NAME_CURRENT_XML_URL,
    )  # pragma: no cover
    with request.urlopen(_STD_NAME_CURRENT_XML_URL) as response:
        all_snames_xml = response.read()

    # The response is raw bytes, so its length is a byte count rather
    # than a count of standard names.
    logger.debug(
        "Successfully retrieved standard name table XML "
        f"({len(all_snames_xml)} bytes)"
    )  # pragma: no cover

    return _extract_names_from_xml(
        all_snames_xml, include_aliases=include_aliases
    )
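
The extraction logic in `_extract_names_from_xml` can be tried without network access. The following is a minimal sketch, not the PR's test code: `MOCK_TABLE_XML` is a hypothetical, heavily trimmed stand-in for the real table, and `extract_names` is an illustrative copy of the same ElementTree approach (with a default added for convenience).

```python
import xml.etree.ElementTree as ET

# Hypothetical mock-up of a Standard Name Table: two entries and one alias.
# The names and structure are illustrative, not copied from the real table.
MOCK_TABLE_XML = b"""<?xml version="1.0"?>
<standard_name_table>
  <entry id="air_temperature">
    <canonical_units>K</canonical_units>
  </entry>
  <entry id="sea_surface_temperature">
    <canonical_units>K</canonical_units>
  </entry>
  <alias id="old_air_temperature_name">
    <entry_id>air_temperature</entry_id>
  </alias>
</standard_name_table>
"""


def extract_names(snames_xml, include_aliases=False):
    """Illustrative copy of the ElementTree-based extraction."""
    root = ET.fromstring(snames_xml)
    # './/entry' matches <entry> elements at any depth; the <entry_id>
    # children of <alias> have a different tag and are not matched.
    names = [entry.attrib["id"] for entry in root.findall(".//entry")]
    if include_aliases:
        names += [alias.attrib["id"] for alias in root.findall(".//alias")]
    return names


entries_only = extract_names(MOCK_TABLE_XML)
with_aliases = extract_names(MOCK_TABLE_XML, include_aliases=True)
print(entries_only)
print(with_aliases)
```

Because alias names are appended after the entries, the result with `include_aliases=True` lists entries in document order first, then aliases.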
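
Since `get_all_current_standard_names` is wrapped in `lru_cache` and returns a list, two behaviours may be worth checking in review. The sketch below uses a hypothetical stand-in, `get_names`, with a call counter in place of the real download; it assumes CPython's `lru_cache` keying, under which a call with no arguments and a call passing the default by keyword are cached separately.

```python
from functools import lru_cache

fetch_count = 0  # counts how often the simulated download runs


@lru_cache
def get_names(include_aliases=False):
    """Hypothetical stand-in for a cached network fetch."""
    global fetch_count
    fetch_count += 1
    names = ["air_temperature"]
    if include_aliases:
        names.append("old_air_temperature_name")
    return names


first = get_names()
second = get_names()
same_object = first is second  # the cache hands back the same list object
calls_after_repeat = fetch_count

# Passing the default explicitly by keyword forms a different cache key,
# so the function body runs again.
get_names(include_aliases=False)
calls_after_keyword = fetch_count

# Callers that need to modify the result can copy it first, so that the
# cached list is left untouched.
safe = list(get_names())
safe.append("not_a_real_name")
cached_unchanged = get_names() == ["air_temperature"]
```

If callers are expected to mutate the result, returning a tuple or a copy from the cached function would be one way to protect the cache; whether that matters here depends on how the list is used elsewhere in the PR.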