Full NVS support #575

Open
gmaze wants to merge 79 commits into master from referencing

@gmaze gmaze commented Jan 25, 2026

This PR aims to bring more robust and exhaustive support for the meta-data of the Argo vocabulary referencing system to Argopy.

The Argo vocabulary referencing system is a collection of reference tables, values and relationships used to constrain how Argo parameters can be filled in netcdf files. This referencing system is managed by the Argo Vocabulary Task Team (AVTT) and powered by the web-API of the NERC Vocabulary Server (NVS).

This PR provides:

  • full low-level support for accessing NVS data (with online and offline implementations)
  • high-level APIs, mostly free of NVS jargon, to read/manipulate Argo references

Status of NVS support as of Argopy v1.4.0

Argopy already provides support for table retrieval and basic search in table names with the ArgoNVSReferenceTables class:

from argopy import ArgoNVSReferenceTables
ArgoNVSReferenceTables().all_tbl_name
ArgoNVSReferenceTables().search('sensor', where='title') # or 'description'
df = ArgoNVSReferenceTables().tbl('R27')

Limitations

However, this support:
  • is partial, e.g. there is no access to table or row meta-data, or to all attributes of a reference value,
  • lacks a more comprehensive approach to the referencing system, i.e. one based on Argo parameter names instead of the referencing jargon ('SENSOR' vs 'R25'),
  • lacks complementary meta-data from the AVTT (e.g. the possible mapping of one table's entries onto another),
  • is not robust, as it requires an internet connection to access the NVS web-API.

New features objectives

This PR will bring:
  • comprehensive support, i.e. free of NVS jargon
  • improved access/search/export methods for Argo Reference Tables, Values and Mappings
  • access to all meta-data for tables and values available from NVS
  • support for online and offline access to any reference
  • specific classes for Argo Reference Tables (aka NVS Vocabulary), Values (aka NVS Concept) and Mappings, providing all possible information for each
  • support for mappings used to inform relationship between concepts
  • internal separation of concerns between the NVS web-API and the Argo Reference API facades
  • augmented information wherever possible (e.g. in R03, the definition attribute of every concept embeds dictionary-like Local_Attributes and Properties strings, which can be parsed automatically to provide dictionary-like access to their items)
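
The separation between the NVS web-API and offline static data listed above could be sketched as follows. This is only an illustration under stated assumptions: `OnlineStore`, `OfflineStore`, `load_vocabulary` and the injected `fetch` callable are hypothetical names, not the classes implemented in this PR, and the network call is injected so that the sketch runs without an internet connection.

```python
from typing import Callable, Dict


class OfflineStore:
    """Hypothetical store serving vocabularies from static data (here: a dict)."""

    def __init__(self, static_data: Dict[str, dict]):
        self._data = static_data

    def load(self, table_id: str) -> dict:
        return self._data[table_id]


class OnlineStore:
    """Hypothetical store using a fetch callable that stands in for the web-API."""

    def __init__(self, fetch: Callable[[str], dict]):
        self._fetch = fetch

    def load(self, table_id: str) -> dict:
        return self._fetch(table_id)


def load_vocabulary(table_id: str, online: OnlineStore, offline: OfflineStore) -> dict:
    """Try the online store first, fall back to static data on connection errors."""
    try:
        return online.load(table_id)
    except OSError:  # ConnectionError and TimeoutError are subclasses of OSError
        return offline.load(table_id)


def unreachable(table_id: str) -> dict:
    """Simulate a missing internet connection."""
    raise ConnectionError("NVS not reachable")


# Placeholder static content, not real NVS data
static = {"R25": {"identifier": "R25", "source": "static"}}
result = load_vocabulary("R25", OnlineStore(unreachable), OfflineStore(static))
print(result["source"])
```

Keeping both stores behind the same `load` signature is what would let a high-level facade ignore where the data comes from.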

We can try to define the targeted audience as:

  • regular end-users of argopy, digging into some dataset content,
  • Argo operators and experts, content-focused, not interested in HTTP requests and web-API management specifics,
  • argopy itself, for internal data referencing and attributes filling.

New APIs Documentation

Before writing full documentation, we need feedback from the AVTT to settle on the new API facades.

  • Poke the AVTT team here
  • Raise an issue on AVTT repo

Once the API facades have received satisfactory feedback, we'll prepare complementary documentation sections for the AVTT README and ADMT documentation page.

In the meantime, here is a raw docstring copy/paste to illustrate the current features and use-cases.

Doc for ArgoReferenceTable

Note: I hesitated to call this class ArgoReferenceVocabulary or ArgoVocabulary. But since the Argo user's manual uses "Reference Table", I opted to keep that wording and not stick too closely to the NVS jargon (which is still visible in the internal machinery, though).

The ArgoReferenceTable instance holds all the Argo referencing system information:

  • the comprehensive logic (use SENSOR in place of R25),
  • the table meta-data, in read-only attributes, e.g. art.description, art.version, art.date, etc.,
  • the list of table values/rows, which are accessible through label-based indexing, e.g. art['CTD_TEMP_CNDC']; this means that the instance can be used almost like a dictionary (e.g. for inclusion assertion and iteration),
  • values export/search methods, e.g. art.search(), art.to_dataframe(), art.to_dict()

Creation

        from argopy import ArgoReferenceTable

        # Use an Argo parameter name, documented by one of the Argo reference tables:
        art = ArgoReferenceTable('SENSOR')

        # or a reference table identifier:
        art = ArgoReferenceTable('R25')

        # or a URN:
        art = ArgoReferenceTable.from_urn('SDN:R25::CTD_TEMP')

Attributes

        # All possible attributes are listed in:
        art.attrs

        # Reference Table attributes:
        art.parameter   # Name of the netcdf dataset parameter filled with values from this table
        art.identifier  # Reference Table ID
        art.description # [nvs['@graph']['@type']=='skos:Collection']["dc:description"]
        art.uri         # [nvs['@graph']['@type']=='skos:Collection']["@id"]
        art.version     # [nvs['@graph']['@type']=='skos:Collection']['owl:versionInfo']
        art.date        # [nvs['@graph']['@type']=='skos:Collection']['dc:date']

        # Raw NVS json data:
        art.nvs
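
The bracketed comments above point to the `skos:Collection` entry of the NVS JSON-LD `@graph`. As a minimal sketch of that lookup, assuming a payload shaped like the hand-written `nvs` sample below (its values and URLs are placeholders, not real NVS content) and a hypothetical `collection_attrs` helper that is not part of argopy:

```python
def collection_attrs(nvs: dict) -> dict:
    """Return table-level attributes from the first skos:Collection node in '@graph'."""
    collection = next(
        node for node in nvs["@graph"] if node.get("@type") == "skos:Collection"
    )
    return {
        "description": collection["dc:description"],
        "uri": collection["@id"],
        "version": collection["owl:versionInfo"],
        "date": collection["dc:date"],
    }


# Placeholder payload: the structure follows the comments above, the values are made up.
nvs = {
    "@graph": [
        {"@type": "skos:Concept", "@id": "http://example.org/collection/R25/current/X/"},
        {
            "@type": "skos:Collection",
            "@id": "http://example.org/collection/R25/current/",
            "dc:description": "placeholder description",
            "owl:versionInfo": "1",
            "dc:date": "2026-01-01",
        },
    ]
}

attrs = collection_attrs(nvs)
print(attrs["uri"])
```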

Indexing and values

        # Values (or concept) within this reference table:
        len(art)     # Number of reference values
        art.keys()   # List of reference value names
        art.values() # List of :class:`ArgoReferenceValue`

        # Check for values:
        'CTD_TEMP_CNDC' in art  # Return True

        # Index by value key, like a simple dictionary:
        art['CTD_TEMP_CNDC']  # Return a :class:`ArgoReferenceValue` instance

        # Iterate over all values/concepts:
        for concept in art:
            print(concept.name, concept.urn)
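
The dictionary-like behaviour described above can be obtained with a handful of special methods. The following is a minimal sketch, not the PR's implementation: `Value` and `Table` are hypothetical stand-ins, and the choice to yield values (rather than keys) on iteration is inferred from the `for concept in art` example.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for a reference value."""
    name: str
    urn: str


class Table:
    """Hypothetical stand-in for a reference table with label-based indexing."""

    def __init__(self, values: List[Value]):
        self._values: Dict[str, Value] = {v.name: v for v in values}

    def __len__(self) -> int:
        return len(self._values)

    def __contains__(self, name: object) -> bool:
        return name in self._values

    def __getitem__(self, name: str) -> Value:
        return self._values[name]

    def __iter__(self) -> Iterator[Value]:
        # Unlike a plain dict, iteration yields the values themselves, not the keys
        return iter(self._values.values())

    def keys(self) -> List[str]:
        return list(self._values)

    def values(self) -> List[Value]:
        return list(self._values.values())


# Illustrative content only
table = Table([
    Value("CTD_TEMP_CNDC", "SDN:R25::CTD_TEMP_CNDC"),
    Value("CTD_TEMP", "SDN:R25::CTD_TEMP"),
])
names = [concept.name for concept in table]
print(names)
```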

Export methods

        # Export table values to a pd.DataFrame:
        art.to_dataframe()
        art.to_dataframe(columns=['name', 'deprecated'])  # Select value attributes to export in columns

        # Export table attributes to a dictionary (not the values !):
        art.to_dict()
        art.to_dict(keys=['parameter', 'date', 'uri'])  # Select Table attributes to export in dictionary keys

Search method

        # Search methods (return a list of :class:`ArgoReferenceValue` with match):
        # Any of the :class:`ArgoReferenceValue` attribute can be searched
        art.search(name='RAMSES')         # Search in values name
        art.search(definition='imaging')  # Search in values definition
        art.search(long_name='TriOS')     # Search in values long name

        # Possible change to output format:
        art.search(deprecated=True, output='df')  # To a :class:`pd.DataFrame`
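
To illustrate the keyword-based search above, here is a minimal sketch over a plain list of values. The matching rules (case-insensitive substring for strings, equality for anything else, all criteria combined with AND) are assumptions made for this example and may differ from what `art.search()` actually does; `Value`, `search` and the sample rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Value:
    """Hypothetical stand-in for a reference value."""
    name: str
    long_name: str
    deprecated: bool


def search(values: List[Value], **criteria) -> List[Value]:
    """Return the values matching every criterion.

    Strings match as case-insensitive substrings, anything else by equality.
    """
    def matches(value: Value) -> bool:
        for attr, wanted in criteria.items():
            actual = getattr(value, attr)
            if isinstance(wanted, str):
                if wanted.lower() not in str(actual).lower():
                    return False
            elif actual != wanted:
                return False
        return True

    return [v for v in values if matches(v)]


# Made-up rows, for illustration only
rows = [
    Value("RAMSES_ACC", "TriOS RAMSES radiometer", False),
    Value("OLD_SENSOR", "Retired sensor", True),
]
hits = search(rows, name="ramses")
retired = search(rows, deprecated=True)
print([v.name for v in hits], [v.name for v in retired])
```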

Doc for ArgoReferenceValue

Note: I hesitated to call this class ArgoReferenceConcept or ArgoVocabularyConcept. But since the Argo user's manual uses "Reference Table", I opted to keep that wording and not stick too closely to the NVS jargon (which is still visible in the internal machinery, though).

The ArgoReferenceValue instance holds all the Argo referencing system information:

  • the comprehensive logic (e.g. the reference table is automatically determined when possible, hints are returned otherwise),
  • the value meta-data, in read-only attributes, e.g. arv.definition, arv.version, arv.deprecated, etc.,
  • value meta-data export methods, e.g. arv.to_dict(), arv.to_json().

Creation

        from argopy import ArgoReferenceValue

        # One possible value for the Argo parameter 'SENSOR_MODEL':
        arv = ArgoReferenceValue('AANDERAA_OPTODE_3835')

        # For an ambiguous value found in more than one Reference Table:
        arv = ArgoReferenceValue('4', reference='RT_QC_FLAG')
        arv = ArgoReferenceValue('4', reference='RR2')

        # From NVS/URN jargon:
        arv = ArgoReferenceValue.from_urn('SDN:R27::AANDERAA_OPTODE_3835')
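
The URNs used above follow the shape `SDN:<table>::<term-id>`. As a minimal sketch of how such a string could be split and validated, assuming that shape holds (the pattern is inferred from the examples in this PR only, and `parse_urn` / `ParsedURN` are hypothetical names, not argopy functions):

```python
import re
from typing import NamedTuple


class ParsedURN(NamedTuple):
    table: str  # reference table identifier, e.g. 'R27'
    term: str   # term-id, e.g. 'AANDERAA_OPTODE_3835'


# Pattern inferred from the examples in this PR: 'SDN:<table>::<term>'
_URN_PATTERN = re.compile(r"SDN:(?P<table>[A-Z0-9_]+)::(?P<term>\S+)")


def parse_urn(urn: str) -> ParsedURN:
    """Split a URN into table identifier and term-id, rejecting malformed input."""
    match = _URN_PATTERN.fullmatch(urn)
    if match is None:
        raise ValueError(f"Not a valid URN: {urn!r}")
    return ParsedURN(match["table"], match["term"])


parsed = parse_urn("SDN:R27::AANDERAA_OPTODE_3835")
print(parsed.table, parsed.term)
```

Using `fullmatch` rather than `match` rejects strings that only start like a URN.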

Attributes

        arv = ArgoReferenceValue('BBP700')

        # All possible attributes are listed in:
        arv.attrs

        # Reference Value attributes:
        arv.name       # Term-id of the URN, eg 'BBP700'
        arv.long_name  # nvs["skos:prefLabel"]["@value"]
        arv.definition # nvs["skos:definition"]["@value"]
        arv.deprecated # nvs["owl:deprecated"]
        arv.parameter  # The netcdf parameter this concept applies to (eg 'SENSOR_MODEL')
        arv.reference  # The reference table this concept belongs to, can be used on an ArgoReferenceTable (eg 'R27')

        # Other reference Value attributes (more technical):
        arv.version    # nvs["owl:versionInfo"]
        arv.date       # nvs["dc:date"]
        arv.uri        # nvs["@id"]
        arv.urn        # nvs["skos:notation"]

        # Relationships with other Reference Values or Context:
        arv.broader    # nvs["skos:broader"]
        arv.narrower   # nvs["skos:narrower"]
        arv.related    # nvs["skos:related"]
        arv.sameas     # nvs["owl:sameAs"]
        arv.context    # nvs["@context"]

        # Extra attributes for R03, R14, R18 values (content curated from the value definition string, see e.g. below)
        arv.extra

        # Raw NVS json data:
        arv.nvs

Extra attributes (only for values from R03, R14, R18)

        # For Values from R03 table
        arv = ArgoReferenceValue('BBP470')
        arv.extra
        arv.extra['Local_Attributes'].long_name
        arv.extra['Properties'].category

        # For Values from R14 table
        arv = ArgoReferenceValue('T000015')
        arv.extra
        arv.extra['Template_Values'].unit

        # For Values from R18 table
        arv = ArgoReferenceValue('CB00001')
        arv.extra
        arv.extra['Template_Values'].short_sensor_name
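
The extra attributes above are interpreted from the value definition string. The sketch below shows one way such a string could be parsed into attribute-style items. It assumes the definition embeds blocks of the form `Name:{key:value; key:value}`; that format, the `parse_extra` helper and the sample string are assumptions for illustration, not the PR's parser or real NVS content.

```python
import re
from types import SimpleNamespace
from typing import Dict

# Matches 'Section_Name:{key:value; key:value}' blocks (assumed format)
_SECTION = re.compile(r"(?P<section>\w+):\{(?P<body>[^}]*)\}")


def parse_extra(definition: str) -> Dict[str, SimpleNamespace]:
    """Turn each 'Name:{k:v; k:v}' block of a definition into attribute-style items."""
    extra = {}
    for match in _SECTION.finditer(definition):
        items = {}
        for pair in match["body"].split(";"):
            if ":" not in pair:
                continue  # skip empty or malformed fragments
            key, value = pair.split(":", 1)
            items[key.strip()] = value.strip()
        extra[match["section"]] = SimpleNamespace(**items)
    return extra


# Hand-written sample in the assumed format, not copied from NVS
definition = (
    "Some free text. "
    "Local_Attributes:{long_name:Particle backscattering at 470 nanometers; units:m-1}. "
    "Properties:{category:b; data_type:float}"
)
extra = parse_extra(definition)
print(extra["Local_Attributes"].long_name)
print(extra["Properties"].category)
```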

Export methods

        # Export to a dictionary:
        arv.to_dict()
        arv.to_dict(keys=['name', 'deprecated'])  # Select attributes to export in dictionary keys

        # Export to json structure:
        arv.to_json()  # In memory
        arv.to_json('reference_value.json')  # To a json file
        arv.to_json('reference_value.json', keys=['name', 'deprecated'])  # Select attributes to export

Doc for ArgoReferenceMapping

A mapping is a list of relationships between one concept (the 'subject') and others (the 'objects'). Subjects are grouped by reference table, and so are objects, so that there is one mapping between one ArgoReferenceTable and another.

Relationships between concepts can be:

  • "narrower/broader" when there is a hierarchy between the subject and the object
  • "related" or "sameas" when the subject is related to the object without strict hierarchy

In this high-level API, I opted to remove the NVS jargon of 'skos' and 'owl' to make relationships easier to understand. But to avoid confusing the AVTT too much, I also opted to keep "predicates" to refer to relationships. Maybe this is not the best choice, and 'relations' could be preferred over 'predicates'.

List of known mappings:

| Subject | Predicate | Object | Dedicated Argopy support |
|---|---|---|---|
| R08 Argo instrument types (eg: '878') | broader | R23 Argo platform type (eg: 'ARVOR') | |
| R24 Argo platform maker (eg: 'MRV') | related | R23 Argo platform type (eg: 'SOLO_D_MRV') | |
| RMC Argo measurement code categories (eg: 'RSPEC') | narrower | R15 Argo trajectory measurement code identifiers (eg: '901') | |
| RTV Argo float cycle timing variables (eg: 'DET') | related or sameas | R15 Argo trajectory measurement code identifiers (eg: '901') | |
| R25 Argo sensor types (eg: 'RADIOMETER_DOWN_IRR380') | related | R27 Argo sensor models (eg: 'SATLANTIC_OCR504_ICSW') | ✅ in the ArgoSensor sub-module, but still in dev. |
| R26 Argo sensor manufacturers (eg: 'SBE') | narrower | R27 Argo sensor models (eg: 'SEAFET') | |

Eg:

  • R23/APEX is a broader concept than R08/847
  • R27/SEAFET is a narrower concept than R26/SBE
  • R27/AANDERAA_OPTODE_3830 is related to R25/OPTODE_DOXY
  • R27/AANDERAA_OPTODE_3830 is a narrower concept than R26/AANDERAA

More details from the AVTT documentation:
https://github.com/OneArgo/ArgoVocabs?tab=readme-ov-file#ivb-mappings

Mappings are used to describe relationships between concepts, for instance all the sensor_models manufactured by one sensor_maker, or all the platform_types manufactured by one platform_maker, etc.
They are used by the FileChecker to ensure consistency between these metadata fields in the Argo dataset.

Creation

        from argopy import ArgoReferenceMapping

        # Use two Argo parameter names, each documented by one of the Argo reference tables:
        arm = ArgoReferenceMapping('PLATFORM_MAKER', 'PLATFORM_TYPE')

        # or reference table identifiers:
        arm = ArgoReferenceMapping('R24', 'R23')

Indexing and values

        # Relationships within this reference mapping:
        len(arm)     # Number of relationships
        arm.subjects   # Ordered list of unique 'subject' reference values names
        arm.objects    # Ordered list of unique 'object' reference values names
        arm.predicates # Ordered list of unique 'predicate', aka relationships, in this mapping

        # Check if a reference value is in this mapping as a subject or an object:
        'SBE' in arm  # Return True

        # Indexing is by subject values:
        arm['SBE']  # Return a dict with predicate as keys and objects as values

        # Iterate over all relationships:
        for relation in arm:
            print(relation['subject'], relation['predicate'])
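
The subject-based indexing above amounts to grouping (subject, predicate, object) triples. Here is a minimal sketch under that assumption; `ReferenceMappingSketch` is a hypothetical class, not the PR's `ArgoReferenceMapping`, and only the first triple comes from the table of known mappings (the second one is a made-up placeholder).

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class ReferenceMappingSketch:
    """Hypothetical stand-in for a mapping between two reference tables."""

    def __init__(self, triples: List[Triple]):
        self._triples = list(triples)
        # Group objects by subject, then by predicate
        self._by_subject: Dict[str, Dict[str, List[str]]] = {}
        for subject, predicate, obj in self._triples:
            self._by_subject.setdefault(subject, {}).setdefault(predicate, []).append(obj)

    def __len__(self) -> int:
        return len(self._triples)

    def __contains__(self, name: object) -> bool:
        # True if the name appears as a subject or as an object
        return any(name in (s, o) for s, _, o in self._triples)

    def __getitem__(self, subject: str) -> Dict[str, List[str]]:
        # Raises KeyError for unknown subjects, like a dict
        return {p: list(objs) for p, objs in self._by_subject[subject].items()}

    def __iter__(self) -> Iterator[Dict[str, str]]:
        for s, p, o in self._triples:
            yield {"subject": s, "predicate": p, "object": o}


arm = ReferenceMappingSketch([
    ("MRV", "related", "SOLO_D_MRV"),    # from the table of known mappings
    ("MRV", "related", "EXAMPLE_TYPE"),  # made-up placeholder
])
relations = list(arm)
print(arm["MRV"])
```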

Export method

       # Export all mapping relationships in a DataFrame:
       arm.to_dataframe()

       # To export mapping using AVTT jargon:
       arm.to_dataframe(raw=True)

PR work

Short history for argopy dev. team:

This PR thus allows the full NVS support to be merged without sensor-related features,
and therefore replaces #545, which is going to be closed without merging.

todo list:

  • Implement new features:
    • Static data for offline access (files and cli manager)
    • NVS stores (online and offline backends)
    • ArgoReferenceTable
    • ArgoReferenceValue
    • ArgoReferenceMapping
    • Extra interpretation of strings
      • R03 definition (Local_Attributes and Properties)
      • R14 definition (Template_Values)
      • R18 definition (Template_Values)
  • Add unit tests for new features:
    • NVS Store
    • ArgoReferenceTable
    • ArgoReferenceValue
    • ArgoReferenceMapping
  • Document new features:
    • Docstrings
    • Documentation page

@gmaze gmaze self-assigned this Jan 25, 2026
@gmaze gmaze added the enhancement New feature or request, development label Jan 25, 2026
@gmaze gmaze moved this from Queued to In Progress in Argopy Management Dashboard Jan 25, 2026
gmaze added 7 commits January 25, 2026 22:22
Add specific loading method for each format
- fix bug where by a urn could be parsed even if unvalid
remove if case never reached
- trying to solve windows failing CI tests
trying to solve windows ci/tests failure
gmaze added a commit that referenced this pull request Jan 26, 2026
- fix some deprecation warning
- also useful to debug Windows failing at #575
@gmaze gmaze marked this pull request as draft February 9, 2026 09:27
@gmaze gmaze marked this pull request as ready for review February 13, 2026 10:02
@gmaze gmaze requested a review from quai20 February 13, 2026 10:02