Allow merge in concat dataset #765

@elynnwu

Description

When training on multiple labeled datasets, we have a use case for a merge inside a concat. Ideally every labeled dataset would contain all the variables we need, but often one dataset is missing some of them. For example, if c96-shield does not have CO2, it would be nice to simply merge in a separate dataset that provides it. The resulting config would look something like this:

  dataset:
    concat:
    - data_path: /climate-default/2024-06-20-era5-1deg-8layer-1940-2022-netcdfs
      labels:
      - era5
      subset:
        start_time: '1979-01-01T00:00:00'
        stop_time: '1986-03-31T18:00:00'
    - merge:
      - data_path: /climate-default/2024-07-24-vertically-resolved-c96-1deg-shield-amip-ensemble-dataset/netCDFs/ic_0001
        labels:
        - c96-shield
        subset:
          start_time: '1979-01-01'
          stop_time: '2020-12-31'
      - data_path: /climate-default/shield-co2-data
        labels:
        - c96-shield

We currently support concat inside merge but not the other way around, because concat is supposed to concatenate across time. In the context of multi-label datasets, however, each concat entry is actually a different data source, so it would be convenient to allow a merge within each entry.
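
To make the request concrete, here is a minimal Python sketch of a config schema in which a concat entry may be either a single dataset or a merge of several. It is not the project's actual implementation: the class names (`SourceConfig`, `MergeConfig`, `ConcatConfig`), the helper functions, and the short placeholder paths are all hypothetical and only illustrate the nesting proposed above.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class SourceConfig:
    """A single dataset on disk (hypothetical stand-in for a leaf entry)."""
    data_path: str
    labels: list[str] = field(default_factory=list)
    subset: dict[str, str] = field(default_factory=dict)


@dataclass
class MergeConfig:
    """Several datasets combined across variables."""
    merge: list[SourceConfig]


@dataclass
class ConcatConfig:
    """Top-level concat; each entry is a leaf dataset or a merge."""
    concat: list[Union[SourceConfig, MergeConfig]]


def parse_entry(entry: dict) -> Union[SourceConfig, MergeConfig]:
    """Parse one concat entry, accepting a nested ``merge`` key."""
    if "merge" in entry:
        return MergeConfig(merge=[SourceConfig(**e) for e in entry["merge"]])
    return SourceConfig(**entry)


def parse_dataset(config: dict) -> ConcatConfig:
    """Parse the ``dataset`` mapping into the nested config objects."""
    return ConcatConfig(concat=[parse_entry(e) for e in config["concat"]])


def entry_labels(entry: Union[SourceConfig, MergeConfig]) -> set[str]:
    """Collect the labels of an entry; a merge contributes the union of its parts."""
    if isinstance(entry, MergeConfig):
        return set().union(*(set(s.labels) for s in entry.merge))
    return set(entry.labels)


# Placeholder paths, not the real dataset locations from the config above.
config = {
    "concat": [
        {
            "data_path": "/data/era5",
            "labels": ["era5"],
            "subset": {
                "start_time": "1979-01-01T00:00:00",
                "stop_time": "1986-03-31T18:00:00",
            },
        },
        {
            "merge": [
                {"data_path": "/data/c96-shield", "labels": ["c96-shield"]},
                {"data_path": "/data/shield-co2", "labels": ["c96-shield"]},
            ]
        },
    ]
}

dataset = parse_dataset(config)
```

In this sketch the second concat entry parses to a `MergeConfig` holding two sources, and `entry_labels` treats the merged pair as one labeled data source, which matches the idea that each concat entry corresponds to a distinct source.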

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)
