@AmanKashyap0807

Purpose

closes #368

This PR adds write_to_netcdf, which writes an OutputVar back to a NetCDF file.
The goal is to enable a round-trip workflow in which users read data, perform analysis, and then persist the resulting variable for later use.

To-do

  • Implement writing an OutputVar to a NetCDF file
  • Add tests for NetCDF write
  • Document the new functionality

Content

  • Added write_to_netcdf(path, var) to write an OutputVar to a NetCDF file, including its data, dimensions, and associated attributes.
  • Introduced _netcdf_safe_attrib to convert attribute values to NetCDF-compatible types when writing.
  • Added basic validation to ensure dimension consistency and to require short_name for NetCDF output.
  • Added tests covering I/O, replacement of existing files, preservation of numeric types, and handling of NaN.
  • Updated the documentation.
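
For orientation, the kind of NetCDF write that the items above wrap can be sketched directly with NCDatasets.jl. This is an illustrative sketch, not the implementation in this PR: the helper name write_sketch, the variable name test_var, and the attribute values are made up for the example, and it assumes NCDatasets is installed.

```julia
using NCDatasets

# Hypothetical helper for illustration only; this is not the PR's write_to_netcdf.
function write_sketch(path, lon, lat, data)
    # Mode "c" creates a new file and replaces any existing file at `path`.
    NCDataset(path, "c") do ds
        defDim(ds, "lon", length(lon))
        defDim(ds, "lat", length(lat))

        # Each dimension is stored as a 1D coordinate variable with its own attributes.
        lon_var = defVar(ds, "lon", Float64, ("lon",))
        lon_var[:] = lon
        lon_var.attrib["units"] = "deg"

        lat_var = defVar(ds, "lat", Float64, ("lat",))
        lat_var[:] = lat
        lat_var.attrib["units"] = "deg"

        # The data variable is defined over both dimensions.
        data_var = defVar(ds, "test_var", Float64, ("lon", "lat"))
        data_var[:, :] = data
        data_var.attrib["long_name"] = "Test variable"
    end
    return path
end

lon = [0.0, 1.0, 2.0]
lat = [-10.0, 10.0]
data = [Float64(i + j) for i in 1:3, j in 1:2]

# Write to a temporary directory, then read the data variable back.
read_back = mktempdir() do tmpdir
    path = write_sketch(joinpath(tmpdir, "sketch.nc"), lon, lat, data)
    NCDataset(path) do ds
        ds["test_var"][:, :]
    end
end
```

Here read_back holds the array read from the file, so comparing it with data is the simplest round-trip check.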

Notes on scalar variables

Scalar (0D) variables: A round-trip test for scalar OutputVars (Test 7 in test/test_Var.jl) is currently commented out.
When reading a scalar variable from NetCDF, read_var returns an empty, untyped OrderedDict for dimensions. This does not match the typed dictionary expected by the OutputVar constructor and results in a MethodError during the read step.

Since this behavior originates in the existing read/constructor path and not in the new NetCDF writing logic, it was left unchanged here to keep this PR focused on issue #368. This can be addressed separately in a follow-up PR if desired.
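
The mismatch described above can be reproduced in isolation. The sketch below uses Base's Dict instead of OrderedDict to stay dependency-free, and count_dims is a hypothetical stand-in for a constructor that requires a typed dimensions dictionary; the real OutputVar constructor signature may differ.

```julia
# Hypothetical stand-in for a method that only accepts a typed dictionary.
count_dims(dims::Dict{String, Vector{Float64}}) = length(dims)

typed = Dict{String, Vector{Float64}}()  # empty, but with concrete key and value types
untyped = Dict()                         # empty Dict{Any, Any}

n_typed = count_dims(typed)              # dispatch succeeds

# Parametric types in Julia are invariant, so Dict{Any, Any} is not a
# Dict{String, Vector{Float64}} even when it is empty, and dispatch fails.
got_method_error = try
    count_dims(untyped)
    false
catch err
    err isa MethodError
end
```

If the read path produced a typed empty dictionary for scalar variables, the dispatch failure shown here would not occur.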

If there are any adjustments you’d like to see, I’m happy to make them.


  • I have read and checked the items on the review checklist.

Member

@ph-kev ph-kev left a comment

Thank you for working on this issue! I left some comments that should be addressed. Could you also add this to NEWS.md? Let me know if anything isn't clear.

"""
write_to_netcdf(path::AbstractString, var::OutputVar)

Write the `OutputVar` to a NetCDF file at `path`, overwriting any existing

I don't think you should be overwriting a file if it exists. An error should be thrown instead.
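
A minimal sketch of such a guard, using only Base functions. The helper name check_no_overwrite and the error message are hypothetical, not part of ClimaAnalysis:

```julia
# Hypothetical helper: refuse to clobber an existing file.
function check_no_overwrite(path::AbstractString)
    isfile(path) && error("File $path already exists; refusing to overwrite it")
    return nothing
end

# Exercise the guard in a temporary directory.
results = mktempdir() do tmpdir
    path = joinpath(tmpdir, "out.nc")
    first_call = check_no_overwrite(path)   # nothing exists at `path` yet
    touch(path)                             # simulate a file written earlier
    second_call_threw = try
        check_no_overwrite(path)
        false
    catch err
        err isa ErrorException
    end
    (first_call, second_call_threw)
end
```

Calling a check like this at the top of the write function would surface the conflict before any file is created or modified.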

Comment on lines +518 to +519
error(
"Dimension $dim_name must be 1D to be written to NetCDF (ndims=$(ndims(dim_val)))",

This check is okay to have. Can you mention it in the documentation?

Comment on lines +506 to +515
var_dim_names = collect(keys(var.dims))
ndims(var.data) == length(var.dims) || error(
"Number of dimensions in data ($(ndims(var.data))) does not match number of dims ($(length(var.dims)))",
)
for (i, (dim_name, dim_val)) in enumerate(var.dims)
if ndims(dim_val) == 1
expected_len = length(dim_val)
actual_len = size(var.data, i)
expected_len == actual_len || error(
"Dimension $dim_name has length $expected_len but data has size $actual_len along dimension $i",

You shouldn't need this check; the OutputVar constructor already performs it. If an OutputVar can reach this point with mismatched dimensions, that is a bug in ClimaAnalysis.

"Attributes:\n long_name => hi\nDimension attributes:\n lat:\n units => deg\nData defined over:\n lat with 0 element"
end

@testset "Write to NetCDF" begin

For your tests, can you use TemplateVar instead? This should simplify the construction of the OutputVar.

Comment on lines +3641 to +3648
@test var_read.attributes["short_name"] == attribs["short_name"]
@test var_read.attributes["long_name"] == attribs["long_name"]
@test var_read.attributes["units"] == attribs["units"]
@test var_read.dims["lon"] == lon
@test var_read.dims["lat"] == lat
@test var_read.data ≈ data
@test var_read.dim_attributes["lon"]["units"] == "deg"
@test var_read.dim_attributes["lat"]["units"] == "deg"

You can probably simplify this to something like

Suggested change
- @test var_read.attributes["short_name"] == attribs["short_name"]
- @test var_read.attributes["long_name"] == attribs["long_name"]
- @test var_read.attributes["units"] == attribs["units"]
- @test var_read.dims["lon"] == lon
- @test var_read.dims["lat"] == lat
- @test var_read.data ≈ data
- @test var_read.dim_attributes["lon"]["units"] == "deg"
- @test var_read.dim_attributes["lat"]["units"] == "deg"
+ @test var_read.attributes == var.attributes
+ @test var_read.dims == var.dims
+ @test var_read.data ≈ var.data
+ @test var_read.dim_attributes == var.dim_attributes


nc_var[:] = var.data

# Write attributes with type safety (overwrite existing keys)

I don't understand this comment. It would be better to write that attributes of NetCDF files must be basic scalar types or strings (like what you did for the documentation of _netcdf_safe_attrib).

Comment on lines +3706 to +3707
# Test 6: NaN round-trip
nan_path = joinpath(tmpdir, "nan_file.nc")

I don't think you need a test case for this.

Comment on lines +3724 to +3725
# # Test 7: Scalar (0D) round-trip
# scalar_path = joinpath(tmpdir, "scalar_file.nc")

Is this not possible in general with NetCDF files?

var = ClimaAnalysis.OutputVar(attribs, dims, dim_attribs, data)
ClimaAnalysis.write_to_netcdf(nc_path, var)

var_read = ClimaAnalysis.read_var(nc_path; short_name = "test_var")

You shouldn't need to pass a short name here.

Suggested change
- var_read = ClimaAnalysis.read_var(nc_path; short_name = "test_var")
+ var_read = ClimaAnalysis.read_var(nc_path)

Comment on lines +555 to +558
v isa AbstractString && return v
v isa Number && return v
v isa AbstractArray && eltype(v) <: Union{Number, AbstractString} &&
return v

There are different concrete subtypes of AbstractString, such as String, LazyString, and SubString. Even though we probably won't see any of these, I think it would be better to do

v isa AbstractString && return string(v)

I think you can apply similar reasoning to the other paths of the function as well.
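
One way this could look across all branches is sketched below. The function name safe_attrib_sketch, the Float64 conversion for other real numbers, and the string fallback are assumptions for illustration, not the PR's _netcdf_safe_attrib; the sketch also does not check which numeric types NetCDF actually accepts.

```julia
# Hypothetical normalization: return concrete, plain values on every path.
function safe_attrib_sketch(v)
    # SubString and other AbstractString subtypes become a plain String.
    v isa AbstractString && return string(v)
    # Common numeric scalars pass through unchanged.
    (v isa Integer || v isa AbstractFloat) && return v
    # Other reals (for example Rational) are converted to Float64.
    v isa Real && return Float64(v)
    # Arrays of strings become arrays of String.
    v isa AbstractArray{<:AbstractString} && return string.(v)
    # Ranges and views of numbers become plain Arrays.
    v isa AbstractArray{<:Number} && return collect(v)
    # Assumed fallback: store the string representation.
    return string(v)
end

sub = SubString("hello", 1, 3)
normalized_string = safe_attrib_sketch(sub)
normalized_range = safe_attrib_sketch(1:3)
normalized_rational = safe_attrib_sketch(1 // 2)
normalized_symbol = safe_attrib_sketch(:units)
```

Each branch returns a freshly converted value, so callers never see a lazy or wrapped type regardless of what the attribute dictionary contained.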

Development

Successfully merging this pull request may close these issues.

Add functionality to write an OutputVar to a netcdf file