Conversation
Pull request overview
This PR refactors the Toon-McKay 1989 radiative transfer solver implementation, converting it from a Torch-based implementation to a lower-level C/C++ implementation with custom memory management and device dispatch. It adds comprehensive testing infrastructure and GPU support.
Changes:
- Refactored Toon-McKay 89 shortwave and longwave solvers from Torch tensor operations to C/C++ implementations with manual memory management
- Added new test infrastructure with device testing support (CPU/CUDA/MPS)
- Implemented custom dispatch mechanism for CPU and CUDA backends with tridiagonal solver
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/test_toon.cpp | New test file for Toon-McKay 89 solver with device parameterization |
| tests/device_testing.hpp | New device testing infrastructure supporting CPU, CUDA, and MPS |
| tests/CMakeLists.txt | Enables test_toon test (replaces test_composition) |
| src/rtsolver/toon_mckay89_shortwave_impl.h | Refactored shortwave solver implementation with C-style memory management |
| src/rtsolver/toon_mckay89_longwave_impl.h | Refactored longwave solver implementation with C-style memory management |
| src/rtsolver/toon_mckay89.hpp | Updated header with new options structure and module holder pattern |
| src/rtsolver/toon_mckay89.cpp | Main implementation using TensorIterator dispatch |
| src/rtsolver/dtridgl_impl.h | New tridiagonal solver using Thomas algorithm |
| src/rtsolver/rtsolver_dispatch.hpp | Dispatch declarations for CPU/CUDA backends |
| src/rtsolver/rtsolver_dispatch.cu | CUDA kernel implementations |
| src/rtsolver/rtsolver_dispatch.cpp | CPU implementations with parallel iteration |
| src/utils/alloc.h | Memory allocation utilities with alignment and space calculation |
| src/loops.cuh | GPU kernel infrastructure for chunked processing |
| src/radiation/bbflux.hpp | Added tensor-based blackbody flux calculation |
| src/radiation/bbflux.cpp | Implementation of tensor blackbody flux with power series |
| src/radiation/radiation_band.hpp | Added Toon solver option to radiation band |
| src/radiation/radiation_band.cpp | Integration of Toon solver in radiation band |
| python/csrc/pyrtsolver.cpp | New Python bindings for Toon solver |
| python/csrc/pyradiation.cpp | Updated radiation bindings with Toon options |
| python/csrc/pyharp.cpp | Added rtsolver binding initialization |
| src/CMakeLists.txt | Enabled rtsolver source files in build |
| cmake/macros/macro_setup_test.cmake | Added CUDA library linking for tests |
```cpp
Af[l] = E1[nlay - 1] - w_surf_in * E3[nlay - 1];
Bf[l] = E2[nlay - 1] - w_surf_in * E4[nlay - 1];
Cf[l] = 0.0;
Df[l] = bsurf - Cp[nlay - 1] + w_surf_in * Cm[nlay - 1];

dtridgl(l, Af, Bf, Cf, Df, xk);
```
Array index out of bounds. The arrays `Af`, `Bf`, `Cf`, `Df`, and `xk` are allocated with size `l` (which equals 2 * nlay) on lines 54-58, making valid indices 0 through `l - 1`. However, the code accesses these arrays at index `l` on lines 174-177 and passes them to `dtridgl` with size `l` on line 179, so it touches index 2 * nlay, one past the end of arrays of size 2 * nlay. Either allocate the arrays with size `l + 1`, or adjust the indexing scheme to be 0-based.
Suggested change:
```diff
-Af[l] = E1[nlay - 1] - w_surf_in * E3[nlay - 1];
-Bf[l] = E2[nlay - 1] - w_surf_in * E4[nlay - 1];
-Cf[l] = 0.0;
-Df[l] = bsurf - Cp[nlay - 1] + w_surf_in * Cm[nlay - 1];
-dtridgl(l, Af, Bf, Cf, Df, xk);
+Af[lm1] = E1[nlay - 1] - w_surf_in * E3[nlay - 1];
+Bf[lm1] = E2[nlay - 1] - w_surf_in * E4[nlay - 1];
+Cf[lm1] = 0.0;
+Df[lm1] = bsurf - Cp[nlay - 1] + w_surf_in * Cm[nlay - 1];
+dtridgl(lm1, Af, Bf, Cf, Df, xk);
```
```cpp
 * Based on Elsie Lee's implementation in Exo-FMS_column_ck, which was
 * based on CHIMERA code by Mike Line.
 * Ported by Xi Zhang to Eigen
 * Ported by Cheng Li to torch
```
Typo in comment. "Proted" should be "Ported".
```cpp
 * Based on Elsie Lee's implementation in Exo-FMS_column_ck, which was
 * based on CHIMERA code by Mike Line.
 * Ported by Xi Zhang to Eigen
 * Ported by Cheng Li to torch
```
Typo in comment. "Proted" should be "Ported".
```cpp
xk1[n] = xk[2 * n + 1] + xk[2 * n + 2];
xk2[n] = xk[2 * n + 1] - xk[2 * n + 2];
if (fabs(xk2[n]) < 1e-30 * fabs(xk[2 * n + 1])) xk2[n] = 0.0;
```
Array index out of bounds. After `dtridgl` returns, the code accesses `xk[2 * n + 2]` where `n` ranges from 0 to nlay - 1. When `n = nlay - 1`, this accesses `xk[2 * nlay]`, which is out of bounds for an array of size l = 2 * nlay (valid indices are 0 to 2 * nlay - 1). This issue exists because the solver uses 1-based indexing while the `xk` array is allocated with 0-based sizing.
Suggested change:
```diff
-xk1[n] = xk[2 * n + 1] + xk[2 * n + 2];
-xk2[n] = xk[2 * n + 1] - xk[2 * n + 2];
-if (fabs(xk2[n]) < 1e-30 * fabs(xk[2 * n + 1])) xk2[n] = 0.0;
+xk1[n] = xk[2 * n] + xk[2 * n + 1];
+xk2[n] = xk[2 * n] - xk[2 * n + 1];
+if (fabs(xk2[n]) < 1e-30 * fabs(xk[2 * n])) xk2[n] = 0.0;
```
```cpp
xk1[n] = xkk[2 * n + 1] + xkk[2 * n + 2];
xk2[n] = xkk[2 * n + 1] - xkk[2 * n + 2];
if (fabs(xk2[n]) < 1e-30 * fabs(xkk[2 * n + 1])) xk2[n] = 0.0;
```
Array index out of bounds. After `dtridgl` returns, the code accesses `xkk[2 * n + 2]` where `n` ranges from 0 to nlay - 1. When `n = nlay - 1`, this accesses `xkk[2 * nlay]`, which is out of bounds for an array of size l = 2 * nlay (valid indices are 0 to 2 * nlay - 1). This issue exists because the solver uses 1-based indexing while the `xkk` array is allocated with 0-based sizing.
Suggested change:
```diff
-xk1[n] = xkk[2 * n + 1] + xkk[2 * n + 2];
-xk2[n] = xkk[2 * n + 1] - xkk[2 * n + 2];
-if (fabs(xk2[n]) < 1e-30 * fabs(xkk[2 * n + 1])) xk2[n] = 0.0;
+// Use 0-based indexing for xkk: valid indices are 0..(2 * nlay - 1)
+xk1[n] = xkk[2 * n] + xkk[2 * n + 1];
+xk2[n] = xkk[2 * n] - xkk[2 * n + 1];
+T xkk_ref = fmax(fabs(xkk[2 * n]), fabs(xkk[2 * n + 1]));
+if (xkk_ref > 0.0 && fabs(xk2[n]) < 1e-30 * xkk_ref) xk2[n] = 0.0;
```
```cpp
void call_toon89_sw_cpu(at::TensorIterator &iter) {
  AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "call_toon89_sw_cpu", [&] {
    int nlay = at::native::ensure_nonempty_size(iter.input(0), -2);
    int grain_size = iter.numel() / at::get_num_threads();
    int mem_size = toon89_sw_space<scalar_t>(nlay);
    char *work = new char[mem_size];

    iter.for_each(
        [&](char **data, const int64_t *strides, int64_t n) {
          for (int i = 0; i < n; i++) {
            auto out = reinterpret_cast<scalar_t *>(data[0] + i * strides[0]);
            auto prop = reinterpret_cast<scalar_t *>(data[1] + i * strides[1]);
            auto umu0 = reinterpret_cast<scalar_t *>(data[2] + i * strides[2]);
            auto fbeam = reinterpret_cast<scalar_t *>(data[3] + i * strides[3]);
            auto albedo = reinterpret_cast<scalar_t *>(data[4] + i * strides[4]);
            toon_mckay89_shortwave(nlay, *fbeam, umu0, prop, *albedo, out,
                                   work);
          }
        },
        grain_size);

    delete[] work;
```
Thread safety issue with shared work buffer. The code allocates a single work buffer and then uses it across multiple threads in the parallel for_each loop. Each thread will overwrite the same work buffer, causing race conditions and incorrect results. Each thread needs its own work buffer. Consider using thread-local storage or allocating work buffers per-thread within the parallel region.
```cpp
if (bc->find(bname + "fbeam") != bc->end()) {
  TORCH_CHECK(bc->at(bname + "fbeam").dim() == 2,
              "DisortImpl::forward: bc->fbeam.dim() != 2");
  TORCH_CHECK(bc->at(bname + "fbeam").size(0) == nwave,
              "DisortImpl::forward: bc->fbeam.size(0) != nwave");
  TORCH_CHECK(bc->at(bname + "fbeam").size(1) == ncol,
              "DisortImpl::forward: bc->fbeam.size(1) != ncol");
  (*bc)["fbeam"] = bc->at(bname + "fbeam");
} else {
  (*bc)["fbeam"] = torch::zeros({nwave, ncol}, prop.options());
}

if (bc->find(bname + "albedo") != bc->end()) {
  TORCH_CHECK(bc->at(bname + "albedo").dim() == 2,
              "DisortImpl::forward: bc->albedo.dim() != 2");
  TORCH_CHECK(bc->at(bname + "albedo").size(0) == nwave,
              "DisortImpl::forward: bc->albedo.size(0) != nwave");
  TORCH_CHECK(bc->at(bname + "albedo").size(1) == ncol,
              "DisortImpl::forward: bc->albedo.size(1) != ncol");
  (*bc)["albedo"] = bc->at(bname + "albedo");
} else {
  (*bc)["albedo"] = torch::zeros({nwave, ncol}, prop.options());
}
```
Incorrect error message. The check is for ToonMcKay89 forward method, but the error message says "DisortImpl::forward". All three error messages in this block should be updated to "ToonMcKay89Impl::forward" for consistency and clarity.
```cpp
              "DisortImpl::forward: bc->umu0.size(0) != ncol");
  (*bc)["umu0"] = bc->at(bname + "umu0");
} else {
  (*bc)["umu0"] = torch::ones({1, ncol}, prop.options());
```
Incorrect default value for umu0. The code sets a default value of torch::ones({1, ncol}, ...) which creates a tensor with shape (1, ncol), but the dimension check on line 46-47 expects umu0 to have dim() == 1. This mismatch will cause the check to fail when using the default value. The default should be torch::ones({ncol}, prop.options()) instead.
Suggested change:
```diff
-(*bc)["umu0"] = torch::ones({1, ncol}, prop.options());
+(*bc)["umu0"] = torch::ones({ncol}, prop.options());
```
```cpp
auto offset_calc = ::make_offset_calculator<Arity>(iter);
int64_t numel = iter.numel();

// devide numel into Chunk parts to reduce memory usage
```
Typo in comment. "devide" should be "divide".
Suggested change:
```diff
-// devide numel into Chunk parts to reduce memory usage
+// divide numel into Chunk parts to reduce memory usage
```
```cpp
// First row
c[0] = c[0] / b[0];
d[0] = d[0] / b[0];

// Forward sweep
for (int i = 1; i < n; ++i) {
  T denom = b[i] - a[i] * c[i - 1];
  if (denom == 0.0) denom = 1e-12;  // Avoid division by zero
  c[i] = (i < n - 1) ? c[i] / denom : 0.0;
  d[i] = (d[i] - a[i] * d[i - 1]) / denom;
}

// Back substitution
x[n - 1] = d[n - 1];
for (int i = n - 2; i >= 0; --i) {
```
Potential indexing mismatch with caller. The dtridgl function is called with parameter 'n' representing the size of the system, but the implementation uses 0-based indexing throughout (accessing c[0], d[0], etc.). Meanwhile, the caller code in toon_mckay89_shortwave_impl.h and toon_mckay89_longwave_impl.h uses 1-based indexing when setting up the matrix (accessing Af[1], Bf[1], ..., Af[l], Bf[l] where l = 2*nlay). This suggests either the arrays passed should have an offset applied, or dtridgl should account for 1-based indexing. The current implementation will process elements at indices 0 through n-1, but the caller expects processing of indices 1 through n.
Suggested change:
```diff
-// First row
-c[0] = c[0] / b[0];
-d[0] = d[0] / b[0];
-// Forward sweep
-for (int i = 1; i < n; ++i) {
-  T denom = b[i] - a[i] * c[i - 1];
-  if (denom == 0.0) denom = 1e-12;  // Avoid division by zero
-  c[i] = (i < n - 1) ? c[i] / denom : 0.0;
-  d[i] = (d[i] - a[i] * d[i - 1]) / denom;
-}
-// Back substitution
-x[n - 1] = d[n - 1];
-for (int i = n - 2; i >= 0; --i) {
+// First row (1-based indexing: use elements 1..n)
+c[1] = c[1] / b[1];
+d[1] = d[1] / b[1];
+// Forward sweep
+for (int i = 2; i <= n; ++i) {
+  T denom = b[i] - a[i] * c[i - 1];
+  if (denom == 0.0) denom = 1e-12;  // Avoid division by zero
+  c[i] = (i < n) ? c[i] / denom : static_cast<T>(0);
+  d[i] = (d[i] - a[i] * d[i - 1]) / denom;
+}
+// Back substitution
+x[n] = d[n];
+for (int i = n - 1; i >= 1; --i) {
```
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 11 comments.
```cpp
              "ToonMcKay89::forward: bc->umu0.size(0) != ncol");
  (*bc)["umu0"] = bc->at(bname + "umu0");
} else {
  (*bc)["umu0"] = torch::ones({1, ncol}, prop.options());
```
The default value for umu0 when not provided is created with shape {1, ncol}, but according to the check on line 48, umu0 should have dim() == 1 with size(0) == ncol. The default value has dim() == 2, which is inconsistent with the validation logic.
Suggested change:
```diff
-(*bc)["umu0"] = torch::ones({1, ncol}, prop.options());
+(*bc)["umu0"] = torch::ones({ncol}, prop.options());
```
```cpp
// Forward sweep
for (int i = 1; i < n; ++i) {
  T denom = b[i] - a[i] * c[i - 1];
  if (denom == 0.0) denom = 1e-12;  // Avoid division by zero
```
The comparison denom == 0.0 for floating-point numbers is problematic. Consider using a threshold comparison like fabs(denom) < epsilon instead to handle floating-point precision issues.
```cpp
  Df[i - 1] = (gam[n_idx + 1] - 1.0) * (Cpm1[n_idx + 1] - Cp[n_idx]) +
              (1.0 - gam[n_idx + 1]) * (Cm[n_idx] - Cmm1[n_idx + 1]);
  n_idx++;
}
```
The variable n_idx is initialized but then immediately reset to 0 on line 165. The initialization on line 156 is redundant and could be confusing. Consider removing the first initialization or adding a comment explaining why the counter is reset.
Suggested change:
```diff
 }
+// Reuse n_idx: restart index from 0 for the second matrix row pattern
```
```cpp
char *work = new char[mem_size];

iter.for_each(
    [&](char **data, const int64_t *strides, int64_t n) {
      for (int i = 0; i < n; i++) {
        auto out = reinterpret_cast<scalar_t *>(data[0] + i * strides[0]);
        auto prop = reinterpret_cast<scalar_t *>(data[1] + i * strides[1]);
        auto umu0 = reinterpret_cast<scalar_t *>(data[2] + i * strides[2]);
        auto fbeam = reinterpret_cast<scalar_t *>(data[3] + i * strides[3]);
        auto albedo = reinterpret_cast<scalar_t *>(data[4] + i * strides[4]);
        toon_mckay89_shortwave(nlay, *fbeam, umu0, prop, *albedo, out, work);
      }
    },
    grain_size);

delete[] work;
```
Memory allocated with new char[mem_size] on line 20 is deleted on line 37, but if an exception is thrown between these lines (e.g., in the lambda or for_each call), the memory will leak. Consider using RAII (e.g., std::unique_ptr) or wrapping the code in a try-catch block to ensure cleanup.
```cpp
  (*bc)["umu0"] = bc->at(bname + "umu0");
} else {
  (*bc)["umu0"] = torch::ones({1, ncol}, prop.options());
```
The boundary condition checks on lines 50-52 incorrectly assign bc->at("umu0") to (*bc)["umu0"], but "umu0" doesn't have a bname prefix. This could cause issues if bname is non-empty. The assignment should either use the prefixed key (bname + "umu0") or the logic should be clarified.
```cpp
for (int i = 0; i < nlay; i++) {
  T g1 = sqrt3d2 * (2.0 - w0[i] * (1.0 + hg[i]));
  T g2 = (sqrt3d2 * w0[i]) * (1.0 - hg[i]);
  if (g2 == 0.0) g2 = 1.0e-10;
```
The comparison g2 == 0.0 for floating-point numbers is problematic as it may not catch values that are very close to zero due to floating-point precision issues. Consider using a threshold comparison like fabs(g2) < epsilon instead.
Suggested change:
```diff
-if (g2 == 0.0) g2 = 1.0e-10;
+if (fabs(g2) < 1.0e-10) g2 = 1.0e-10;
```
```cpp
T lam = sqrt(g1 * g1 - g2 * g2);
gam[i] = (g1 - lam) / g2;
T denom = (lam * lam) - 1.0 / (mu_zm[i] * mu_zm[i]);
if (denom == 0.0) denom = 1.0e-10;
```
The comparison denom == 0.0 for floating-point numbers is problematic as it may not catch values that are very close to zero due to floating-point precision issues. Consider using a threshold comparison like fabs(denom) < epsilon instead.
Suggested change:
```diff
-if (denom == 0.0) denom = 1.0e-10;
+if (fabs(denom) < 1.0e-10) denom = (denom >= 0.0 ? 1.0e-10 : -1.0e-10);
```
```cpp
int n_idx = 0;
for (int i = 2; i <= lm2; i += 2) {
  Af[i - 1] = (E1[n_idx] + E3[n_idx]) * (gam[n_idx + 1] - 1.0);
  Bf[i - 1] = (E2[n_idx] + E4[n_idx]) * (gam[n_idx + 1] - 1.0);
  Cf[i - 1] = 2.0 * (1.0 - gam[n_idx + 1] * gam[n_idx + 1]);
  Df[i - 1] = (gam[n_idx + 1] - 1.0) * (Cpm1[n_idx + 1] - Cp[n_idx]) +
              (1.0 - gam[n_idx + 1]) * (Cm[n_idx] - Cmm1[n_idx + 1]);
  n_idx++;
}

n_idx = 0;
for (int i = 3; i <= lm1; i += 2) {
  Af[i - 1] = 2.0 * (1.0 - gam[n_idx] * gam[n_idx]);
  Bf[i - 1] = (E1[n_idx] - E3[n_idx]) * (1.0 + gam[n_idx + 1]);
  Cf[i - 1] = (E1[n_idx] + E3[n_idx]) * (gam[n_idx + 1] - 1.0);
  Df[i - 1] = E3[n_idx] * (Cpm1[n_idx + 1] - Cp[n_idx]) +
              E1[n_idx] * (Cm[n_idx] - Cmm1[n_idx + 1]);
  n_idx++;
```
The variable n_idx is initialized but then immediately reset to 0 on line 151. The initialization on line 141 is redundant and could be confusing. Consider removing the first initialization or adding a comment explaining why the counter is reset.
Suggested change:
```diff
-int n_idx = 0;
-for (int i = 2; i <= lm2; i += 2) {
-  Af[i - 1] = (E1[n_idx] + E3[n_idx]) * (gam[n_idx + 1] - 1.0);
-  Bf[i - 1] = (E2[n_idx] + E4[n_idx]) * (gam[n_idx + 1] - 1.0);
-  Cf[i - 1] = 2.0 * (1.0 - gam[n_idx + 1] * gam[n_idx + 1]);
-  Df[i - 1] = (gam[n_idx + 1] - 1.0) * (Cpm1[n_idx + 1] - Cp[n_idx]) +
-              (1.0 - gam[n_idx + 1]) * (Cm[n_idx] - Cmm1[n_idx + 1]);
-  n_idx++;
-}
-n_idx = 0;
-for (int i = 3; i <= lm1; i += 2) {
-  Af[i - 1] = 2.0 * (1.0 - gam[n_idx] * gam[n_idx]);
-  Bf[i - 1] = (E1[n_idx] - E3[n_idx]) * (1.0 + gam[n_idx + 1]);
-  Cf[i - 1] = (E1[n_idx] + E3[n_idx]) * (gam[n_idx + 1] - 1.0);
-  Df[i - 1] = E3[n_idx] * (Cpm1[n_idx + 1] - Cp[n_idx]) +
-              E1[n_idx] * (Cm[n_idx] - Cmm1[n_idx + 1]);
-  n_idx++;
+int n_idx_even = 0;
+for (int i = 2; i <= lm2; i += 2) {
+  Af[i - 1] = (E1[n_idx_even] + E3[n_idx_even]) * (gam[n_idx_even + 1] - 1.0);
+  Bf[i - 1] = (E2[n_idx_even] + E4[n_idx_even]) * (gam[n_idx_even + 1] - 1.0);
+  Cf[i - 1] = 2.0 * (1.0 - gam[n_idx_even + 1] * gam[n_idx_even + 1]);
+  Df[i - 1] = (gam[n_idx_even + 1] - 1.0) * (Cpm1[n_idx_even + 1] - Cp[n_idx_even]) +
+              (1.0 - gam[n_idx_even + 1]) * (Cm[n_idx_even] - Cmm1[n_idx_even + 1]);
+  n_idx_even++;
+}
+int n_idx_odd = 0;
+for (int i = 3; i <= lm1; i += 2) {
+  Af[i - 1] = 2.0 * (1.0 - gam[n_idx_odd] * gam[n_idx_odd]);
+  Bf[i - 1] = (E1[n_idx_odd] - E3[n_idx_odd]) * (1.0 + gam[n_idx_odd + 1]);
+  Cf[i - 1] = (E1[n_idx_odd] + E3[n_idx_odd]) * (gam[n_idx_odd + 1] - 1.0);
+  Df[i - 1] = E3[n_idx_odd] * (Cpm1[n_idx_odd + 1] - Cp[n_idx_odd]) +
+              E1[n_idx_odd] * (Cm[n_idx_odd] - Cmm1[n_idx_odd + 1]);
+  n_idx_odd++;
```
```cpp
 * Ported by Xi Zhang to Eigen
 * Ported by Cheng Li to torch
 * Reference: Toon, O.B., 1989, JGR, 94, 16287-16301.
```
The comment has a typo: "Proted" should be "Ported".
Suggested change:
```diff
-* Ported by Xi Zhang to Eigen
-* Ported by Cheng Li to torch
-* Reference: Toon, O.B., 1989, JGR, 94, 16287-16301.
+* Ported by Cheng Li to torch
+* Reference: Toon, O.B., 1989, JGR, 94, 16287-16301.
```
```cpp
char *work = new char[mem_size];

iter.for_each(
    [&](char **data, const int64_t *strides, int64_t n) {
      for (int i = 0; i < n; i++) {
        auto out = reinterpret_cast<scalar_t *>(data[0] + i * strides[0]);
        auto prop = reinterpret_cast<scalar_t *>(data[1] + i * strides[1]);
        auto be = reinterpret_cast<scalar_t *>(data[2] + i * strides[2]);
        auto albedo = reinterpret_cast<scalar_t *>(data[3] + i * strides[3]);
        toon_mckay89_longwave(nlay, be, prop, *albedo, out, work);
      }
    },
    grain_size);

delete[] work;
```
Memory allocated with new char[mem_size] on line 46 is deleted on line 61, but if an exception is thrown between these lines (e.g., in the lambda or for_each call), the memory will leak. Consider using RAII (e.g., std::unique_ptr) or wrapping the code in a try-catch block to ensure cleanup.
|