Summary
PR #184 introduced a performance regression in output_orbits_macrostep. The original orbits.nc design (PR #182) was parallel, but it got serialized when the Python API was refactored in PR #184.
Background
Legacy fort.9XXXX: One file per particle. Fully parallel, since each thread writes to its own file, but it leaked file handles.
PR #182 (1c38cfa): Introduced orbits.nc with a shared buffer. Each thread writes to its own slice in memory, then a single bulk write at the end. No lock contention.
src/simple_main.f90
call trace_orbit(norb, i)
if (output_orbits_macrostep) call flush_orbit(ipart)
end subroutine trace_orbit
PR #184 (d607ec1): Refactored trace_orbit() to return trajectory arrays for the Python API. Added !$omp critical as a quick fix to write to the NetCDF file, which serializes all I/O.
call trace_orbit(norb, i, traj, times)
if (output_orbits_macrostep) then
!$omp critical
call write_orbit_to_netcdf(i, traj, times)
!$omp end critical
end if
Suggested Fix
Keep the new trace_orbit() signature but restore the buffer approach: copy the returned arrays into a shared buffer (no lock needed since each particle index is unique), then bulk write at the end.
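A minimal sketch of what that could look like, assuming fixed-size trajectories per particle. Only trace_orbit and output_orbits_macrostep come from the code above; trace_orbit_stub, write_all_orbits, traj_buf, time_buf and the array dimensions are hypothetical stand-ins, and the real NetCDF write is replaced by a print.

```fortran
! Hypothetical sketch of the buffer approach. Names other than
! output_orbits_macrostep are illustrative, not taken from the source tree.
program orbit_buffer_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: npart = 8, nstep = 5, ncoord = 3
  logical, parameter :: output_orbits_macrostep = .true.
  real(dp), allocatable :: traj_buf(:, :, :), time_buf(:, :)
  real(dp) :: traj(ncoord, nstep), times(nstep)
  integer :: i

  ! Shared buffer: one slice per particle, allocated once up front.
  allocate(traj_buf(ncoord, nstep, npart), time_buf(nstep, npart))

  !$omp parallel do private(traj, times) schedule(dynamic)
  do i = 1, npart
    call trace_orbit_stub(i, traj, times)
    if (output_orbits_macrostep) then
      ! Each particle index i is unique, so the slices written by
      ! different threads never overlap and no critical section is needed.
      traj_buf(:, :, i) = traj
      time_buf(:, i) = times
    end if
  end do
  !$omp end parallel do

  ! Single bulk write, outside the parallel region.
  if (output_orbits_macrostep) call write_all_orbits(traj_buf, time_buf)

contains

  ! Stand-in for the refactored trace_orbit() that returns trajectory arrays.
  subroutine trace_orbit_stub(ipart, traj, times)
    integer, intent(in) :: ipart
    real(dp), intent(out) :: traj(:, :), times(:)
    integer :: k
    do k = 1, size(times)
      times(k) = real(k, dp)
      traj(:, k) = real(ipart, dp) + 0.1_dp * real(k, dp)
    end do
  end subroutine trace_orbit_stub

  ! Placeholder for one NetCDF write covering the whole buffer.
  subroutine write_all_orbits(traj_buf, time_buf)
    real(dp), intent(in) :: traj_buf(:, :, :), time_buf(:, :)
    print '(a, i0, a, i0, a)', 'bulk write of ', size(traj_buf, 3), &
      ' orbits with ', size(time_buf, 1), ' steps each'
  end subroutine write_all_orbits
end program orbit_buffer_sketch
```

The trade-off in this sketch is memory: the buffer holds every orbit until the final write, so for very large runs it may need to be flushed in chunks instead.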