(0.97.4) Speed up reductions and remove inference problems in Field constructor #4668
|
Nice work. Perhaps, there is a type inference issue (the usual culprit)... |
|
Indeed, there is quite a difference between allocating and non-allocating reductions:
julia> ur = Field{Nothing, Nothing, Nothing}(grid);
julia> @benchmark sum(interior(u))
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
Range (min … max): 1.779 μs … 4.162 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.817 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.824 μs ± 64.381 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▃ ▇ █ ▇ ▃
▂▁▂▁▁▂▁▂▁▁▃▁▄▁▁▇▁█▁▁█▁█▁▁█▁█▁▁█▁▆▁▁▅▁▄▁▁▃▁▃▁▁▃▁▂▁▁▂▁▂▁▁▂▁▂ ▃
1.78 μs Histogram: frequency by time 1.88 μs <
Memory estimate: 192 bytes, allocs estimate: 5.
julia> @benchmark sum(u)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 46.875 μs … 2.416 ms ┊ GC (min … max): 0.00% … 96.24%
Time (median): 47.958 μs ┊ GC (median): 0.00%
Time (mean ± σ): 48.826 μs ± 40.719 μs ┊ GC (mean ± σ): 1.42% ± 1.67%
▁▄▇ ██▆▆▄ ▂
▁▁▂▂▃▂▅▇█████████▆██▆▅▅▂▃▃▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
46.9 μs Histogram: frequency by time 51.4 μs <
Memory estimate: 27.88 KiB, allocs estimate: 253.
julia> @benchmark sum!(ur, u)
BenchmarkTools.Trial: 10000 samples with 10 evaluations per sample.
Range (min … max): 1.492 μs … 263.913 μs ┊ GC (min … max): 0.00% … 98.28%
Time (median): 1.546 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.620 μs ± 2.629 μs ┊ GC (mean ± σ): 1.60% ± 0.98%
▃▅█▂▁ ▁
▃█████▄▅▆█▅▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂ ▃
1.49 μs Histogram: frequency by time 2.11 μs <
Memory estimate: 1.30 KiB, allocs estimate: 5.
So I guess the culprit is not the |
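The allocating versus in-place distinction benchmarked above can be reproduced with plain arrays. This is only a minimal sketch using Base functions on an ordinary `Array`, not on an Oceananigans `Field`; the array size and the names `A` and `r` are illustrative assumptions.

```julia
# Plain-array stand-in for a field's data (illustrative size, not from the PR).
A = rand(16, 16, 16)

# Allocating reduction: `sum` with `dims` builds a new 1×1×1 output array on every call.
s_alloc = sum(A; dims=(1, 2, 3))

# In-place reduction: `sum!` writes into an output array that was allocated once.
r = zeros(1, 1, 1)
sum!(r, A)

# Both paths compute the same total.
same_result = r[1] ≈ s_alloc[1]
```

Reusing `r` across calls is what makes the in-place form attractive inside a time-stepping loop, where the same reduction runs many times.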
|
Apparently, constructors have inference problems when passed location types instead of instances. Working with instantiated locations rather than types speeds up the computations considerably. This is the most recent result:
julia> @benchmark mean(u)
BenchmarkTools.Trial: 10000 samples with 1 evaluation per sample.
Range (min … max): 11.542 μs … 698.083 μs ┊ GC (min … max): 0.00% … 95.18%
Time (median): 12.166 μs ┊ GC (median): 0.00%
Time (mean ± σ): 12.474 μs ± 9.701 μs ┊ GC (mean ± σ): 1.05% ± 1.33%
▁▄▃▄▅▆▆▇█▆▆▆▆▄▃▃▂▁▁▂ ▂▃▂▂▂▃▂▁▃▁▁▁▁ ▁ ▂
▅▇███████████████████████████████████▇█▆▇▆▇▆▆▇▇▇▇█▇▅▇▅▆▅▅▅▄▅ █
11.5 μs Histogram: log(frequency) by time 14.8 μs <
Memory estimate: 10.69 KiB, allocs estimate: 65.
so now |
|
This might become a lengthy PR: it changes the constructors, so it might take a while to make sure everything is consistent. However, I feel this PR might also improve the time for building models, because it removes quite a few inference problems in the field and boundary condition constructors. |
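One way to see why passing types can hurt inference is with a toy struct. This is a hedged sketch, not the Oceananigans constructor: `Center` and `Face` are local stand-ins defined here, and `LocatedField`, `from_types`, and `from_instances` are hypothetical names introduced for illustration. A tuple of types has the element type `DataType`, so the compiler cannot tell which types it holds, whereas a tuple of instances carries the locations in its own type.

```julia
# Local stand-ins for location types (defined here, not imported from Oceananigans).
struct Center end
struct Face end

# Hypothetical field-like struct parameterized by its three locations.
struct LocatedField{LX, LY, LZ}
    data::Vector{Float64}
end

# Constructor taking a tuple of *types*: the parameters are runtime values.
from_types(loc::Tuple, data) = LocatedField{loc[1], loc[2], loc[3]}(data)

# Constructor taking a tuple of *instances*: the parameters come from the argument types.
from_instances(::Tuple{LX, LY, LZ}, data) where {LX, LY, LZ} = LocatedField{LX, LY, LZ}(data)

type_loc = (Center, Face, Center)        # typeof: Tuple{DataType, DataType, DataType}
inst_loc = (Center(), Face(), Center())  # typeof: Tuple{Center, Face, Center}

# Ask the compiler what each constructor is inferred to return.
rt_types = first(Base.return_types(from_types, (typeof(type_loc), Vector{Float64})))
rt_insts = first(Base.return_types(from_instances, (typeof(inst_loc), Vector{Float64})))

types_inferred = isconcretetype(rt_types)      # not concrete: locations unknown at compile time
instances_inferred = isconcretetype(rt_insts)  # concrete: LocatedField{Center, Face, Center}
```

Both constructors build the same object at runtime; the difference is only in what the compiler can prove ahead of time, which affects every function that later consumes the result.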
|
I think all tests are fixed now; this should be ready for review. This PR changes the field constructor from Note that the |
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
|
Some of the tests in the Not sure if this is related to #4670? Look at this:
using Oceananigans
underlying_grid = RectilinearGrid(topology=(Bounded, Periodic, Bounded), size = (2, 2, 2), x = (0, 0.3), y = (0, 0.4), z=(-0.5, 0.5))
grid = ImmersedBoundaryGrid(underlying_grid, GridFittedBottom((x, y) -> 0))
boundary_conditions = (; :c => FieldBoundaryConditions(immersed = BoundaryCondition(Flux(), π)))
model = NonhydrostaticModel(; grid, boundary_conditions, tracers=:c)
model.tracers.c.boundary_conditions
on
|

Apparently,
mapreduces performance; this is the cost of some simple reductions on main and on this branch:
On main
On this PR
There are still differences between the reduction of interior and the reduction of fields that can be ascribed completely to the Field constructor. I'll try to solve that issue as well.