eidos function SIMD optimizations #578
Draft
SIMD Vectorization for Eidos Math Functions
Summary
- eidos/eidos_simd.h with SSE4.2/AVX2 implementations for math operations
- -DUSE_SIMD=ON/OFF CMake option (default: ON) with automatic CPU detection
- Vectorized functions: sqrt, abs, floor, ceil, trunc, round, sum, product
- AVX2 processes 4 doubles/instruction; SSE4.2 processes 2 doubles/instruction.
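To make the lane widths concrete, here is a minimal sketch of one element-wise operation (sqrt) and one reduction (sum). It is not the code in eidos_simd.h: the function names `sqrt_float64_sketch` and `sum_float64_sketch` are hypothetical, the instruction set is chosen with compiler-defined macros rather than the PR's CMake detection, and the 128-bit path is gated on SSE2 (which already provides the double-precision intrinsics used here) rather than SSE4.2.

```cpp
#include <cmath>
#include <cstddef>
#if defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

// Hypothetical helper (not from eidos_simd.h): element-wise square root.
inline void sqrt_float64_sketch(const double *in, double *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX2__)
    // 256-bit registers: 4 doubles per instruction.
    for (; i + 4 <= n; i += 4)
        _mm256_storeu_pd(out + i, _mm256_sqrt_pd(_mm256_loadu_pd(in + i)));
#elif defined(__SSE2__)
    // 128-bit registers: 2 doubles per instruction.
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_sqrt_pd(_mm_loadu_pd(in + i)));
#endif
    // Scalar loop handles the tail, or everything when no SIMD path is compiled in.
    for (; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}

// Hypothetical helper (not from eidos_simd.h): sum of all elements.
inline double sum_float64_sketch(const double *in, std::size_t n)
{
    std::size_t i = 0;
    double total = 0.0;
#if defined(__AVX2__)
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_loadu_pd(in + i));
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);   // spill the 4 partial sums
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#elif defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(in + i));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);      // spill the 2 partial sums
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; ++i)
        total += in[i];
    return total;
}
```

One design point worth noting: the vectorized sum adds elements in a different order than a plain left-to-right loop, so for general floating-point input the result can differ from the scalar result in the last bits.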
Benchmark Suite
- run_benchmarks.sh [num_runs]: Builds SLiM with and without SIMD, runs benchmarks, and reports speedup.
- simd_benchmark.eidos: Tests math functions on 1M-element arrays (100 iterations each).
- slim_benchmark.slim: Full simulation benchmark: N=5000, 1Mb chromosome, 5000 generations with selection.
Benchmark Results
$ simd_benchmarks/run_benchmarks.sh
============================================
SIMD Benchmark Runner
============================================
SLiM root: /home/adkern/SLiM
Runs per benchmark: 3
Building with SIMD enabled... Done.
Building with SIMD disabled... Done.
============================================
Eidos Math Function Benchmarks
============================================
SIMD Build:
Running Eidos benchmark (SIMD)...
sqrt(): 0.105 sec
abs(): 0.171 sec
floor(): 0.164 sec
ceil(): 0.166 sec
round(): 0.164 sec
trunc(): 0.165 sec
sum(): 0.032 sec
product(): 0.003 sec (1000 elements, 10000 iters)
Scalar Build:
Running Eidos benchmark (Scalar)...
sqrt(): 0.108 sec
abs(): 0.166 sec
floor(): 0.231 sec
ceil(): 0.246 sec
round(): 0.473 sec
trunc(): 0.246 sec
sum(): 0.166 sec
product(): 0.017 sec (1000 elements, 10000 iters)
============================================
SLiM Simulation Benchmark (N=5000, 5000 generations, selection)
============================================
Running 3 iterations each...
SIMD Build: 12.756s (avg)
Scalar Build: 12.316s (avg)
Speedup: .96x
============================================
Benchmark complete
============================================
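For reading the numbers above, a small sketch that derives per-function speedups from the reported timings. It assumes speedup is defined as scalar time divided by SIMD time, which is consistent with the reported .96x for the simulation benchmark. The `speedup` and `print_speedups` helpers are hypothetical and are not part of run_benchmarks.sh.

```cpp
#include <cstdio>

// Assumed definition: scalar time / SIMD time.
// Values above 1 mean the SIMD build was faster.
inline double speedup(double scalar_sec, double simd_sec)
{
    return scalar_sec / simd_sec;
}

struct Timing {
    const char *name;
    double simd_sec;    // from the "SIMD Build" section of the output
    double scalar_sec;  // from the "Scalar Build" section of the output
};

// Timings copied from the benchmark output in this PR description.
static const Timing kTimings[] = {
    {"sqrt()",     0.105,  0.108},
    {"abs()",      0.171,  0.166},
    {"floor()",    0.164,  0.231},
    {"ceil()",     0.166,  0.246},
    {"round()",    0.164,  0.473},
    {"trunc()",    0.165,  0.246},
    {"sum()",      0.032,  0.166},
    {"product()",  0.003,  0.017},
    {"simulation", 12.756, 12.316},
};

// Print one line per benchmark with its derived speedup.
inline void print_speedups()
{
    for (const Timing &t : kTimings)
        std::printf("%-12s %.2fx\n", t.name, speedup(t.scalar_sec, t.simd_sec));
}
```

Read this way, the rounding functions and the reductions show the largest gains in these timings, sqrt() and abs() are essentially unchanged, and the full simulation is slightly slower with SIMD enabled.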