eidos function SIMD optimizations #578

andrewkern · 2025-11-28T03:09:19Z

SIMD Vectorization for Eidos Math Functions

Summary

- Add eidos/eidos_simd.h with SSE4.2/AVX2 implementations for math operations
- Add -DUSE_SIMD=ON/OFF CMake option (default: ON) with automatic CPU detection
- Update math functions to use SIMD as the non-OpenMP fallback path
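A minimal sketch of the dispatch order in the last bullet. It assumes that "non-OpenMP fallback" means the code compiled when OpenMP is not enabled. The function name `floor_array` is hypothetical and is not taken from eidos_simd.h. The SIMD branch keys off the compiler's `__SSE4_1__` macro, which `-msse4.2` builds also define, rather than reproducing the PR's CMake logic.

```cpp
#include <cmath>
#include <cstddef>
#if !defined(_OPENMP) && defined(__SSE4_1__)
#include <smmintrin.h>   // SSE4.1: _mm_floor_pd
#endif

// Hypothetical helper: out[i] = floor(in[i]) for i in [0, n).
void floor_array(const double *in, double *out, std::size_t n)
{
#if defined(_OPENMP)
    // OpenMP build: keep the existing multithreaded loop.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
        out[i] = std::floor(in[i]);
#else
    std::size_t i = 0;
#if defined(__SSE4_1__)
    // Non-OpenMP build with SSE4.1 available: two doubles per instruction.
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_floor_pd(_mm_loadu_pd(in + i)));
#endif
    // Scalar tail, or the whole array when no SIMD target is enabled.
    for (; i < n; ++i)
        out[i] = std::floor(in[i]);
#endif
}
```

The scalar tail loop handles the `n % 2` leftover elements, so callers do not need to pad their arrays to a multiple of the vector width.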

Vectorized functions: sqrt, abs, floor, ceil, trunc, round, sum, product
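Several functions in this list map onto a single instruction per vector. The following illustrative sketch uses hypothetical names and handles only the SSE width. `abs()` can clear the IEEE-754 sign bit with a bitwise and-not. `trunc()`, like `floor()` and `ceil()`, can use SSE4.1's `_mm_round_pd` with an explicit rounding mode. `round()` is deliberately left out because the SSE4.1 nearest-integer mode rounds halfway cases to even, whereas `std::round` rounds them away from zero. Matching the scalar results with a vectorized `round()` therefore needs extra care.

```cpp
#include <cmath>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>   // SSE2: loads, stores, bitwise ops
#endif
#if defined(__SSE4_1__)
#include <smmintrin.h>   // SSE4.1: _mm_round_pd
#endif

// Hypothetical abs(): clear the IEEE-754 sign bit, two doubles at a time.
void abs_array(const double *in, double *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128d sign_bit = _mm_set1_pd(-0.0);   // only the sign bit is set
    // andnot computes (~sign_bit) & v, i.e. v with its sign cleared
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_andnot_pd(sign_bit, _mm_loadu_pd(in + i)));
#endif
    for (; i < n; ++i)
        out[i] = std::fabs(in[i]);
}

// Hypothetical trunc(): round toward zero via the SSE4.1 rounding immediate.
void trunc_array(const double *in, double *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE4_1__)
    // _MM_FROUND_TO_ZERO selects truncation; ceil() would use
    // _MM_FROUND_TO_POS_INF and floor() _MM_FROUND_TO_NEG_INF.
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_round_pd(_mm_loadu_pd(in + i),
                                            _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC));
#endif
    for (; i < n; ++i)
        out[i] = std::trunc(in[i]);
}
```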

AVX2 (256-bit registers) processes 4 doubles per instruction; SSE4.2 (128-bit registers) processes 2 doubles per instruction.
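The sketch below illustrates these lane widths with a vectorized `sqrt`. It processes 4 doubles per iteration when the compiler targets AVX2, 2 when it targets SSE2, and 1 otherwise. `sqrt_array` and `kLanes` are hypothetical names. The 256-bit double-precision intrinsics such as `_mm256_sqrt_pd` are technically AVX instructions, which every AVX2 CPU also supports. `_mm_sqrt_pd` needs only SSE2, not SSE4.2.

```cpp
#include <cmath>
#include <cstddef>
#if defined(__AVX2__) || defined(__SSE2__)
#include <immintrin.h>
#endif

#if defined(__AVX2__)
constexpr std::size_t kLanes = 4;   // 256-bit register holds 4 doubles
#elif defined(__SSE2__)
constexpr std::size_t kLanes = 2;   // 128-bit register holds 2 doubles
#else
constexpr std::size_t kLanes = 1;   // scalar fallback
#endif

// Hypothetical vectorized sqrt over a contiguous array of doubles.
void sqrt_array(const double *in, double *out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX2__)
    for (; i + 4 <= n; i += 4)
        _mm256_storeu_pd(out + i, _mm256_sqrt_pd(_mm256_loadu_pd(in + i)));
#elif defined(__SSE2__)
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_sqrt_pd(_mm_loadu_pd(in + i)));
#endif
    // Remaining n % kLanes elements (or all of them without SIMD).
    for (; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}
```

IEEE 754 requires square root to be correctly rounded, so the vector and scalar paths should produce identical results.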

Benchmark Suite

`run_benchmarks.sh [num_runs]`

Builds SLiM with and without SIMD, runs the benchmarks, and reports the speedup.

```sh
./run_benchmarks.sh      # 3 runs (default)
./run_benchmarks.sh 10   # 10 runs
```

`simd_benchmark.eidos`

Times each math function on 1M-element arrays (100 iterations each); product() instead uses 1000-element arrays with 10000 iterations.
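The Eidos script is not reproduced here. The sketch below is a rough C++ analogue of its methodology, which is an assumption about the approach and not the script's code. It times repeated `floor()` passes over an n-element array with `std::chrono::steady_clock`. `time_floor` is a hypothetical name.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Times `iters` passes of floor() over an n-element array and returns the
// elapsed wall-clock seconds; `checksum` receives the sum of the final output.
double time_floor(std::size_t n, int iters, double &checksum)
{
    std::vector<double> in(n), out(n);
    for (std::size_t i = 0; i < n; ++i)
        in[i] = static_cast<double>(i) * 0.37 - static_cast<double>(n) * 0.1;  // mix of signs

    auto start = std::chrono::steady_clock::now();
    for (int it = 0; it < iters; ++it)
        for (std::size_t i = 0; i < n; ++i)
            out[i] = std::floor(in[i]);
    auto stop = std::chrono::steady_clock::now();

    // Consuming the output keeps the result observable, so the loop
    // is not discarded as dead code.
    checksum = 0.0;
    for (double v : out)
        checksum += v;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_floor(1000000, 100, checksum)` mirrors the 1M-element, 100-iteration setup. Absolute times depend on the machine and on compiler flags.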

`slim_benchmark.slim`

Full SLiM simulation benchmark: population size N=5000, 1 Mb chromosome, 5000 generations with selection.

Benchmark Results

```
$ simd_benchmarks/run_benchmarks.sh 
============================================
SIMD Benchmark Runner
============================================
SLiM root: /home/adkern/SLiM
Runs per benchmark: 3

Building with SIMD enabled...
  Done.
Building with SIMD disabled...
  Done.

============================================
Eidos Math Function Benchmarks
============================================

SIMD Build:
  Running Eidos benchmark (SIMD)...
    sqrt():    0.105 sec
    abs():     0.171 sec
    floor():   0.164 sec
    ceil():    0.166 sec
    round():   0.164 sec
    trunc():   0.165 sec
    sum():     0.032 sec
    product(): 0.003 sec (1000 elements, 10000 iters)

Scalar Build:
  Running Eidos benchmark (Scalar)...
    sqrt():    0.108 sec
    abs():     0.166 sec
    floor():   0.231 sec
    ceil():    0.246 sec
    round():   0.473 sec
    trunc():   0.246 sec
    sum():     0.166 sec
    product(): 0.017 sec (1000 elements, 10000 iters)

============================================
SLiM Simulation Benchmark
(N=5000, 5000 generations, selection)
============================================

Running 3 iterations each...

SIMD Build:   12.756s (avg)
Scalar Build: 12.316s (avg)

Speedup: .96x

============================================
Benchmark complete
============================================
```
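The speedups implied by these timings are the scalar time divided by the SIMD time. The sketch below encodes the numbers reported above. The struct and function names are illustrative.

```cpp
#include <cstdio>
#include <string>
#include <vector>

struct Timing {
    std::string name;
    double simd_sec;
    double scalar_sec;
};

// Speedup = scalar time / SIMD time; values above 1.0 favour the SIMD build.
double speedup(const Timing &t) { return t.scalar_sec / t.simd_sec; }

// Timings copied from the benchmark output above.
const std::vector<Timing> kReported = {
    {"sqrt", 0.105, 0.108},  {"abs", 0.171, 0.166},   {"floor", 0.164, 0.231},
    {"ceil", 0.166, 0.246},  {"round", 0.164, 0.473}, {"trunc", 0.165, 0.246},
    {"sum", 0.032, 0.166},   {"product", 0.003, 0.017},
    {"SLiM simulation", 12.756, 12.316},
};

void print_speedups()
{
    for (const Timing &t : kReported)
        std::printf("%-16s %.2fx\n", t.name.c_str(), speedup(t));
}
```

The ratios are roughly 1.41x (floor), 1.48x (ceil), 2.88x (round), 1.49x (trunc), 5.19x (sum), and 5.67x (product). sqrt comes out at about 1.03x and abs at about 0.97x. The full simulation comes out at about 0.97x, which agrees with the runner's reported .96x to within rounding. The sum() and product() timings carry only one or two significant digits, so those two ratios are coarse.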

Adds compile-time SIMD detection (AVX2/SSE4.2/FMA) and vectorized implementations of sqrt, abs, floor, ceil, round, trunc, sum, and product. In the Eidos benchmarks above, floor, ceil, trunc, round, sum, and product run 1.4-5.7x faster on float arrays. sqrt and abs are essentially unchanged (about 1.03x and 0.97x), and the full SLiM simulation benchmark is slightly slower (0.96x).

- CMakeLists.txt: add USE_SIMD option and compiler flag detection
- eidos/eidos_simd.h: new header with SIMD intrinsic implementations
- eidos/eidos_functions_math.cpp: use SIMD paths when available
- eidos/eidos_test_*.{h,cpp}: use tolerance for float comparisons

Includes Eidos math function benchmark, SLiM simulation benchmark, and a runner script that builds both SIMD and scalar versions and compares performance.
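The tolerance change to the test files is consistent with how vectorized reductions behave, although the PR does not state its reason. A SIMD sum() keeps one partial sum per lane and combines the partial sums at the end, which reassociates the additions. Floating-point addition is not associative, so the last bits of the result can differ from a left-to-right scalar loop, and FMA contraction can likewise change rounding. The following is a sketch of a 4-lane reduction plus a tolerant comparison. `simd_sum`, `approx_equal`, and the 1e-12 tolerance are assumptions, not the PR's code or values.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical 4-lane sum: independent partial sums, then a pairwise combine.
// The grouping of additions differs from a plain left-to-right loop.
double simd_sum(const double *x, std::size_t n)
{
    std::size_t i = 0;
    double total = 0.0;
#if defined(__AVX2__)
    __m256d acc = _mm256_setzero_pd();
    for (; i + 4 <= n; i += 4)
        acc = _mm256_add_pd(acc, _mm256_loadu_pd(x + i));
    double lanes[4];
    _mm256_storeu_pd(lanes, acc);
    total = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#else
    // Portable stand-in: four scalar accumulators follow the same
    // reassociation pattern as a 4-lane vector register.
    double acc[4] = {0.0, 0.0, 0.0, 0.0};
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            acc[k] += x[i + k];
    total = (acc[0] + acc[1]) + (acc[2] + acc[3]);
#endif
    for (; i < n; ++i)   // leftover elements
        total += x[i];
    return total;
}

// Mixed relative/absolute tolerance (assumed value), usable once the
// summation order is no longer fixed.
bool approx_equal(double a, double b, double rel_tol = 1e-12)
{
    return std::fabs(a - b) <= rel_tol * std::max({std::fabs(a), std::fabs(b), 1.0});
}
```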

andrewkern added 2 commits November 26, 2025 12:35

add SIMD benchmark scripts

4afd0ce


andrewkern marked this pull request as draft November 28, 2025 03:09
