@andrewkern

SIMD Vectorization for Eidos Math Functions

Summary

  • Add eidos/eidos_simd.h with SSE4.2/AVX2 implementations for math operations
  • Add -DUSE_SIMD=ON/OFF CMake option (default: ON) with automatic CPU detection
  • Update math functions to use SIMD as the non-OpenMP fallback path

Vectorized functions: sqrt, abs, floor, ceil, trunc, round, sum, product

AVX2 processes 4 doubles/instruction; SSE4.2 processes 2 doubles/instruction.
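As a rough illustration of the lane widths above, here is a minimal sketch of a compile-time-dispatched array sqrt. It is not the code in eidos/eidos_simd.h: the function name `sqrt_array` is hypothetical, and the dispatch relies only on the compiler-defined macros `__AVX2__` and `__SSE4_2__` rather than on whatever macros the PR's CMake option sets.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

#if defined(__AVX2__) || defined(__SSE4_2__)
#include <immintrin.h>
#endif

// Hypothetical helper (not the eidos_simd.h API): out[i] = sqrt(in[i]).
inline void sqrt_array(const double *in, double *out, std::size_t n)
{
    assert(n == 0 || (in != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__AVX2__)
    // 256-bit registers: 4 doubles per instruction.
    for (; i + 4 <= n; i += 4)
        _mm256_storeu_pd(out + i, _mm256_sqrt_pd(_mm256_loadu_pd(in + i)));
#elif defined(__SSE4_2__)
    // 128-bit registers: 2 doubles per instruction.
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(out + i, _mm_sqrt_pd(_mm_loadu_pd(in + i)));
#endif
    // Scalar loop: handles the tail, or the whole array when no SIMD
    // macro is defined at compile time.
    for (; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}
```

With GCC or Clang, the AVX2 branch is only compiled when the build passes a flag such as `-mavx2`; a default x86-64 build of this sketch falls through to the scalar loop.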

Benchmark Suite

run_benchmarks.sh [num_runs]

Builds SLiM with and without SIMD, runs benchmarks, reports speedup.

./run_benchmarks.sh      # 3 runs (default)
./run_benchmarks.sh 10   # 10 runs

simd_benchmark.eidos

Tests math functions on 1M element arrays (100 iterations each), except product(), which runs on 1000 elements for 10000 iterations.

slim_benchmark.slim

Full simulation benchmark: N=5000, 1Mb chromosome, 5000 generations with selection.

Benchmark Results

$ simd_benchmarks/run_benchmarks.sh 
============================================
SIMD Benchmark Runner
============================================
SLiM root: /home/adkern/SLiM
Runs per benchmark: 3

Building with SIMD enabled...
  Done.
Building with SIMD disabled...
  Done.

============================================
Eidos Math Function Benchmarks
============================================

SIMD Build:
  Running Eidos benchmark (SIMD)...
    sqrt():    0.105 sec
    abs():     0.171 sec
    floor():   0.164 sec
    ceil():    0.166 sec
    round():   0.164 sec
    trunc():   0.165 sec
    sum():     0.032 sec
    product(): 0.003 sec (1000 elements, 10000 iters)

Scalar Build:
  Running Eidos benchmark (Scalar)...
    sqrt():    0.108 sec
    abs():     0.166 sec
    floor():   0.231 sec
    ceil():    0.246 sec
    round():   0.473 sec
    trunc():   0.246 sec
    sum():     0.166 sec
    product(): 0.017 sec (1000 elements, 10000 iters)

============================================
SLiM Simulation Benchmark
(N=5000, 5000 generations, selection)
============================================

Running 3 iterations each...

SIMD Build:   12.756s (avg)
Scalar Build: 12.316s (avg)

Speedup: .96x

============================================
Benchmark complete
============================================
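The per-function speedups implied by this run can be worked out as scalar time divided by SIMD time. The sketch below does only that arithmetic; the struct and function names are illustrative, and the timings are copied from the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// One row of the benchmark output: seconds for each build.
struct Timing {
    const char *name;
    double simd_sec;
    double scalar_sec;
};

// Timings copied from the run above.
static const Timing kTimings[] = {
    {"sqrt",       0.105,  0.108},
    {"abs",        0.171,  0.166},
    {"floor",      0.164,  0.231},
    {"ceil",       0.166,  0.246},
    {"round",      0.164,  0.473},
    {"trunc",      0.165,  0.246},
    {"sum",        0.032,  0.166},
    {"product",    0.003,  0.017},
    {"simulation", 12.756, 12.316},
};

// Speedup > 1 means the SIMD build was faster.
inline double speedup(const Timing &t)
{
    assert(t.simd_sec > 0.0);
    return t.scalar_sec / t.simd_sec;
}

inline void print_speedups()
{
    for (std::size_t i = 0; i < sizeof(kTimings) / sizeof(kTimings[0]); ++i)
        std::printf("%-10s %.2fx\n", kTimings[i].name, speedup(kTimings[i]));
}
```

On these numbers, floor, ceil, and trunc come out near 1.4-1.5x, round near 2.9x, and sum and product above 5x, while sqrt and abs stay within about 3% of 1.0x. The full simulation ratio is about 0.97; the `.96x` printed by the script is consistent with truncation to two decimals.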

Adds compile-time SIMD detection (AVX2/SSE4.2/FMA) and vectorized
implementations for sqrt, abs, floor, ceil, round, trunc, sum, and
product. Benchmarks on large float arrays show 1.4-5.7x speedups for
floor, ceil, round, trunc, sum, and product; sqrt and abs are
essentially unchanged.

- CMakeLists.txt: add USE_SIMD option and compiler flag detection
- eidos/eidos_simd.h: new header with SIMD intrinsic implementations
- eidos/eidos_functions_math.cpp: use SIMD paths when available
- eidos/eidos_test_*.{h,cpp}: use tolerance for float comparisons

The benchmark suite includes an Eidos math function benchmark, a SLiM
simulation benchmark, and a runner script that builds both SIMD and
scalar versions and compares performance.
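One plausible reason the tests need a tolerance for float comparisons is that a vectorized sum() adds elements in a different order than a sequential loop, and floating-point addition is not associative, so the two results may differ in the last few bits. The sketch below emulates a 4-lane accumulation order in plain C++ (no intrinsics) and compares results with a relative tolerance. The names `sum_sequential`, `sum_four_lanes`, and `approx_equal` are hypothetical and are not taken from the Eidos test code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain left-to-right summation, as a scalar loop would do it.
inline double sum_sequential(const double *v, std::size_t n)
{
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

// Emulates the order of a 4-lane SIMD sum: four independent partial
// sums, combined at the end, followed by a scalar tail.
inline double sum_four_lanes(const double *v, std::size_t n)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; ++k)
            lane[k] += v[i + k];
    double total = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; ++i)
        total += v[i];
    return total;
}

// Relative-tolerance comparison (hypothetical helper).
inline bool approx_equal(double a, double b, double rel_tol)
{
    assert(rel_tol >= 0.0);
    return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

A test written this way would check `approx_equal(sum_four_lanes(...), expected, 1e-12)` rather than `==`, which passes whichever summation order the build uses.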
@andrewkern marked this pull request as draft on November 28, 2025, 03:09