Skip to content

pbathuri/finance-cache-hpc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

finance-cache-hpc

Empirical L1 cache behaviour of four quantitative finance kernels on AMD EPYC

Cholesky · Monte Carlo paths · GARCH(1,1) MLE · dense GEMM - instrumented with PAPI hardware counters on Indiana University's Big Red 200.



TL;DR

Most papers that talk about "cache-friendly finance code" argue from the algorithm. This repo argues from the counter. Four production-representative kernels, a parametric build sweep across layouts and algorithms, PAPI counters pinned to a fixed core, and three findings that run counter to textbook intuition.

# Finding So what
1 Cholesky: layout dominates algorithm. Row-major vs column-major → 28× variation in L1 misses. Banachiewicz vs Crout → <3%. If you're profiling Cholesky and only swapping algorithms, you're optimising the wrong axis.
2 Monte Carlo: sharp L1 phase transition. A 1,657× jump in L1 misses between portfolio dim d=50 and d=100 - coincides with the triangular factor crossing the 32 KB L1d boundary. Portfolio sizing decisions hide a hardware cliff. Pricing d=100 isn't 2× harder than d=50; it's two orders of magnitude harder.
3 GARCH: compute-bound despite cache misses. A 500× L1 miss rate increase costs only 3% throughput. The GARCH recurrence is serialised by loop-carried dependency. The cache is innocent - the dependency chain is the bottleneck.

Why this exists

Quant workflows get rewritten for speed constantly, but most "optimisation" is guesswork against an abstracted cost model. The hardware tells a different story. This repo is a small, reproducible argument for measuring before tuning - and for treating the L1 data cache as a first-class citizen in numerical finance.


Methodology

flowchart LR
    A[Kernel source<br/>C, PAPI-instrumented] --> B[Parametric build<br/>layout × algo × N]
    B --> C[Fixed-core execution<br/>Slurm · Big Red 200]
    C --> D[PAPI native event<br/>perf::L1-DCACHE-LOAD-MISSES]
    D --> E[results_*.csv<br/>260 + 36 + 24 + 21 configs]
    E --> F[plot_comparison.py<br/>publication figures]
Loading

Each kernel is compiled into multiple binaries (one per configuration of storage layout, algorithm variant, and problem size). Every run is pinned to a single EPYC core, counters are read at kernel boundaries, and results land in CSV for analysis.


Quick start

# On a system with PAPI installed
module load papi          # if using environment modules
cd src
make finance              # builds cholesky, mc_paths, garch, gemm variants

# Run a single benchmark
./bin/cholesky_ROW_MAJOR_ALGO_BANACHIEWICZ 1000
./bin/mc_paths_ROW_MAJOR 100 100000
./bin/garch_mle 10000 1000

# Full sweep (Slurm)
cd ../scripts
sbatch run_finance_kernels.sh

# Generate figures
python3 plot_comparison.py

Prerequisites - Linux · GCC 7.5+ · PAPI 7.2+ · Python 3 with matplotlib, pandas · Slurm (optional, for the full sweep).


Layout

.
├── src/
│   ├── Makefile              # Parametric build (layouts × algorithms)
│   ├── cholesky_papi.c       # Cholesky factorisation
│   ├── mc_paths_papi.c       # Correlated MC path generation
│   ├── garch_mle_papi.c      # GARCH(1,1) MLE via grid search
│   └── mm_papi.c             # Dense GEMM (validation benchmark)
├── scripts/
│   ├── run_finance_kernels.sh  # Slurm batch driver
│   └── plot_comparison.py      # Publication figure generator
└── data/
    ├── results.csv             # GEMM            (260 configs)
    ├── results_cholesky.csv    # Cholesky        (36)
    ├── results_mcpaths.csv     # MC paths        (24)
    └── results_garch.csv       # GARCH           (21)

Platform

CPU AMD EPYC 7742, 2.25 GHz (Zen 2)
L1d 32 KB per core · 8-way · 64 B lines
L2 512 KB per core
L3 256 MB shared
PAPI 7.2.0.1 · native event perf::L1-DCACHE-LOAD-MISSES
System Indiana University Big Red 200

Roadmap

  • Extend to L2 / L3 miss counters and bandwidth-bound regimes
  • Add roofline positioning per kernel configuration
  • Repeat the sweep on Intel Sapphire Rapids and compare microarchitectures
  • Publish the companion note / short paper

Cite

@misc{bathuri2026cache,
  author       = {Pradyot Bathuri},
  title        = {Cache-Aware Computation for Quantitative Finance Workloads on {AMD} {EPYC}},
  year         = {2026},
  institution  = {Indiana University Bloomington},
  howpublished = {\url{https://github.com/pbathuri/finance-cache-hpc}}
}

Part of a broader research line on HPC + quantitative finance.
See also Research_HPC_QFinance_Cache · @pbathuri

About

Empirical L1 cache characterisation of four quantitative finance kernels (Cholesky, Monte Carlo, GARCH, GEMM) on AMD EPYC using PAPI hardware counters

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors