discochess/cloud-stockfish

Cloud-Optimized Stockfish Docker Images

Build Docker images for Stockfish chess engine optimized for modern cloud infrastructure. By targeting specific CPU instruction sets available on cloud VMs, you ensure Stockfish runs at peak efficiency for your hardware.

These build patterns power the analysis engine at Disco Chess, where Stockfish is used to:

  • Analyze users' games imported from Lichess and Chess.com to detect missed tactical opportunities
  • Provide real-time board analysis during game review

What this repo is: A Dockerfile and build script that compiles Stockfish with architecture-specific optimizations. That's it. You bring your own integration (HTTP wrapper, job queue, etc.).
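
As a minimal sketch of "bringing your own integration", a one-shot UCI session can be piped straight into the container. The image name matches the Quick Start below; the FEN and time budget are illustrative only:

```shell
# Minimal integration sketch: pipe UCI commands into the container and
# keep only the engine's final answer. Assumes the image built by ./build.sh.
fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
docker run --rm -i stockfish-optimized stockfish <<EOF | grep '^bestmove'
position fen $fen
go movetime 1000
quit
EOF
```

A real service would keep the process alive and feed it positions over a queue or HTTP, but the protocol exchange is the same.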

Why Cloud-Optimized Builds?

Generic Stockfish binaries target the lowest common denominator of CPUs to maximize compatibility. Cloud providers (AWS, GCP, Azure) run modern CPUs with advanced instruction sets that Stockfish can leverage.

What These Instructions Do

BMI2 (Bit Manipulation Instructions 2): Chess engines represent the board as 64-bit numbers called "bitboards": one bit per square. BMI2 includes special instructions (PEXT/PDEP) that extract and rearrange scattered bits in a single instruction, speeding up sliding-piece move generation and attack detection. Without BMI2, the engine must use a sequence of shifts, masks, and multiplies (or lookup tables) to achieve the same result.
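
To make the PEXT idea concrete, here is a software emulation in plain shell arithmetic. The loop is roughly what a non-BMI2 build has to do; BMI2 hardware replaces the whole thing with one instruction. The function name and example bit patterns are ours, not Stockfish's:

```shell
# Software PEXT: gather the bits of $1 selected by mask $2 into the
# low bits of the result, preserving order.
pext() {
  local src=$1 mask=$2 out=0 bit=0
  while [ "$mask" -ne 0 ]; do
    if [ $(( mask & 1 )) -ne 0 ]; then
      out=$(( out | (src & 1) << bit ))
      bit=$(( bit + 1 ))
    fi
    src=$(( src >> 1 )); mask=$(( mask >> 1 ))
  done
  echo "$out"
}

# Extract the occupancy bits a rook on a1 cares about along the first rank
# (the edge squares are masked off, as in magic-bitboard schemes):
pext $(( 2#01101010 )) $(( 2#01111110 ))   # → 53 (binary 110101)
```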

POPCNT (Population Count): Counts how many bits are set to 1 in a number. In chess terms: "how many pieces are on this diagonal?" or "how many squares can this piece attack?" A single instruction replaces what would otherwise be a loop.
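
For illustration, this is the loop that POPCNT collapses into a single instruction; the function and example values are ours:

```shell
# Count set bits the slow way: this whole loop is one POPCNT instruction.
popcount() {
  local n=$1 count=0
  while [ "$n" -ne 0 ]; do
    count=$(( count + (n & 1) ))
    n=$(( n >> 1 ))
  done
  echo "$count"
}

# The a-file bitboard (one bit per rank) has 8 squares set:
popcount $(( 0x0101010101010101 ))   # → 8
```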

AVX2/NEON (Vector Instructions): Process multiple numbers simultaneously. Used heavily by Stockfish's NNUE neural network evaluation.

Measured Performance

The actual performance gain from BMI2 over a generic x86-64 build is typically 5-10%, varying by specific CPU and workload. This is a modest but consistent improvement.

The larger gains come from:

  1. Using newer cloud VM families (e.g., GCP c3d vs c3 can show ~25% difference)
  2. Multi-threading efficiency
  3. Proper hash table sizing

Note: Run stockfish with the bench command on your target hardware to get accurate numbers for your specific configuration.

Quick Start

Build Locally

# Build for your architecture
./build.sh

# Run
docker run --rm stockfish-optimized stockfish

Supported Architectures

| Architecture | CPU Feature | Minimum CPU |
|---|---|---|
| linux/amd64 | x86-64-bmi2 | Haswell (2013+) |
| linux/arm64 | armv8 | Apple M1, Graviton2+ |

CPU Feature Detection

Check if your system supports BMI2:

# Linux
grep -q bmi2 /proc/cpuinfo && echo "BMI2 supported" || echo "BMI2 not supported"

# macOS (Intel only; BMI2 is reported under the leaf 7 feature list.
# Apple Silicon has no BMI2 and uses NEON instead.)
sysctl -n machdep.cpu.leaf7_features | grep -i bmi2

Cloud Provider CPU Generations

| Provider | Instance Type | BMI2 Support |
|---|---|---|
| AWS | c5, m5, r5+ | Yes |
| AWS | t3, t3a | Yes |
| AWS | Graviton2/3 (arm64) | N/A (armv8) |
| GCP | n2, c2, e2 | Yes |
| Azure | Dv3, Ev3+ | Yes |

Build Customization

Custom Stockfish Version

# Build specific version
docker build --build-arg SF_VERSION=sf_17 -t stockfish:17 .

Architecture-Specific

# AMD64 only (most servers)
docker buildx build --platform linux/amd64 -t stockfish:amd64 .

# ARM64 only (Graviton, Apple Silicon)
docker buildx build --platform linux/arm64 -t stockfish:arm64 .

Performance Tuning

Thread Configuration

Stockfish scales well with cores. Configure threads based on your container resources:

# 4 threads, 256MB hash
docker run --rm stockfish-optimized stockfish <<< "
setoption name Threads value 4
setoption name Hash value 256
position startpos
go depth 25
quit"

Memory (Hash Table)

Rule of thumb: scale the hash table with thread count and search time. The values below are reasonable starting points; long analyses benefit from more.

| Threads | Recommended Hash |
|---|---|
| 1-2 | 128MB |
| 4 | 256MB |
| 8 | 512MB-1GB |
| 16+ | 2GB+ |
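
The table above can be approximated with a small helper. The ~64MB-per-thread ratio and the 128MB floor are our rough reading of the table, not an official Stockfish recommendation:

```shell
# Rough heuristic: about 64MB of hash per thread, never below 128MB.
hash_for_threads() {
  local mb=$(( $1 * 64 ))
  if [ "$mb" -lt 128 ]; then mb=128; fi
  echo "$mb"
}

THREADS=$(nproc)
HASH=$(hash_for_threads "$THREADS")
echo "Threads=$THREADS Hash=${HASH}MB"
```

The resulting values can be passed to the engine via setoption, as in the thread configuration example above.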

Benchmarking

Run the Built-in Benchmark

docker run --rm stockfish-optimized stockfish <<< "bench"

This runs a standardized test and reports nodes/second.

Compare Builds

To measure the actual gain from optimized builds:

  1. Build a generic x86-64 image (change ARCH=x86-64 in Dockerfile)
  2. Build the optimized x86-64-bmi2 image
  3. Run bench on both and compare nodes/sec
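
The three steps above can be scripted along these lines; the image tags are placeholders for however you tagged the two builds:

```shell
# Run the built-in bench on each image and pull out the nodes/sec summary.
# Note: Stockfish prints the bench summary on stderr, hence the 2>&1.
for img in stockfish:x86-64 stockfish:x86-64-bmi2; do
  nps=$(docker run --rm "$img" stockfish <<< "bench" 2>&1 \
        | awk '/Nodes\/second/ {print $NF}')
  echo "$img: $nps nodes/sec"
done
```

Run each image a few times and compare medians; a single bench run can vary a few percent on shared cloud hardware.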

Expected Gains

Based on community benchmarks:

  • BMI2 vs generic x86-64: ~5-10% improvement (varies by CPU)
  • Newer cloud VM families: Can be significant (see GCP benchmarks)
  • Thread scaling: Near-linear up to physical core count

CPU Compatibility

Intel

| Generation | Year | BMI2 | Recommended Build |
|---|---|---|---|
| Sandy Bridge | 2011 | No | x86-64 |
| Ivy Bridge | 2012 | No | x86-64 |
| Haswell | 2013 | Yes | x86-64-bmi2 |
| Skylake+ | 2015+ | Yes | x86-64-bmi2 |

AMD

| Generation | Year | BMI2 | Recommended Build |
|---|---|---|---|
| Piledriver | 2012 | No | x86-64 |
| Excavator | 2015 | Slow | x86-64-modern |
| Zen 1/2 | 2017+ | Slow PEXT | x86-64-modern |
| Zen 3+ | 2020+ | Yes | x86-64-bmi2 |

Note: AMD CPUs before Zen 3 (Excavator through Zen 2) implement PEXT/PDEP in microcode, so the bmi2 build can actually be slower there; use x86-64-modern instead.

ARM

| Processor | Recommended Build |
|---|---|
| Apple M1/M2/M3 | armv8 |
| AWS Graviton2/3 | armv8 |
