Stockpile

Fast lookups for hundreds of millions of chess position evaluations from the Lichess database.

eval, err := client.Lookup(ctx, "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -")
fmt.Println(eval.Score()) // +0.20

Why?

Running Stockfish analysis is expensive. Analyzing a single game at depth 20+ takes several seconds of CPU time. Disco Chess is a chess training platform built around the Woodpecker Method — solving the same puzzles repeatedly until pattern recognition becomes automatic. Its review queue resurfaces mistakes from both training cycles and users' actual games, which means analyzing thousands of imported games to find missed tactics. That's a lot of Stockfish:

Scaling headaches: Spin up worker pools, manage job queues, handle bursty workloads (sound familiar?)
Slow feedback: Users wait minutes for game analysis to complete
High costs: CPU-intensive workloads don't come cheap

Stockpile sidesteps most of this. Lichess has already analyzed hundreds of millions of positions with Stockfish at depth 30+. Why redo work they've already done? Look up what exists, run Stockfish only for the gaps.

How?

Shard by material — Positions are distributed across 32K shards based on piece counts. Positions with similar material land in the same shard.
Sort by FEN — Within each shard, positions are sorted lexicographically. Lookups use binary search.
Cache hot shards — An LRU cache keeps frequently accessed shards in memory. Game analysis hits the same few shards repeatedly.

The key insight: consecutive positions in a chess game almost always have the same material, because most moves are not captures. Material-based sharding keeps them together, maximizing cache hits.

Features

Fast lookups with LRU caching
Hundreds of millions of positions from Lichess Stockfish evaluations (depth 30+)
Pluggable storage: Local filesystem, GCS, S3
Material-based sharding for cache locality during game analysis
Zero external dependencies at runtime (all data self-contained)

Quick Start

Installation

go get github.com/discochess/stockpile

Build the Database

Download and process the Lichess evaluation database:

# Install CLI
go install github.com/discochess/stockpile/cmd/stockpile@latest

# Build from Lichess source (downloads ~17GB)
stockpile build --output ./data

# Or from a local file
stockpile build --source ./lichess_db_eval.jsonl.zst --output ./data

Build options:

| Flag | Default | Description |
|------|---------|-------------|
| `--output` | `./data` | Output directory for shards (local) |
| `--output-gcs` | (none) | GCS path for output (`gs://bucket/prefix`) |
| `--shards` | `32768` | Number of shards to create |
| `--strategy` | `material` | Sharding strategy: `material` or `fnv32` |
| `--workers` | `4` | Parallel workers for compression |
| `--max-memory` | `1024` | Max memory (MB) before spilling to disk |

Memory note: The build process can be memory-intensive. If the process is killed for running out of memory, lower --max-memory (e.g., --max-memory 512). For long builds on macOS, run the build under caffeinate to keep the machine from sleeping:

caffeinate -dims stockpile build --source ./lichess_db_eval.jsonl.zst --output ./data --workers 10

GCS output: For cloud deployments, build directly to GCS:

stockpile build --output-gcs gs://my-bucket/stockpile

This builds locally to a temporary directory, then uploads to GCS, which makes it suitable for a monthly cron job that picks up new positions from Lichess.

Use the Library

package main

import (
    "context"
    "errors"
    "fmt"
    "log"

    "github.com/discochess/stockpile"
)

func main() {
    // Create client with default settings (LRU cache, zstd decompression).
    opt, err := stockpile.WithDataDir("./data")
    if err != nil {
        log.Fatal(err)
    }
    client, err := stockpile.New(opt)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    // Look up a position.
    ctx := context.Background()
    eval, err := client.Lookup(ctx, "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -")
    if errors.Is(err, stockpile.ErrNotFound) {
        // Not in the database: this is where you would run Stockfish yourself.
        fmt.Println("Position not found")
        return
    }
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Score: %s\n", eval.Score())  // +0.20
    fmt.Printf("Depth: %d\n", eval.Depth)    // 36
    if pv := eval.BestPV(); pv != nil {
        fmt.Printf("PV: %s\n", pv.Line)      // e2e4 e7e5 g1f3
    }
}

For advanced configuration (custom cache size, cloud storage), see examples/.

CLI Usage

# Look up a position
stockpile lookup "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -"

# Show database stats
stockpile stats --data-dir ./data

# Verify database integrity
stockpile verify --data-dir ./data

Architecture

Design Principles

Inspired by SSTables: sorted immutable files with binary search.

Simplicity over cleverness — Standard formats (JSONL, zstd) over custom binary formats
Pluggable components — Interfaces for storage, cache, and sharding strategy
Zero runtime dependencies — All data self-contained, no external services required

Data Flow

Build Phase:
  Lichess DB (.zst) → Decompress → Shard by Material → Sort by FEN → Compress

Lookup Phase:
  FEN → Compute Shard ID → Check Cache → Decompress (if miss) → Binary Search

Shard File Format

Each shard is a zstd-compressed JSONL file with lines sorted by FEN:

shards/
├── 00000.zst
├── 00001.zst
├── ...
└── 32767.zst

Each line matches the Lichess format:

{"fen":"rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -","evals":[{"pvs":[{"cp":20,"line":"e7e5"}],"knodes":3000,"depth":36}]}

Material-Based Sharding

The default strategy encodes piece counts into a shard ID:

| Piece Type | Bits | Range |
|------------|------|-------|
| White/Black Queens | 3 each | 0-7 |
| White/Black Rooks | 3 each | 0-7 |
| White/Black Minors (B+N) | 3 each | 0-7 |
| Side to move | 1 | 0-1 |

Total: 19 bits (2^19 = 524,288 possible keys), reduced modulo 32,768 (2^15) to give the shard ID.

This clusters positions with similar material together. Games progress through predictable material phases, so consecutive positions tend to land in the same few shards.
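The shard ID computation can be sketched as follows. The table gives the fields and their widths but not their order, so this sketch assumes they are packed in table order with white queens in the highest bits and side to move in the lowest bit; materialKey and materialShard are hypothetical names. Pawns and kings are not counted, so a pawn capture such as exd5 leaves the key unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// materialKey packs piece counts from a FEN into a 19-bit key, in the
// assumed order wq, bq, wr, br, white minors, black minors (3 bits each,
// clamped to 7), followed by 1 bit for side to move (0 = white, 1 = black).
// Pawns and kings are not counted.
func materialKey(fen string) (uint32, error) {
	fields := strings.Fields(fen)
	if len(fields) < 2 {
		return 0, fmt.Errorf("malformed FEN %q", fen)
	}
	var n [256]uint32 // occurrence count per byte of the placement field
	for i := 0; i < len(fields[0]); i++ {
		n[fields[0][i]]++
	}
	counts := []uint32{n['Q'], n['q'], n['R'], n['r'], n['B'] + n['N'], n['b'] + n['n']}
	var key uint32
	for _, c := range counts {
		if c > 7 {
			c = 7 // promotions can push a count past 3 bits; clamp
		}
		key = key<<3 | c
	}
	key <<= 1
	if fields[1] == "b" {
		key |= 1
	}
	return key, nil
}

// materialShard reduces the 19-bit key to a shard ID.
func materialShard(fen string, numShards uint32) (uint32, error) {
	key, err := materialKey(fen)
	if err != nil {
		return 0, err
	}
	return key % numShards, nil
}

func main() {
	for _, fen := range []string{
		"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -",   // start
		"rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -", // after 1.e4
		"rnbqkbnr/ppp1pppp/8/3P4/8/8/PPPP1PPP/RNBQKBNR b KQkq -", // after 1.e4 d5 2.exd5
	} {
		id, err := materialShard(fen, 32768)
		if err != nil {
			panic(err)
		}
		fmt.Println(id) // 10568, then 10569, then 10569
	}
}
```

Under this assumed layout the side-to-move bit flips every ply, so a game alternates between two neighbouring shards while material is unchanged. Any 19-bit key reduced modulo 32,768 also maps 16 distinct keys onto each shard; with this particular layout the white-queen count and the top bit of the black-queen count are discarded.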

Binary Search

Lookup within a shard:

Decompress (zstd)
Binary search by FEN
Parse matching JSON line

During the search, each probed line's FEN is extracted with a plain string search for "fen":" rather than a full JSON parse; only the matching line is parsed.
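The in-shard search can be sketched with sort.Search over the decompressed lines, using the same "fen":" string search to read each probe's key. This assumes the shard has already been split into lines (a real implementation might binary-search byte offsets in the decompressed buffer instead), and fenOf and findLine are hypothetical names. Go compares strings byte-wise, so this only works if the builder sorted with the same byte ordering.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fenOf extracts the FEN from a JSONL line with a plain string search for
// `"fen":"`, avoiding a full JSON parse. FEN text never contains quotes.
func fenOf(line string) string {
	const key = `"fen":"`
	i := strings.Index(line, key)
	if i < 0 {
		return ""
	}
	rest := line[i+len(key):]
	if j := strings.IndexByte(rest, '"'); j >= 0 {
		return rest[:j]
	}
	return ""
}

// findLine binary-searches lines sorted by FEN and returns the matching line.
func findLine(lines []string, fen string) (string, bool) {
	// Smallest index whose FEN is >= the target.
	i := sort.Search(len(lines), func(i int) bool { return fenOf(lines[i]) >= fen })
	if i < len(lines) && fenOf(lines[i]) == fen {
		return lines[i], true
	}
	return "", false
}

func main() {
	lines := []string{ // already sorted by FEN
		`{"fen":"rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq -","evals":[]}`,
		`{"fen":"rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -","evals":[]}`,
	}
	_, ok := findLine(lines, "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq -")
	fmt.Println(ok) // true
	_, ok = findLine(lines, "8/8/8/8/8/8/8/K6k w - -")
	fmt.Println(ok) // false
}
```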

Thread Safety

The Client is safe for concurrent use. Stats use atomic operations. Cache and store implementations handle concurrent access.

Sentinel Errors

var (
    ErrNotFound = errors.New("stockpile: position not found")
    ErrClosed   = errors.New("stockpile: client closed")
)

Benchmarking

Compare sharding strategies with real game data:

# Install benchmark CLI
go install github.com/discochess/stockpile/cmd/stockpile-bench@latest

# Run simulation with PGN games
stockpile-bench run --games games.pgn --strategies material,fnv32

# Generate markdown report
stockpile-bench run --games games.pgn --format markdown --output report.md --verbose

Storage Backends

Local Filesystem (default)

store, _ := diskstore.New("./data", zstdcodec.New())

Google Cloud Storage

store, _ := gcsstore.New(ctx, "my-bucket", gcsstore.WithPrefix("stockpile/"))

AWS S3

store, _ := s3store.New(ctx, "my-bucket", s3store.WithPrefix("stockpile/"))

Performance

Performance depends on storage backend, cache size, and access patterns. Warm cache lookups (shard already in memory) are fast. Cold lookups require decompression. Cloud storage adds network latency.

Run benchmarks on your hardware with stockpile-bench to measure actual performance.

Data Source

Evaluations come from the Lichess evaluation database:

Hundreds of millions of unique positions (and growing)
Stockfish 16+ at depth 30+
Updated monthly

Project Structure

stockpile/
├── cmd/
│   ├── stockpile/              # Main CLI (build, lookup, stats, verify)
│   └── stockpile-bench/        # Benchmark CLI
├── internal/
│   ├── builder/                # Database build pipeline
│   ├── codec/                  # Compression codecs (zstd, gzip, noop)
│   ├── search/                 # Binary search on sorted JSONL
│   ├── shard/                  # Sharding strategies
│   │   ├── materialshard/      # Material-based (default)
│   │   └── fnvshard/           # FNV32 hash
│   ├── stats/                  # Metrics collection
│   └── store/                  # Storage backends
│       ├── diskstore/          # Local filesystem
│       ├── gcsstore/           # Google Cloud Storage
│       ├── s3store/            # AWS S3
│       └── cachedstore/        # LRU caching wrapper
├── benchmark/                  # Benchmarking infrastructure
├── examples/                   # Example applications
└── fx/                         # Uber fx modules for DI

Fx Modules

For applications using Uber Fx:

import (
    "go.uber.org/fx"

    "github.com/discochess/stockpile/fx/diskstockpilefx"
)

func main() {
    fx.New(
        diskstockpilefx.Module,
        // ... your modules
    ).Run()
}

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please read the Architecture section above first.
