
Pesigitg

Pesigitg /be·se·gitk/ (Mi'kmaq) — "a fork in a river"

What is it?

A high-performance QUIC-aware load balancer written in Rust, using eBPF and AF_XDP for kernel-bypass packet forwarding.

Ideation

I brainstormed this idea with Claude Projects (RFCs, project configs, snippets, etc.), grounding the answers in internal project knowledge to avoid the cold-start problem of returning to an idea or concept and having to start over.

This is an implementation of those concepts with the help of Claude Code.

Essentials

  • Small code size
  • Minimal dependencies
  • A single programming language
  • As few release artifacts as possible
  • Behaves like a traditional UNIX tool
  • Documentation for the project and its code history

Knowledge

QUIC & HTTP/3 related RFCs and drafts.

Core QUIC

HTTP/3

QUIC Extensions

MASQUE & Proxying

HTTP Datagrams

DNS over QUIC & HTTP/3

Web Enhancements

Drafts

Architecture

  • pesigitg-common — no_std-compatible library shared across crates. Contains compile-time constants (DEFAULT_PORT, DEFAULT_INTF, PID_DIR, MAX_CONFIG_SIZE), the current_pid helper, and the exit! macro. Standard-library-dependent code is gated behind the std feature flag.
  • pesigitg-daemon — The pesigitgd binary. Daemonizes via double-fork, manages a PID file, parses CLI arguments and an optional config file (key=value format), queries NIC hardware queue counts via ethtool ioctl, and integrates with systemd (sd_notify watchdog, READY=1, RELOADING=1). Logs to syslog when daemonized, or to stderr when running in the foreground or under systemd. Set PESIGITG_LOG_FORMAT=json to emit line-delimited JSON to stderr instead (suitable for Loki / Vector / Elastic agents).
  • pesigitg-ebpf — eBPF programs.

Development Prerequisites

Toolchain

Tool Install Notes
Rust (stable) rustup toolchain install stable Builds pesigitg-daemon and pesigitg-common
Rust (nightly) rustup toolchain install nightly Required for pesigitg-ebpf (-Z build-std=core)
rust-src component rustup component add rust-src --toolchain nightly Needed to cross-compile core for the BPF target
bpf-linker cargo +nightly install bpf-linker Links eBPF object files; uses rustc's bundled LLVM

System Packages (Debian/Ubuntu)

sudo apt install \
  build-essential \
  libelf-dev \
  libsystemd-dev \
  linux-headers-generic \
  pkg-config \
  rustup \
  zlib1g-dev
Package Why
build-essential C compiler, make, and libc headers (libc6-dev) needed by the libc and nix crates and the vendored libbpf build
libelf-dev ELF library headers required by the vendored libbpf build (libbpf-sys)
libsystemd-dev Required by the sd-notify crate for systemd integration
linux-headers-generic Kernel headers for netlink, ethtool ioctl, and XDP structures
pkg-config Locates system libraries (libelf, zlib, libsystemd) during cargo build
rustup Rust toolchain manager; provides rustup, cargo, and rustc
zlib1g-dev Compression library required by the vendored libbpf build (libbpf-sys)

Runtime Requirements

  • Linux kernel 5.8+ — AF_XDP socket support
  • AES-NI — the daemon checks for this CPU feature at startup and will refuse to run without it (Westmere / 2010+ x86_64 CPUs)
  • systemd (recommended) — pesigitgd uses Type=notify with watchdog; see contrib/etc/systemd/system/pesigitgd.service

Building

The workspace uses cargo-xtask to orchestrate multi-toolchain builds. No extra binaries to install — cargo xtask is a regular workspace member.

# build everything (eBPF program + daemon)
cargo xtask build --release

# build only the eBPF program
cargo xtask build-ebpf --release

cargo xtask build first compiles the eBPF program with the nightly toolchain (selected automatically via pesigitg-ebpf/rust-toolchain.toml), then builds the daemon with the stable toolchain, passing the eBPF object path through the PESIGITG_EBPF_OBJ environment variable.

Signals

Signal Effect
SIGHUP Reload daemon and route configuration, re-resolve server MACs, and reset health check backoff timers so all backends are re-probed on the next cycle.
SIGUSR1 Dump traffic statistics (packet counters, routing decisions) to the log.
SIGUSR2 Dump the full runtime config to the log: daemon args (interface, ports, queues, config paths), active config slots, per-server IP/MAC/health/drain status, fallback pool membership, and retry settings.
SIGINT / SIGTERM Graceful shutdown — stop all worker threads, then exit.
# reload config + force re-probe of unhealthy backends
kill -HUP $(pidof pesigitgd)

# inspect current traffic counters
kill -USR1 $(pidof pesigitgd)

# inspect runtime server state
kill -USR2 $(pidof pesigitgd)

Daemon Configuration

CLI flags and config-file keys are equivalent; CLI wins on conflict. The config file (-c/--config) uses key = value lines with # comments.

Flag Config key Default Description
-i, --interface <NAME> interface eth0 Data-plane interface to attach XDP to.
-p, --port <PORT> port UDP port to steer to user space. Repeat for multiple ports.
-q, --queues <NUM> queues 1 AF_XDP worker threads (one per NIC queue). Clamped to available CPU cores; workers are NUMA- and SMT-aware. See man 8 pesigitgd TUNING for the pairing rules and NIC sizing (ethtool -L).
-c, --config <PATH> Path to this daemon config file.
route_config /etc/pesigitg/lb.toml Route table (backends, CID encryption, optional QUIC Retry service). Relative paths resolve against the daemon config's directory.
-s, --status-socket <PATH> status_socket unset (disabled) Unix-domain socket for the JSON status API.
-f, --foreground false Don't daemonize; log to stderr. Implicit under systemd.

See contrib/etc/pesigitg/enp2s0f0.conf for an example.
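The key = value format and the CLI-wins merge rule can be sketched in a few lines. This is an illustrative Python sketch, not the daemon's actual parser; the key names mirror the table above but the merge helper and its signature are assumptions.

```python
# Hypothetical sketch of the daemon's config handling: the file
# supplies defaults, and CLI flags win on conflict.

def parse_config(text):
    """Parse key = value lines, ignoring blank lines and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # strip trailing comments
        if not line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf

def merge(cli, file_conf):
    """CLI wins on conflict; file config fills in everything else."""
    merged = dict(file_conf)
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged

file_conf = parse_config("""
# example daemon config
interface = enp2s0f0
port      = 443
queues    = 4
""")

cli = {"interface": "eth1", "queues": None}   # only -i given on the CLI
print(merge(cli, file_conf))
# interface comes from the CLI; port and queues come from the file
```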

Status API

When status_socket is set, the daemon exposes a read-only JSON API on a Unix-domain socket (mode 0660, root-owned). Access is gated by filesystem permissions — add trusted users to the socket's group if you want non-root reads.

Endpoint Response
GET / List of available endpoints.
GET /health Lock-free liveness probe: status (ok/degraded), uptime, and worker alive/expected counts. Safe to poll at high frequency.
GET /version Build identifiers (name, version, build_date, rustc_version, target). Static for the lifetime of the process — useful for detecting rolling restarts and binary drift across a fleet.
GET /stats Aggregated counters, per-retry breakdown, uptime.
GET /config Live daemon args and the full route table (encryption keys are never exposed — only the scheme name).

One request per connection. Under systemd the socket lives in /run/pesigitg/ (auto-created by RuntimeDirectory=); manual invocations create the parent directory on bind.

printf 'GET /health\n'  | sudo nc -U /run/pesigitg/status.sock
printf 'GET /version\n' | sudo nc -U /run/pesigitg/status.sock
printf 'GET /stats\n'   | sudo nc -U /run/pesigitg/status.sock
printf 'GET /config\n'  | sudo nc -U /run/pesigitg/status.sock

# Nagios/monit-friendly exit-code wrapper
printf 'GET /health\n' | sudo nc -U /run/pesigitg/status.sock \
  | jq -e '.status == "ok"' >/dev/null

# …or use the bundled helper:
sudo contrib/ok.sh        # prints "ok", exit 0 when healthy
sudo contrib/ok.sh -v     # same, but prints the full JSON
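The wire protocol described above (one request per connection, a newline-terminated verb and path, a JSON body in reply) can be sketched with a mock server. Only the framing follows the description; the endpoint payload below is invented for illustration.

```python
# Sketch of the status API framing against a mocked daemon.
import json, os, socket, tempfile, threading

SOCK = os.path.join(tempfile.mkdtemp(), "status.sock")

def serve_one(srv):
    """Answer exactly one request on one connection, then hang up."""
    conn, _ = srv.accept()
    conn.makefile().readline()              # e.g. "GET /health\n"
    body = {"status": "ok", "workers": {"alive": 4, "expected": 4}}
    conn.sendall(json.dumps(body).encode())
    conn.close()
    srv.close()

def query(path):
    """Fresh connection per request; newline-terminated verb + path."""
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c.connect(SOCK)
    c.sendall(f"GET {path}\n".encode())
    c.shutdown(socket.SHUT_WR)
    reply = json.loads(c.makefile().read())
    c.close()
    return reply

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK)
srv.listen(1)
t = threading.Thread(target=serve_one, args=(srv,))
t.start()

health = query("/health")
t.join()
print(health["status"])   # -> ok
```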

QUIC Retry

Pesigitg can offload QUIC address validation from backends to the LB. Every incoming Initial is classified before the CID fast path: if the client hasn't yet proved it owns its source IP, the LB replies with a Retry packet carrying a signed token, and the client must echo that token on its next Initial. A spoofed-source Initial flood is therefore absorbed at the LB instead of having N backends each pay the validation cost.

Tokens are HMAC-SHA256 over (source IP, original DCID, mint timestamp) using a 32-byte key. There is no per-connection state: a token is valid if it verifies against that tuple and hasn't aged past token_lifetime_secs.
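A minimal sketch of that stateless scheme, assuming a simple timestamp-plus-MAC token layout (the real wire format is not documented here):

```python
# HMAC-SHA256 over (source IP, original DCID, mint timestamp);
# a token is valid iff the MAC verifies and it hasn't aged out.
import hmac, hashlib, struct, time

KEY = bytes(range(1, 33))      # 32-byte signing key (example value)
LIFETIME_SECS = 10             # token_lifetime_secs

def mint(src_ip: bytes, dcid: bytes, now=None) -> bytes:
    ts = struct.pack("!Q", int(now if now is not None else time.time()))
    tag = hmac.new(KEY, src_ip + dcid + ts, hashlib.sha256).digest()
    return ts + tag            # 8-byte timestamp || 32-byte MAC

def verify(token: bytes, src_ip: bytes, dcid: bytes, now=None) -> bool:
    if len(token) != 40:
        return False
    ts, tag = token[:8], token[8:]
    expect = hmac.new(KEY, src_ip + dcid + ts, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        return False           # wrong source IP, DCID, or key
    age = (now if now is not None else time.time()) - struct.unpack("!Q", ts)[0]
    return 0 <= age <= LIFETIME_SECS

tok = mint(b"\xc0\x00\x02\x01", b"\x01" * 8, now=1000)
print(verify(tok, b"\xc0\x00\x02\x01", b"\x01" * 8, now=1005))   # True
print(verify(tok, b"\xc0\x00\x02\x02", b"\x01" * 8, now=1005))   # False: spoofed IP
print(verify(tok, b"\xc0\x00\x02\x01", b"\x01" * 8, now=1011))   # False: expired
```

Because verification needs only the key and the tuple from the packet itself, the LB holds no per-connection state and the check costs one HMAC per Initial.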

Modes

Mode Behaviour
observe Full classify path runs and retry_* counters advance, but no Retry is ever emitted. Use this to sanity-check the parser before going live.
always Every Initial without a valid token is Retried.
load Retry engages only when the observed Initial rate reaches [retry.load] trigger_rate packets per second. The rate counter is a shared 1-second sliding window ticked by every worker; valid-token forwards bypass it so legitimate spikes don't self-trigger Retry.
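The load-mode trigger can be sketched as a trailing 1-second window that every worker ticks for each tokenless Initial. This is an assumption about the mechanism, not a transcription of the implementation (which is lock-free; a mutex stands in here for atomics).

```python
# Retry engages once the tokenless-Initial rate over the trailing
# second reaches trigger_rate.
import collections, threading

class InitialRateWindow:
    def __init__(self, trigger_rate: int):
        self.trigger_rate = trigger_rate
        self.times = collections.deque()   # tick timestamps, oldest first
        self.lock = threading.Lock()       # stand-in for atomic counters

    def tick(self, now: float) -> bool:
        """Count one tokenless Initial; True means emit a Retry."""
        with self.lock:
            self.times.append(now)
            while self.times and now - self.times[0] > 1.0:
                self.times.popleft()       # age out ticks older than 1 s
            return len(self.times) >= self.trigger_rate

w = InitialRateWindow(trigger_rate=3)
print(w.tick(0.0))   # False: 1 packet in window
print(w.tick(0.2))   # False: 2 packets
print(w.tick(0.4))   # True:  rate reached
print(w.tick(1.6))   # False: earlier ticks aged out
```

Valid-token forwards never call tick, which is why a legitimate traffic spike does not push the counter over the threshold.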

Configuration

Retry lives in the route config (lb.toml), not the daemon config — it shares the RwLock swap with the rest of the route table on SIGHUP.

# lb.toml
[retry]
enabled      = true
token_key    = "0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20"
mode         = "load"                 # observe | always | load
token_lifetime_secs = 10              # 1..86400, default 10
ports        = [443, 8443]            # optional; empty = every daemon port

[retry.load]
trigger_rate = 50000                  # required iff mode = "load"

Without a [retry] section the classifier short-circuits at zero hot-path cost. See pesigitg-lb.toml(5) for the full key reference and validation rules.

Observability

GET /stats on the status socket returns a nested retry object: initials_seen, issued, token_validated, token_invalid, token_expired, parse_error. SIGUSR1 dumps the same counters to the log; SIGUSR2 dumps the live [retry] config (with the signing key redacted).

Network Configuration

Pesigitg uses Direct Server Return (DSR): the load balancer forwards packets to backends by rewriting only the L2, the destination IP stays as the VIP. Backends respond directly to clients, bypassing the LB on the return path.
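The forwarding step above amounts to rewriting the first 12 bytes of the Ethernet frame and nothing else. A sketch over a standard Ethernet II layout, with made-up MAC addresses:

```python
# DSR rewrite: dst MAC becomes the chosen backend, src MAC becomes
# the LB's data-plane NIC; the IP header (VIP destination included)
# passes through untouched.
import struct  # not needed here, but typical for real header work

def dsr_rewrite(frame: bytes, backend_mac: bytes, lb_mac: bytes) -> bytes:
    """Swap the L2 addresses; leave EtherType and L3 payload intact."""
    return backend_mac + lb_mac + frame[12:]

router_mac  = bytes.fromhex("020000000001")
lb_mac      = bytes.fromhex("020000000002")
backend_mac = bytes.fromhex("020000000003")
payload     = b"\x08\x00" + b"ip-packet-with-vip-dst"  # EtherType IPv4 + stub L3

frame_in  = lb_mac + router_mac + payload       # router -> LB
frame_out = dsr_rewrite(frame_in, backend_mac, lb_mac)

print(frame_out[:6] == backend_mac)   # True: now addressed to the backend
print(frame_out[12:] == payload)      # True: L3 (VIP destination) untouched
```

Because the destination IP is still the VIP, the backend must have the VIP configured locally (on loopback, as described below) or it will drop the packet.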

Both the load balancer and the backend servers require ARP tuning to prevent Linux's default ARP behaviour ("ARP flux") from misdirecting traffic.

Load Balancer — management interface ARP

When the LB host has a management interface (e.g. eth0) in addition to the data-plane interface where XDP is attached, the kernel will, by default, answer ARP requests for the VIP on every interface — including the management NIC. The upstream router then caches the management interface's MAC for the VIP and sends traffic there. Because XDP only runs on the data-plane interface, these packets hit the kernel stack instead and are never forwarded to backends.

Set arp_ignore=1 on the management interface so ARP for the VIP is only answered on the data-plane NIC:

# apply immediately (replace eth0 with your management interface)
sudo sysctl -w net.ipv4.conf.eth0.arp_ignore=1
sudo sysctl -w net.ipv4.conf.eth0.arp_announce=2

# persist across reboots (see contrib/etc/sysctl.d/90-dsr.conf)
sudo cp contrib/etc/sysctl.d/90-dsr.conf /etc/sysctl.d/
sudo sysctl --system

Backends — suppress ARP for the VIP

Backend servers need the VIP on loopback to accept DSR packets. Without ARP suppression the backend answers ARP for the VIP on its physical NIC, the upstream router learns the backend's MAC for the VIP, and traffic bypasses the load balancer entirely.

# bind the VIP to loopback (see contrib/etc/netplan/99-dsr-vip.yaml)
sudo ip addr add 198.51.100.1/32 dev lo

# suppress ARP
sudo sysctl -w net.ipv4.conf.all.arp_ignore=1
sudo sysctl -w net.ipv4.conf.all.arp_announce=2

Sysctl reference

Sysctl Effect
arp_ignore=1 Only reply to ARP when the target IP is configured on the incoming interface.
arp_announce=2 Use the best local address for the outgoing interface as the ARP source, preventing the VIP from leaking into upstream ARP caches.

Per-interface knobs (e.g. net.ipv4.conf.eth0.arp_ignore) work too — Linux takes the maximum of conf.all and conf.<iface>. Use per-interface settings on the LB to leave the data-plane interface's ARP behaviour untouched; use conf.all on backends where every interface should be suppressed.
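That max rule is worth internalizing, since it explains why conf.all on a backend overrides a per-interface 0. A trivial numeric sketch (values made up):

```python
# Linux resolves each ARP knob per interface as
# max(net.ipv4.conf.all.X, net.ipv4.conf.<iface>.X).
def effective(all_value: int, iface_value: int) -> int:
    return max(all_value, iface_value)

# LB: per-interface setting leaves the data-plane NIC untouched
print(effective(0, 1))   # 1: management NIC (conf.eth0.arp_ignore=1)
print(effective(0, 0))   # 0: data-plane NIC keeps default behaviour

# Backend: conf.all=1 wins on every interface, even with iface=0
print(effective(1, 0))   # 1
```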

contrib/

Example configuration files, systemd units, and helper scripts.

File Description
etc/pesigitg/lb.toml Example route configuration defining CID encryption parameters and server-ID-to-address mappings. Documents both single-pass AES-ECB (when server_id_length + nonce_length = 16) and four-pass Feistel modes.
etc/pesigitg/enp2s0f0.conf Example daemon config file (key=value format) showing interface, port, queue count, and route_config pointer.
etc/sysctl.d/90-dsr.conf Sysctl ARP settings for DSR (load balancer and backends).
etc/sysctl.d/90-lb.conf Sysctl ARP settings for the load balancer management interface.
etc/netplan/99-dsr-vip.yaml Netplan configuration for VIP loopback addresses.
etc/systemd/system/pesigitgd.service Systemd Type=notify unit for running a single instance of pesigitgd.
etc/systemd/system/pesigitgd@.service Systemd template unit for per-interface instances — systemctl start pesigitgd@eth0 reads /etc/pesigitg/eth0.conf and binds the service lifetime to the network device.
run.sh Developer convenience script. Builds and runs the daemon under sudo via cargo xtask run. Accepts a build mode (release/debug, default release) and interface name (default eth0) as positional arguments.
dns-rr.sh Generates HTTPS DNS resource records (RFC 9460) for advertising HTTP/3 support. Supports IP hints, non-standard ports, ECH, --value-only output for DNS providers, and --query to look up existing records via dig.
dsr-backend.sh Installs/removes DSR backend configuration (sysctl + netplan VIPs) on a backend server.
ok.sh Liveness probe: queries /health on the status socket and exits 0 when status == "ok". Suitable for Nagios/monit/cron checks. Pass -v for the full JSON.

HTTPS DNS Records

HTTPS DNS records (formally SVCB and HTTPS RR, defined in RFC 9460) are a recent DNS record type that lets a domain advertise connection parameters directly in DNS, before the browser even makes a TCP or QUIC connection.

The Problem They Solve

Traditionally, connecting to a site involved a sequential chain:

DNS → TCP → TLS → HTTP response (with Alt-Svc header)

That means "this server supports HTTP/3" or "use this specific port" couldn't be discovered until deep into the connection process. HTTPS records collapse several of those round trips by putting that metadata into DNS itself.

How They Work

There are two types:

  • SVCB (Service Binding) — the generic form, usable for any scheme.
  • HTTPS RR — an SVCB variant for HTTPS, which is what browsers query.

An HTTPS record looks like this:

example.com.  300  IN  HTTPS  1 . alpn=h3,h2 ipv4hint=192.0.2.1 ipv6hint=2001:db8::1
  • Priority — 1 here. Priority 0 is a special "AliasMode" that works like a CNAME for HTTPS. Any non-zero value is "ServiceMode" carrying parameters.
  • Target — . here, meaning "same domain." It could point elsewhere.
  • SvcParams — the key-value pairs carrying the useful metadata:
    • alpn — which application protocols are supported (h3, h2, http/1.1). This is the big one — if h3 is listed, the browser can attempt QUIC on the first connection without waiting for Alt-Svc.
    • ipv4hint / ipv6hint — IP addresses to try, saving an additional A/AAAA lookup.
    • port — if the service runs on a non-standard port.
    • ech — Encrypted Client Hello configuration, enabling TLS encryption of the SNI field for privacy.
    • no-default-alpn — indicates the server does not support default protocols, the client must use one of the listed ALPNs.
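How a client acts on those parameters can be sketched with a toy parser for the presentation format above. This is illustrative only, not a conformant RFC 9460 implementation; the decision function and its name are invented.

```python
# Decide whether a browser could race QUIC on the first connection,
# based on the HTTPS RR's alpn SvcParam.
def parse_https_rr(rdata: str):
    """'1 . alpn=h3,h2 ipv4hint=192.0.2.1' -> (priority, target, params)."""
    priority, target, *pairs = rdata.split()
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value.split(",") if value else True
    return int(priority), target, params

def can_try_h3(rdata: str) -> bool:
    priority, _, params = parse_https_rr(rdata)
    if priority == 0:                    # AliasMode carries no parameters
        return False
    return "h3" in params.get("alpn", [])

rr = "1 . alpn=h3,h2 ipv4hint=192.0.2.1 ipv6hint=2001:db8::1"
print(can_try_h3(rr))                    # True: QUIC on first connect
print(can_try_h3("1 . alpn=h2"))         # False: fall back to TCP
```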

Glossary

Acronym Full Name Context
AES Advanced Encryption Standard Block cipher used for CID encryption
AES-NI AES New Instructions x86 CPU instruction set for hardware-accelerated AES; required at runtime
AF_XDP Address Family XDP User-space socket interface to XDP for kernel-bypass packet I/O
ARP Address Resolution Protocol IPv4 link-layer address resolution
BPF Berkeley Packet Filter In-kernel packet filtering VM; see eBPF
CID Connection ID QUIC connection identifier used for routing decisions
DCID Destination CID CID carried in incoming QUIC packets; used for server lookup
DNS Domain Name System Name resolution; HTTPS RR / SVCB records
DSR Direct Server Return Load-balancing mode where replies bypass the LB
eBPF extended BPF In-kernel virtual machine running the XDP packet-processing programs
ECB Electronic Code Book AES block cipher mode used in the Feistel-based CID encryption
ECH Encrypted Client Hello TLS extension that encrypts the SNI field for privacy
ECMP Equal-Cost Multi-Path Routing strategy that distributes flows across multiple next hops
ICMP Internet Control Message Protocol Error and diagnostic messages for IPv4
ICMPv6 ICMP for IPv6 Error and diagnostic messages for IPv6
IP Internet Protocol Network-layer protocol; both v4 and v6
L2 Layer 2 Data link layer (Ethernet frames, MAC addresses)
L3 Layer 3 Network layer (IP packets)
LLVM Low Level Virtual Machine Compiler infrastructure; used by bpf-linker for eBPF object files
MAC Media Access Control 48-bit hardware address on Ethernet interfaces
MTU Maximum Transmission Unit Largest packet size a link can carry
NAT Network Address Translation Client address/port rewriting; QUIC CID routing survives NAT rebinding
NDP Neighbor Discovery Protocol IPv6 link-layer address resolution (equivalent of ARP)
NIC Network Interface Card Physical or virtual network interface
NUMA Non-Uniform Memory Access CPU/memory topology; used for socket-aware thread placement
PID Process ID Unix process identifier; managed via PID file
QUIC Quick UDP Internet Connections UDP-based transport protocol; the primary protocol being load-balanced
QUIC-LB QUIC Load Balancing Specification for CID-based QUIC-aware load balancing
RSS Receive Side Scaling NIC feature that distributes incoming packets across hardware queues
RTT Round Trip Time Network latency measurement
RX Receive Incoming packet direction / receive queues
SCID Source Connection ID CID chosen by the server; encodes routing information
SIGHUP Signal Hang Up Unix signal used to trigger live config reload
SIGINT Signal Interrupt Unix signal sent by Ctrl+C
SIGTERM Signal Terminate Unix signal for graceful shutdown
SIGUSR1 User-defined Signal 1 Unix signal used to dump traffic statistics
SIGUSR2 User-defined Signal 2 Unix signal used to dump runtime config state
SNI Server Name Indication TLS extension carrying the target hostname
TLS Transport Layer Security Cryptographic protocol layered over TCP (or built into QUIC)
TTL Time To Live IPv4 header field limiting packet lifetime (hop count)
TX Transmit Outgoing packet direction / transmit queues
UMEM User Memory Shared memory region for AF_XDP packet buffers
VIP Virtual IP Frontend IP address exposed to clients by the load balancer
XDP eXpress Data Path Linux kernel hook for early, high-performance packet processing


License

GPL-3.0 (see LICENSE). A separate commercial license is provided in LICENSE-COMMERCIAL.md.
