epi-bench is the shared tooling that powers all EPI research papers. It calculates EPI from power traces and benchmark results, generates publication-quality Pareto plots, and provides a standardized CSV format for community submissions.
- Installation
- Quick Start
- What's Inside
- EPI Calculator
- Power Trace Tools
- Benchmark Runner
- Pareto Plotter
- Results Logger
- Data Formats
- Community Submissions
- CLI Reference
- Configuration
- Examples
- Related Repos
- License
```bash
git clone https://github.com/Franzabner/epi-bench.git
cd epi-bench
pip install -e ".[dev]"
```

Requirements:

```text
python     >= 3.10
numpy      >= 1.26
matplotlib >= 3.8
pandas     >= 2.1
paramiko   >= 3.4   # SSH to Orchestrator Pi
pyserial   >= 3.5   # Direct epi-meter UART (optional)
```
```python
from epibench import EPICalculator, PowerTrace, BenchmarkResult

# Load power trace from epi-meter CSV
trace = PowerTrace.from_csv("power_trace_001.csv")

# Load benchmark results
bench = BenchmarkResult.from_json("benchmark_001.json")

# Calculate EPI
calc = EPICalculator()
result = calc.calculate(trace, bench)

print(f"EPI: {result.epi:.4f}")
print(f"J/Token: {result.joules_per_token:.4f}")
print(f"Accuracy: {result.accuracy_composite:.4f}")
print(f"Total Energy: {result.total_joules:.2f} J")
print(f"Total Tokens: {result.total_tokens}")
```

```bash
# Calculate EPI from trace + benchmark files
epi-bench calculate --trace power_trace_001.csv --benchmark benchmark_001.json

# Plot Pareto frontier from multiple results
epi-bench pareto --results-dir ./data/expert-pruning/ --output pareto.png

# Monitor epi-meter in real-time (via Orchestrator SSH)
epi-bench monitor --host orchestrator.local --live

# Validate a community submission
epi-bench validate --submission ./data/community/user_pi5/epibench/
```
```text
├── calculators/
│   ├── epi.py             # Core EPI calculation: J/T ÷ A
│   ├── energy.py          # Energy integration from power traces
│   └── composite.py       # Composite accuracy scoring with weights
├── plotting/
│   ├── pareto.py          # Pareto frontier: accuracy vs J/token
│   ├── epi_bars.py        # EPI comparison bar charts
│   ├── power_timeline.py  # Per-node power draw over time
│   ├── surgery_curves.py  # EPI vs surgery parameter (experts dropped, etc.)
│   └── style.py           # YOSO-YAi dark theme for all plots
├── io/
│   ├── traces.py          # Power trace CSV reader/writer
│   ├── benchmarks.py      # Benchmark JSON reader/writer
│   ├── results.py         # EPI result JSON reader/writer
│   └── orchestrator.py    # SSH client for Orchestrator Pi SQLite
├── benchmarks/
│   ├── runner.py          # Benchmark suite runner (SSH to Pi cluster)
│   └── suites.py          # Benchmark suite definitions (MMLU, ARC, HSwag)
└── cli.py                 # Command-line interface
```
The core calculation module, used by every EPI research paper.

```text
EPI = J/T ÷ A
```

Where:

- `J/T` = `E_total / N_tokens` (joules per token)
- `A` = weighted composite accuracy (0–1)
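As a sanity check, the formula can be reproduced with plain arithmetic. The helper and sample numbers below are illustrative, not part of the epibench API:

```python
# Plain-arithmetic EPI, mirroring EPI = J/T ÷ A.
# Illustrative helper, not the epibench API; sample numbers are made up.
def epi(total_joules: float, total_tokens: int, accuracy: float) -> float:
    joules_per_token = total_joules / total_tokens  # J/T
    return joules_per_token / accuracy              # lower is better

# 1800 J over 6000 tokens at 0.60 composite accuracy:
# J/T = 0.30, so EPI = 0.30 / 0.60 = 0.50
print(epi(1800.0, 6000, 0.60))  # → 0.5
```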
```python
from epibench.calculators.epi import EPICalculator
from epibench.calculators.composite import CompositeScorer

# Custom benchmark weights (default: equal 1/3 each)
scorer = CompositeScorer(weights={
    "mmlu": 0.4,
    "arc_challenge": 0.3,
    "hellaswag": 0.3,
})

calc = EPICalculator(scorer=scorer)
result = calc.calculate(trace, benchmark)
```

The calculator returns an `EPIResult`:

```python
@dataclass
class EPIResult:
    epi: float                 # The metric (lower is better)
    joules_per_token: float    # Energy numerator
    accuracy_composite: float  # Accuracy denominator
    total_joules: float        # Total energy consumed
    total_tokens: int          # Total tokens generated
    avg_watts: float           # Average cluster power
    peak_watts: float          # Peak cluster power
    duration_seconds: float    # Total inference time
    kwh: float                 # Energy in kWh (for cost context)
    model: str                 # Model identifier
    quantization: str          # Quantization method
    surgery: str               # Surgery applied
    hardware: str              # Hardware description
    benchmarks: dict           # Individual benchmark scores
    metadata: dict             # Run metadata (timestamps, firmware, etc.)
```

```python
from epibench.io.traces import PowerTrace

# From CSV file (epi-meter output logged by Orchestrator)
trace = PowerTrace.from_csv("power_trace_001.csv")

# Access data
print(f"Duration: {trace.duration_seconds:.1f} s")
print(f"Total energy: {trace.total_joules:.2f} J")
print(f"Avg power: {trace.avg_watts:.2f} W")
print(f"Peak power: {trace.peak_watts:.2f} W")
print(f"Samples: {trace.num_samples}")
print(f"Channels: {trace.num_channels}")

# Per-node energy
for node_id, joules in trace.per_node_joules.items():
    print(f"  Node {node_id}: {joules:.2f} J")
```

Power trace CSV format:

```csv
timestamp_ms,node_id,watts_rms,volts_rms,amps_rms,power_factor
0,0,12.450,121.300,0.103,0.9970
0,1,11.890,121.280,0.098,0.9965
0,2,12.110,121.310,0.100,0.9968
0,3,12.340,121.290,0.102,0.9971
1000,0,12.520,121.300,0.103,0.9970
...
```

```python
from epibench.io.orchestrator import OrchestratorClient

# SSH to Orchestrator Pi and query SQLite
client = OrchestratorClient(host="orchestrator.local", user="pi")

# Get power trace for a specific time window
trace = client.get_power_trace(
    start_utc="2026-05-15T10:00:00Z",
    end_utc="2026-05-15T10:30:00Z",
)

# Export to CSV
trace.to_csv("power_trace_001.csv")
```

Triggers the standardized benchmark suite on the Pi cluster and collects results.
```python
from epibench.benchmarks.runner import BenchmarkRunner

runner = BenchmarkRunner(
    host="orchestrator.local",
    cluster_nodes=4,
    inference_engine="distributed-llama",
)

# Run benchmark suite
result = runner.run(
    model="qwen3-30b-a3b",
    quantization="Q4_K_M",
    suites=["mmlu_5shot", "arc_challenge_25shot", "hellaswag_10shot"],
)

# Save results
result.to_json("benchmark_001.json")
```

| Suite | Benchmark | Shots | Measures |
|---|---|---|---|
| `mmlu_5shot` | MMLU | 5 | Broad knowledge across 57 domains |
| `arc_challenge_25shot` | ARC-Challenge | 25 | Grade-school science reasoning |
| `hellaswag_10shot` | HellaSwag | 10 | Commonsense NLI |
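Scores from these suites feed the composite accuracy `A` as a weighted average. A minimal sketch, with an illustrative helper and made-up scores (not the `CompositeScorer` API):

```python
# Weighted-average composite accuracy, as used in EPI's denominator.
# Helper name and sample scores are illustrative, not the epibench API.
def composite_accuracy(scores: dict, weights: dict) -> float:
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight

scores = {"mmlu": 0.60, "arc_challenge": 0.50, "hellaswag": 0.70}
weights = {"mmlu": 0.4, "arc_challenge": 0.3, "hellaswag": 0.3}
print(round(composite_accuracy(scores, weights), 3))  # → 0.6
```

Dividing by the weight sum keeps the result in 0–1 even if the weights don't sum to exactly 1.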
Generates publication-quality Pareto frontier charts showing the tradeoff between accuracy and energy cost.
```python
from epibench.plotting.pareto import ParetoPlotter

plotter = ParetoPlotter()

# Load multiple EPI results
plotter.add_results_from_dir("./data/expert-pruning/")

# Generate Pareto plot
plotter.plot(
    x="accuracy_composite",
    y="joules_per_token",
    labels="surgery",
    title="Expert Pruning: Accuracy vs Energy Cost",
    output="figures/pareto_expert_pruning.png",
)
```

| Plot | Function | Use Case |
|---|---|---|
| `ParetoPlotter` | Accuracy vs J/Token scatter with Pareto frontier | Find optimal surgery configurations |
| `EPIBarChart` | EPI comparison bars | Compare models or surgery configs |
| `PowerTimeline` | Per-node watts over time | Visualize power draw during inference |
| `SurgeryCurve` | EPI vs surgery parameter | Find the knee in expert removal, head pruning, etc. |
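The frontier itself is just dominance filtering: keep a point only if no other point is at least as accurate while costing no more energy per token. An illustrative sketch (not the epibench API):

```python
# Pareto-frontier filter for (accuracy, joules_per_token) points.
# A point is dominated if another distinct point has accuracy >= and
# J/token <=. Illustrative only, not the epibench plotting API.
def pareto_frontier(points):
    frontier = []
    for acc, jpt in points:
        dominated = any(
            a >= acc and j <= jpt and (a, j) != (acc, jpt)
            for a, j in points
        )
        if not dominated:
            frontier.append((acc, jpt))
    return sorted(frontier)

pts = [(0.60, 0.30), (0.55, 0.20), (0.62, 0.45), (0.50, 0.25)]
print(pareto_frontier(pts))  # → [(0.55, 0.2), (0.6, 0.3), (0.62, 0.45)]
```

Here (0.50, 0.25) is dropped because (0.55, 0.20) is both more accurate and cheaper.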
All plots use the YOSO-YAi dark theme:
```python
from epibench.plotting.style import apply_yosoyai_theme

# Colors
GOLD = "#C9A84C"
BLACK = "#0A0A0A"
DARK_BG = "#1A1A1A"
LIGHT_TEXT = "#E0E0E0"

# Applied automatically to all epi-bench plots
# Dark background, gold accents, white text, high contrast
```

Writes complete EPI results to the standardized JSON format used across all research repos.

```python
from epibench.io.results import ResultsLogger

logger = ResultsLogger(database_path="results.db")

# Log a result
logger.log(result, tags=["baseline", "qwen3-30b"])

# Query results
baselines = logger.query(tags=["baseline"])
best = logger.best_epi(model="qwen3-30b-a3b")

# Export to JSON (for publication in paper repo data/ directory)
logger.export_json(result, "data/baseline/qwen3-30b-a3b_q4km/epi_001.json")
```

| Column | Type | Unit | Description |
|---|---|---|---|
| `timestamp_ms` | int | ms | Milliseconds since recording start |
| `node_id` | int | — | Cluster node index (0–3) |
| `watts_rms` | float | W | True RMS real power |
| `volts_rms` | float | V | True RMS voltage |
| `amps_rms` | float | A | True RMS current |
| `power_factor` | float | — | Power factor (0.0–1.0) |
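Energy figures derived from this format (such as `total_joules`) come from integrating the `watts_rms` samples over time. A pure-Python trapezoidal sketch with made-up sample values (epibench's `calculators/energy.py` does the equivalent over real traces):

```python
# Trapezoidal integration of sampled power into energy (joules):
# sum of (average power over each interval) × (interval duration).
# Sample values below are made up for illustration.
def trace_energy_joules(times_s, watts):
    return sum(
        (watts[i] + watts[i + 1]) / 2 * (times_s[i + 1] - times_s[i])
        for i in range(len(times_s) - 1)
    )

times_s = [0.0, 1.0, 2.0, 3.0, 4.0]       # one sample per second
watts   = [12.0, 12.5, 12.2, 12.4, 12.3]  # single-node watts_rms
print(round(trace_energy_joules(times_s, watts), 2))  # → 49.25
```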
Benchmark result JSON:

```json
{
  "model": "qwen3-30b-a3b",
  "quantization": "Q4_K_M",
  "surgery": "none (baseline)",
  "benchmarks": {
    "mmlu_5shot": 0.000,
    "arc_challenge_25shot": 0.000,
    "hellaswag_10shot": 0.000
  },
  "total_tokens": 0,
  "duration_seconds": 0.0,
  "timestamp_utc": "2026-00-00T00:00:00Z",
  "hardware": "4x Pi 5 16GB (distributed-llama)",
  "inference_engine": "distributed-llama",
  "inference_engine_version": "0.0.0"
}
```

EPI result JSON:

```json
{
  "epi": 0.0000,
  "joules_per_token": 0.0000,
  "accuracy_composite": 0.0000,
  "energy": {
    "total_joules": 0.00,
    "total_tokens": 0,
    "avg_watts": 0.00,
    "peak_watts": 0.00,
    "duration_seconds": 0.0,
    "kwh": 0.0000
  },
  "accuracy": {
    "mmlu_5shot": 0.000,
    "arc_challenge_25shot": 0.000,
    "hellaswag_10shot": 0.000,
    "composite": 0.000,
    "weights": {"mmlu": 0.333, "arc_challenge": 0.333, "hellaswag": 0.333}
  },
  "model": "model-name",
  "quantization": "Q4_K_M",
  "surgery": "description",
  "hardware": "4x Pi 5 16GB (distributed-llama)",
  "instrument": "epi-meter v1.0",
  "measurement_point": "AC inlet per node",
  "environment": {
    "ambient_temp_c": 22.0,
    "frequency_governor": "performance"
  },
  "run_metadata": {
    "run_id": "uuid",
    "timestamp_utc": "2026-00-00T00:00:00Z",
    "repetition": 1,
    "epi_meter_firmware": "0.1.0",
    "epi_bench_version": "0.1.0"
  }
}
```

- Measure — Use an epi-meter or any calibrated AC power meter
- Benchmark — Run the standardized suite using epi-bench
- Calculate — Use `epi-bench calculate` to compute EPI
- Validate — Run `epi-bench validate` on your submission directory
- Submit — Open a PR to the relevant paper repo's `data/community/` directory
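The validation step includes checking that the EPI numbers are internally consistent (EPI = J/T ÷ A). A minimal sketch of that check; the JSON snippet and tolerance are illustrative assumptions, and the real validator also checks schema and value ranges:

```python
import json
import math

# Recompute J/token and EPI from the stored fields and compare.
# The JSON snippet and rel_tol below are illustrative assumptions.
doc = json.loads("""
{
  "epi": 0.5,
  "joules_per_token": 0.3,
  "accuracy_composite": 0.6,
  "energy": {"total_joules": 1800.0, "total_tokens": 6000}
}
""")

jpt = doc["energy"]["total_joules"] / doc["energy"]["total_tokens"]
ok = (math.isclose(jpt, doc["joules_per_token"], rel_tol=1e-3)
      and math.isclose(doc["epi"], jpt / doc["accuracy_composite"], rel_tol=1e-3))
print("consistent" if ok else "inconsistent")  # → consistent
```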
```bash
# Validates file structure, JSON schema, value ranges, consistency
epi-bench validate --submission ./my_data/

# Output:
# ✓ Power trace CSV: valid (14,400 samples, 4 channels)
# ✓ Benchmark JSON: valid (3 benchmarks, scores in range)
# ✓ EPI JSON: valid (EPI = J/T ÷ A checks out)
# ✓ Metadata: complete
# ✓ Ready for submission
```

Standardized results CSV:

```csv
model,quantization,surgery,hardware,instrument,epi,joules_per_token,accuracy,total_joules,total_tokens,contributor
qwen3-30b-a3b,Q4_K_M,none,4xPi5-16GB,epi-meter-v1,0.0000,0.0000,0.000,0.00,0,Franzabner
```

```text
epi-bench — Tooling for Energy Per Intelligence Research

USAGE:
  epi-bench <command> [options]

COMMANDS:
  calculate   Calculate EPI from power trace and benchmark results
  pareto      Generate Pareto frontier plot from multiple results
  bars        Generate EPI comparison bar chart
  timeline    Plot per-node power draw over time
  curve       Plot EPI vs surgery parameter
  monitor     Real-time epi-meter monitoring (SSH or serial)
  validate    Validate a community submission directory
  export      Export results to standardized CSV

OPTIONS:
  --help          Show help for any command
  --version       Show epi-bench version
  --theme dark    Use YOSO-YAi dark theme (default)
  --theme light   Use light theme for print
```
```bash
# Calculate EPI
epi-bench calculate \
  --trace data/power_trace_001.csv \
  --benchmark data/benchmark_001.json \
  --model "qwen3-30b-a3b" \
  --quantization "Q4_K_M" \
  --surgery "none (baseline)" \
  --output results/epi_001.json

# Pareto plot from directory of results
epi-bench pareto \
  --results-dir ./results/ \
  --title "Expert Pruning: Accuracy vs Energy" \
  --output figures/pareto.png

# EPI bar chart
epi-bench bars \
  --results-dir ./results/ \
  --sort-by epi \
  --output figures/epi_bars.png

# Power timeline
epi-bench timeline \
  --trace data/power_trace_001.csv \
  --output figures/power_timeline.png

# Surgery curve (EPI vs experts dropped)
epi-bench curve \
  --results-dir ./results/ \
  --x-param "experts_dropped" \
  --output figures/surgery_curve.png

# Live monitor
epi-bench monitor --host orchestrator.local --user pi --live

# Validate submission
epi-bench validate --submission ./data/community/my_submission/
```

```toml
[general]
default_hardware = "4x Pi 5 16GB (distributed-llama)"
default_instrument = "epi-meter v1.0"
measurement_point = "AC inlet per node"

[orchestrator]
host = "orchestrator.local"
user = "pi"
db_path = "/home/pi/factory/telemetry.db"
power_table = "epi_power_traces"

[benchmarks]
suites = ["mmlu_5shot", "arc_challenge_25shot", "hellaswag_10shot"]
default_weights = { mmlu = 0.333, arc_challenge = 0.333, hellaswag = 0.333 }

[plotting]
theme = "dark"
dpi = 300
format = "png"
figsize = [12, 8]

[plotting.colors]
gold = "#C9A84C"
background = "#0A0A0A"
panel = "#1A1A1A"
text = "#E0E0E0"
grid = "#333333"
```

See `examples/` for complete worked examples:
| Example | Description |
|---|---|
| `basic_epi.py` | Calculate EPI from sample power trace and benchmark data |
| `pareto_plot.py` | Generate a Pareto frontier from multiple surgery results |
| `compare_models.py` | Bar chart comparing EPI across baseline models |
| `power_analysis.py` | Analyze per-node power distribution during inference |
Note: Examples use placeholder data. Real measurements will be added when the YOSO-YAi FACTORY is operational (May 2026).
| Repository | Role |
|---|---|
| `energy-per-intelligence` | Framework paper — defines EPI, baseline measurements |
| `epi-meter` | Open-source power measurement board (KiCad + firmware) |
| `expert-pruning-epi` | Paper: MoE expert removal measured by EPI |
| `mixed-quant-epi` | Paper: Per-layer quantization evaluated by EPI |
| `attention-head-surgery-epi` | Paper: Attention head removal and energy compensation |
