epi-bench is the shared tooling that powers all EPI research papers. It calculates EPI from power traces and benchmark results, generates publication-quality Pareto plots, and provides a standardized CSV format for community submissions.
- Installation
- Quick Start
- What's Inside
- EPI Calculator
- Power Trace Tools
- Benchmark Runner
- Pareto Plotter
- Results Logger
- Data Formats
- Community Submissions
- CLI Reference
- Configuration
- Examples
- Related Repos
- License
```bash
git clone https://github.com/Franzabner/epi-bench.git
cd epi-bench
pip install -e ".[dev]"
```

Requirements:

```text
python     >= 3.10
numpy      >= 1.26
matplotlib >= 3.8
pandas     >= 2.1
paramiko   >= 3.4   # SSH to Orchestrator Pi
pyserial   >= 3.5   # Direct epi-meter UART (optional)
```
```python
from epibench import EPICalculator, PowerTrace, BenchmarkResult

# Load power trace from epi-meter CSV
trace = PowerTrace.from_csv("power_trace_001.csv")

# Load benchmark results
bench = BenchmarkResult.from_json("benchmark_001.json")

# Calculate EPI
calc = EPICalculator()
result = calc.calculate(trace, bench)

print(f"EPI: {result.epi:.4f}")
print(f"J/Token: {result.joules_per_token:.4f}")
print(f"Accuracy: {result.accuracy_composite:.4f}")
print(f"Total Energy: {result.total_joules:.2f} J")
print(f"Total Tokens: {result.total_tokens}")
```

```bash
# Calculate EPI from trace + benchmark files
epi-bench calculate --trace power_trace_001.csv --benchmark benchmark_001.json

# Plot Pareto frontier from multiple results
epi-bench pareto --results-dir ./data/expert-pruning/ --output pareto.png

# Monitor epi-meter in real-time (via Orchestrator SSH)
epi-bench monitor --host orchestrator.local --live

# Validate a community submission
epi-bench validate --submission ./data/community/user_pi5/epibench/
```
```text
├── calculators/
│   ├── epi.py             # Core EPI calculation: J/T ÷ A
│   ├── energy.py          # Energy integration from power traces
│   └── composite.py       # Composite accuracy scoring with weights
├── plotting/
│   ├── pareto.py          # Pareto frontier: accuracy vs J/token
│   ├── epi_bars.py        # EPI comparison bar charts
│   ├── power_timeline.py  # Per-node power draw over time
│   ├── surgery_curves.py  # EPI vs surgery parameter (experts dropped, etc.)
│   └── style.py           # YOSO-YAi dark theme for all plots
├── io/
│   ├── traces.py          # Power trace CSV reader/writer
│   ├── benchmarks.py      # Benchmark JSON reader/writer
│   ├── results.py         # EPI result JSON reader/writer
│   └── orchestrator.py    # SSH client for Orchestrator Pi SQLite
├── benchmarks/
│   ├── runner.py          # Benchmark suite runner (SSH to Pi cluster)
│   └── suites.py          # Benchmark suite definitions (MMLU, ARC, HSwag)
└── cli.py                 # Command-line interface
```
The core calculation module, used by every EPI research paper.

```text
EPI = J/T ÷ A
```

Where:

- `J/T` = `E_total / N_tokens` (joules per token)
- `A` = weighted composite accuracy (0–1)
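As a sanity check, the formula can be reproduced with plain arithmetic. The helper and sample numbers below are illustrative, not part of the epibench API:

```python
# Plain-arithmetic EPI, mirroring EPI = J/T ÷ A.
# Illustrative helper, not the epibench API; sample numbers are made up.
def epi(total_joules: float, total_tokens: int, accuracy: float) -> float:
    joules_per_token = total_joules / total_tokens  # J/T
    return joules_per_token / accuracy              # lower is better

# 1800 J over 6000 tokens at 0.60 composite accuracy:
# J/T = 0.30, so EPI = 0.30 / 0.60 = 0.50
print(epi(1800.0, 6000, 0.60))  # → 0.5
```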
```python
from epibench.calculators.epi import EPICalculator
from epibench.calculators.composite import CompositeScorer

# Custom benchmark weights (default: equal 1/3 each)
scorer = CompositeScorer(weights={
    "mmlu": 0.4,
    "arc_challenge": 0.3,
    "hellaswag": 0.3,
})

calc = EPICalculator(scorer=scorer)
result = calc.calculate(trace, benchmark)
```

The calculator returns an `EPIResult`:

```python
@dataclass
class EPIResult:
    epi: float                 # The metric (lower is better)
    joules_per_token: float    # Energy numerator
    accuracy_composite: float  # Accuracy denominator
    total_joules: float        # Total energy consumed
    total_tokens: int          # Total tokens generated
    avg_watts: float           # Average cluster power
    peak_watts: float          # Peak cluster power
    duration_seconds: float    # Total inference time
    kwh: float                 # Energy in kWh (for cost context)
    model: str                 # Model identifier
    quantization: str          # Quantization method
    surgery: str               # Surgery applied
    hardware: str              # Hardware description
    benchmarks: dict           # Individual benchmark scores
    metadata: dict             # Run metadata (timestamps, firmware, etc.)
```

```python
from epibench.io.traces import PowerTrace

# From CSV file (epi-meter output logged by Orchestrator)
trace = PowerTrace.from_csv("power_trace_001.csv")

# Access data
print(f"Duration: {trace.duration_seconds:.1f} s")
print(f"Total energy: {trace.total_joules:.2f} J")
print(f"Avg power: {trace.avg_watts:.2f} W")
print(f"Peak power: {trace.peak_watts:.2f} W")
print(f"Samples: {trace.num_samples}")
print(f"Channels: {trace.num_channels}")

# Per-node energy
for node_id, joules in trace.per_node_joules.items():
    print(f"  Node {node_id}: {joules:.2f} J")
```

Power trace CSV format:

```csv
timestamp_ms,node_id,watts_rms,volts_rms,amps_rms,power_factor
0,0,12.450,121.300,0.103,0.9970
0,1,11.890,121.280,0.098,0.9965
0,2,12.110,121.310,0.100,0.9968
0,3,12.340,121.290,0.102,0.9971
1000,0,12.520,121.300,0.103,0.9970
...
```

```python
from epibench.io.orchestrator import OrchestratorClient

# SSH to Orchestrator Pi and query SQLite
client = OrchestratorClient(host="orchestrator.local", user="pi")

# Get power trace for a specific time window
trace = client.get_power_trace(
    start_utc="2026-05-15T10:00:00Z",
    end_utc="2026-05-15T10:30:00Z",
)

# Export to CSV
trace.to_csv("power_trace_001.csv")
```

Triggers the standardized benchmark suite on the Pi cluster and collects results.
```python
from epibench.benchmarks.runner import BenchmarkRunner

runner = BenchmarkRunner(
    host="orchestrator.local",
    cluster_nodes=4,
    inference_engine="distributed-llama",
)

# Run benchmark suite
result = runner.run(
    model="qwen3-30b-a3b",
    quantization="Q4_K_M",
    suites=["mmlu_5shot", "arc_challenge_25shot", "hellaswag_10shot"],
)

# Save results
result.to_json("benchmark_001.json")
```

| Suite | Benchmark | Shots | Measures |
|---|---|---|---|
| `mmlu_5shot` | MMLU | 5 | Broad knowledge across 57 domains |
| `arc_challenge_25shot` | ARC-Challenge | 25 | Grade-school science reasoning |
| `hellaswag_10shot` | HellaSwag | 10 | Commonsense NLI |
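Scores from these suites feed the composite accuracy `A` as a weighted average. A minimal sketch, with an illustrative helper and made-up scores (not the `CompositeScorer` API):

```python
# Weighted-average composite accuracy, as used in EPI's denominator.
# Helper name and sample scores are illustrative, not the epibench API.
def composite_accuracy(scores: dict, weights: dict) -> float:
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight

scores = {"mmlu": 0.60, "arc_challenge": 0.50, "hellaswag": 0.70}
weights = {"mmlu": 0.4, "arc_challenge": 0.3, "hellaswag": 0.3}
print(round(composite_accuracy(scores, weights), 3))  # → 0.6
```

Dividing by the weight sum keeps the result in 0–1 even if the weights don't sum to exactly 1.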
Generates publication-quality Pareto frontier charts showing the tradeoff between accuracy and energy cost.
```python
from epibench.plotting.pareto import ParetoPlotter

plotter = ParetoPlotter()

# Load multiple EPI results
plotter.add_results_from_dir("./data/expert-pruning/")

# Generate Pareto plot
plotter.plot(
    x="accuracy_composite",
    y="joules_per_token",
    labels="surgery",
    title="Expert Pruning: Accuracy vs Energy Cost",
    output="figures/pareto_expert_pruning.png",
)
```

| Plot | Function | Use Case |
|---|---|---|
| `ParetoPlotter` | Accuracy vs J/Token scatter with Pareto frontier | Find optimal surgery configurations |
| `EPIBarChart` | EPI comparison bars | Compare models or surgery configs |
| `PowerTimeline` | Per-node watts over time | Visualize power draw during inference |
| `SurgeryCurve` | EPI vs surgery parameter | Find the knee in expert removal, head pruning, etc. |
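The frontier itself is just dominance filtering: keep a point only if no other point is at least as accurate while costing no more energy per token. An illustrative sketch (not the epibench API):

```python
# Pareto-frontier filter for (accuracy, joules_per_token) points.
# A point is dominated if another distinct point has accuracy >= and
# J/token <=. Illustrative only, not the epibench plotting API.
def pareto_frontier(points):
    frontier = []
    for acc, jpt in points:
        dominated = any(
            a >= acc and j <= jpt and (a, j) != (acc, jpt)
            for a, j in points
        )
        if not dominated:
            frontier.append((acc, jpt))
    return sorted(frontier)

pts = [(0.60, 0.30), (0.55, 0.20), (0.62, 0.45), (0.50, 0.25)]
print(pareto_frontier(pts))  # → [(0.55, 0.2), (0.6, 0.3), (0.62, 0.45)]
```

Here (0.50, 0.25) is dropped because (0.55, 0.20) is both more accurate and cheaper.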
All plots use the YOSO-YAi dark theme:
```python
from epibench.plotting.style import apply_yosoyai_theme

# Colors
GOLD = "#C9A84C"
BLACK = "#0A0A0A"
DARK_BG = "#1A1A1A"
LIGHT_TEXT = "#E0E0E0"

# Applied automatically to all epi-bench plots
# Dark background, gold accents, white text, high contrast
```

Writes complete EPI results to the standardized JSON format used across all research repos.

```python
from epibench.io.results import ResultsLogger

logger = ResultsLogger(database_path="results.db")

# Log a result
logger.log(result, tags=["baseline", "qwen3-30b"])

# Query results
baselines = logger.query(tags=["baseline"])
best = logger.best_epi(model="qwen3-30b-a3b")

# Export to JSON (for publication in paper repo data/ directory)
logger.export_json(result, "data/baseline/qwen3-30b-a3b_q4km/epi_001.json")
```

| Column | Type | Unit | Description |
|---|---|---|---|
| `timestamp_ms` | int | ms | Milliseconds since recording start |
| `node_id` | int | — | Cluster node index (0–3) |
| `watts_rms` | float | W | True RMS real power |
| `volts_rms` | float | V | True RMS voltage |
| `amps_rms` | float | A | True RMS current |
| `power_factor` | float | — | Power factor (0.0–1.0) |
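Energy figures derived from this format (such as `total_joules`) come from integrating the `watts_rms` samples over time. A pure-Python trapezoidal sketch with made-up sample values (epibench's `calculators/energy.py` does the equivalent over real traces):

```python
# Trapezoidal integration of sampled power into energy (joules):
# sum of (average power over each interval) × (interval duration).
# Sample values below are made up for illustration.
def trace_energy_joules(times_s, watts):
    return sum(
        (watts[i] + watts[i + 1]) / 2 * (times_s[i + 1] - times_s[i])
        for i in range(len(times_s) - 1)
    )

times_s = [0.0, 1.0, 2.0, 3.0, 4.0]       # one sample per second
watts   = [12.0, 12.5, 12.2, 12.4, 12.3]  # single-node watts_rms
print(round(trace_energy_joules(times_s, watts), 2))  # → 49.25
```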
Benchmark result JSON:

```json
{
  "model": "qwen3-30b-a3b",
  "quantization": "Q4_K_M",
  "surgery": "none (baseline)",
  "benchmarks": {
    "mmlu_5shot": 0.000,
    "arc_challenge_25shot": 0.000,
    "hellaswag_10shot": 0.000
  },
  "total_tokens": 0,
  "duration_seconds": 0.0,
  "timestamp_utc": "2026-00-00T00:00:00Z",
  "hardware": "4x Pi 5 16GB (distributed-llama)",
  "inference_engine": "distributed-llama",
  "inference_engine_version": "0.0.0"
}
```

EPI result JSON:

```json
{
  "epi": 0.0000,
  "joules_per_token": 0.0000,
  "accuracy_composite": 0.0000,
  "energy": {
    "total_joules": 0.00,
    "total_tokens": 0,
    "avg_watts": 0.00,
    "peak_watts": 0.00,
    "duration_seconds": 0.0,
    "kwh": 0.0000
  },
  "accuracy": {
    "mmlu_5shot": 0.000,
    "arc_challenge_25shot": 0.000,
    "hellaswag_10shot": 0.000,
    "composite": 0.000,
    "weights": {"mmlu": 0.333, "arc_challenge": 0.333, "hellaswag": 0.333}
  },
  "model": "model-name",
  "quantization": "Q4_K_M",
  "surgery": "description",
  "hardware": "4x Pi 5 16GB (distributed-llama)",
  "instrument": "epi-meter v1.0",
  "measurement_point": "AC inlet per node",
  "environment": {
    "ambient_temp_c": 22.0,
    "frequency_governor": "performance"
  },
  "run_metadata": {
    "run_id": "uuid",
    "timestamp_utc": "2026-00-00T00:00:00Z",
    "repetition": 1,
    "epi_meter_firmware": "0.1.0",
    "epi_bench_version": "0.1.0"
  }
}
```

- Measure — Use an epi-meter or any calibrated AC power meter
- Benchmark — Run the standardized suite using epi-bench
- Calculate — Use `epi-bench calculate` to compute EPI
- Validate — Run `epi-bench validate` on your submission directory
- Submit — Open a PR to the relevant paper repo's `data/community/` directory
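The validation step includes checking that the EPI numbers are internally consistent (EPI = J/T ÷ A). A minimal sketch of that check; the JSON snippet and tolerance are illustrative assumptions, and the real validator also checks schema and value ranges:

```python
import json
import math

# Recompute J/token and EPI from the stored fields and compare.
# The JSON snippet and rel_tol below are illustrative assumptions.
doc = json.loads("""
{
  "epi": 0.5,
  "joules_per_token": 0.3,
  "accuracy_composite": 0.6,
  "energy": {"total_joules": 1800.0, "total_tokens": 6000}
}
""")

jpt = doc["energy"]["total_joules"] / doc["energy"]["total_tokens"]
ok = (math.isclose(jpt, doc["joules_per_token"], rel_tol=1e-3)
      and math.isclose(doc["epi"], jpt / doc["accuracy_composite"], rel_tol=1e-3))
print("consistent" if ok else "inconsistent")  # → consistent
```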
```bash
# Validates file structure, JSON schema, value ranges, consistency
epi-bench validate --submission ./my_data/

# Output:
# ✓ Power trace CSV: valid (14,400 samples, 4 channels)
# ✓ Benchmark JSON: valid (3 benchmarks, scores in range)
# ✓ EPI JSON: valid (EPI = J/T ÷ A checks out)
# ✓ Metadata: complete
# ✓ Ready for submission
```

Standardized results CSV:

```csv
model,quantization,surgery,hardware,instrument,epi,joules_per_token,accuracy,total_joules,total_tokens,contributor
qwen3-30b-a3b,Q4_K_M,none,4xPi5-16GB,epi-meter-v1,0.0000,0.0000,0.000,0.00,0,Franzabner
```

```text
epi-bench — Tooling for Energy Per Intelligence Research

USAGE:
  epi-bench <command> [options]

COMMANDS:
  calculate   Calculate EPI from power trace and benchmark results
  pareto      Generate Pareto frontier plot from multiple results
  bars        Generate EPI comparison bar chart
  timeline    Plot per-node power draw over time
  curve       Plot EPI vs surgery parameter
  monitor     Real-time epi-meter monitoring (SSH or serial)
  validate    Validate a community submission directory
  export      Export results to standardized CSV

OPTIONS:
  --help          Show help for any command
  --version       Show epi-bench version
  --theme dark    Use YOSO-YAi dark theme (default)
  --theme light   Use light theme for print
```
```bash
# Calculate EPI
epi-bench calculate \
  --trace data/power_trace_001.csv \
  --benchmark data/benchmark_001.json \
  --model "qwen3-30b-a3b" \
  --quantization "Q4_K_M" \
  --surgery "none (baseline)" \
  --output results/epi_001.json

# Pareto plot from directory of results
epi-bench pareto \
  --results-dir ./results/ \
  --title "Expert Pruning: Accuracy vs Energy" \
  --output figures/pareto.png

# EPI bar chart
epi-bench bars \
  --results-dir ./results/ \
  --sort-by epi \
  --output figures/epi_bars.png

# Power timeline
epi-bench timeline \
  --trace data/power_trace_001.csv \
  --output figures/power_timeline.png

# Surgery curve (EPI vs experts dropped)
epi-bench curve \
  --results-dir ./results/ \
  --x-param "experts_dropped" \
  --output figures/surgery_curve.png

# Live monitor
epi-bench monitor --host orchestrator.local --user pi --live

# Validate submission
epi-bench validate --submission ./data/community/my_submission/
```

```toml
[general]
default_hardware = "4x Pi 5 16GB (distributed-llama)"
default_instrument = "epi-meter v1.0"
measurement_point = "AC inlet per node"

[orchestrator]
host = "orchestrator.local"
user = "pi"
db_path = "/home/pi/factory/telemetry.db"
power_table = "epi_power_traces"

[benchmarks]
suites = ["mmlu_5shot", "arc_challenge_25shot", "hellaswag_10shot"]
default_weights = { mmlu = 0.333, arc_challenge = 0.333, hellaswag = 0.333 }

[plotting]
theme = "dark"
dpi = 300
format = "png"
figsize = [12, 8]

[plotting.colors]
gold = "#C9A84C"
background = "#0A0A0A"
panel = "#1A1A1A"
text = "#E0E0E0"
grid = "#333333"
```

See `examples/` for complete worked examples:
| Example | Description |
|---|---|
| `basic_epi.py` | Calculate EPI from sample power trace and benchmark data |
| `pareto_plot.py` | Generate a Pareto frontier from multiple surgery results |
| `compare_models.py` | Bar chart comparing EPI across baseline models |
| `power_analysis.py` | Analyze per-node power distribution during inference |
Note: Examples use placeholder data. Real measurements will be added when the YOSO-YAi FACTORY is operational (May 2026).
| Repository | Role |
|---|---|
| `energy-per-intelligence` | Framework paper — defines EPI, baseline measurements |
| `epi-meter` | Open-source power measurement board (KiCad + firmware) |
| `expert-pruning-epi` | Paper: MoE expert removal measured by EPI |
| `mixed-quant-epi` | Paper: Per-layer quantization evaluated by EPI |
| `attention-head-surgery-epi` | Paper: Attention head removal and energy compensation |
