YOSO-YAi

epi-bench

Tooling for Energy Per Intelligence Research

Calculate EPI. Plot Pareto Frontiers. Log Power Traces. Run Benchmarks. Accept Community Data.



epi-bench is the shared tooling that powers all EPI research papers. It calculates EPI from power traces and benchmark results, generates publication-quality Pareto plots, and provides a standardized CSV format for community submissions.


Table of Contents

  1. Installation
  2. Quick Start
  3. What's Inside
  4. EPI Calculator
  5. Power Trace Tools
  6. Benchmark Runner
  7. Pareto Plotter
  8. Results Logger
  9. Data Formats
  10. Community Submissions
  11. CLI Reference
  12. Configuration
  13. Examples
  14. Related Repos
  15. License

1. Installation

From Source

git clone https://github.com/Franzabner/epi-bench.git
cd epi-bench
pip install -e ".[dev]"

Dependencies

python >= 3.10
numpy >= 1.26
matplotlib >= 3.8
pandas >= 2.1
paramiko >= 3.4        # SSH to Orchestrator Pi
pyserial >= 3.5        # Direct epi-meter UART (optional)

2. Quick Start

from epibench import EPICalculator, PowerTrace, BenchmarkResult

# Load power trace from epi-meter CSV
trace = PowerTrace.from_csv("power_trace_001.csv")

# Load benchmark results
bench = BenchmarkResult.from_json("benchmark_001.json")

# Calculate EPI
calc = EPICalculator()
result = calc.calculate(trace, bench)

print(f"EPI: {result.epi:.4f}")
print(f"J/Token: {result.joules_per_token:.4f}")
print(f"Accuracy: {result.accuracy_composite:.4f}")
print(f"Total Energy: {result.total_joules:.2f} J")
print(f"Total Tokens: {result.total_tokens}")

Command Line

# Calculate EPI from trace + benchmark files
epi-bench calculate --trace power_trace_001.csv --benchmark benchmark_001.json

# Plot Pareto frontier from multiple results
epi-bench pareto --results-dir ./data/expert-pruning/ --output pareto.png

# Monitor epi-meter in real-time (via Orchestrator SSH)
epi-bench monitor --host orchestrator.local --live

# Validate a community submission
epi-bench validate --submission ./data/community/user_pi5/

3. What's Inside

epibench/
├── calculators/
│   ├── epi.py              # Core EPI calculation: J/T ÷ A
│   ├── energy.py           # Energy integration from power traces
│   └── composite.py        # Composite accuracy scoring with weights
├── plotting/
│   ├── pareto.py           # Pareto frontier: accuracy vs J/token
│   ├── epi_bars.py         # EPI comparison bar charts
│   ├── power_timeline.py   # Per-node power draw over time
│   ├── surgery_curves.py   # EPI vs surgery parameter (experts dropped, etc.)
│   └── style.py            # YOSO-YAi dark theme for all plots
├── io/
│   ├── traces.py           # Power trace CSV reader/writer
│   ├── benchmarks.py       # Benchmark JSON reader/writer
│   ├── results.py          # EPI result JSON reader/writer
│   └── orchestrator.py     # SSH client for Orchestrator Pi SQLite
├── benchmarks/
│   ├── runner.py           # Benchmark suite runner (SSH to Pi cluster)
│   └── suites.py           # Benchmark suite definitions (MMLU, ARC, HSwag)
└── cli.py                  # Command-line interface

4. EPI Calculator

The core calculation module. Used by every EPI research paper.

Formula

EPI = J/T ÷ A

Where:
    J/T = E_total / N_tokens     (joules per token)
    A   = weighted composite accuracy (0–1)
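A quick worked example of the formula with made-up numbers (not real measurements):

```python
# Worked EPI example -- illustrative figures only, not measured data
total_joules = 45_000.0   # total cluster energy over the benchmark run
total_tokens = 12_000     # tokens generated during the run
accuracy = 0.62           # weighted composite accuracy (0-1)

joules_per_token = total_joules / total_tokens   # J/T = 3.75
epi = joules_per_token / accuracy                # lower is better

print(f"J/Token: {joules_per_token:.4f}")  # 3.7500
print(f"EPI: {epi:.4f}")                   # 6.0484
```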

Python API

from epibench.calculators.epi import EPICalculator
from epibench.calculators.composite import CompositeScorer

# Custom benchmark weights (default: equal 1/3 each)
scorer = CompositeScorer(weights={
    "mmlu": 0.4,
    "arc_challenge": 0.3,
    "hellaswag": 0.3,
})

calc = EPICalculator(scorer=scorer)
result = calc.calculate(trace, benchmark)

Result Object

@dataclass
class EPIResult:
    epi: float                  # The metric (lower is better)
    joules_per_token: float     # Energy numerator
    accuracy_composite: float   # Accuracy denominator
    total_joules: float         # Total energy consumed
    total_tokens: int           # Total tokens generated
    avg_watts: float            # Average cluster power
    peak_watts: float           # Peak cluster power
    duration_seconds: float     # Total inference time
    kwh: float                  # Energy in kWh (for cost context)
    model: str                  # Model identifier
    quantization: str           # Quantization method
    surgery: str                # Surgery applied
    hardware: str               # Hardware description
    benchmarks: dict            # Individual benchmark scores
    metadata: dict              # Run metadata (timestamps, firmware, etc.)

5. Power Trace Tools

Reading epi-meter CSV

from epibench.io.traces import PowerTrace

# From CSV file (epi-meter output logged by Orchestrator)
trace = PowerTrace.from_csv("power_trace_001.csv")

# Access data
print(f"Duration: {trace.duration_seconds:.1f} s")
print(f"Total energy: {trace.total_joules:.2f} J")
print(f"Avg power: {trace.avg_watts:.2f} W")
print(f"Peak power: {trace.peak_watts:.2f} W")
print(f"Samples: {trace.num_samples}")
print(f"Channels: {trace.num_channels}")

# Per-node energy
for node_id, joules in trace.per_node_joules.items():
    print(f"  Node {node_id}: {joules:.2f} J")

CSV Format

timestamp_ms,node_id,watts_rms,volts_rms,amps_rms,power_factor
0,0,12.450,121.300,0.103,0.9970
0,1,11.890,121.280,0.098,0.9965
0,2,12.110,121.310,0.100,0.9968
0,3,12.340,121.290,0.102,0.9971
1000,0,12.520,121.300,0.103,0.9970
...
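A minimal sketch of how total energy can be integrated from this format (trapezoidal rule per node, then summed). This is illustrative only; the real reader and integrator live in `epibench/io/traces.py` and `epibench/calculators/energy.py`:

```python
import csv
import io
from collections import defaultdict

def total_joules(csv_text: str) -> float:
    """Trapezoidal energy integration over the epi-meter CSV format (sketch)."""
    samples = defaultdict(list)                  # node_id -> [(seconds, watts)]
    for row in csv.DictReader(io.StringIO(csv_text)):
        samples[row["node_id"]].append(
            (int(row["timestamp_ms"]) / 1000.0, float(row["watts_rms"]))
        )
    energy = 0.0
    for series in samples.values():
        series.sort()                            # order samples by time
        for (t0, w0), (t1, w1) in zip(series, series[1:]):
            energy += 0.5 * (w0 + w1) * (t1 - t0)   # watts * seconds = joules
    return energy

# Hypothetical two-sample trace: 12 W held for 1 s on node 0 -> 12 J
demo = ("timestamp_ms,node_id,watts_rms,volts_rms,amps_rms,power_factor\n"
        "0,0,12.0,121.3,0.099,0.997\n"
        "1000,0,12.0,121.3,0.099,0.997\n")
print(f"{total_joules(demo):.2f} J")  # 12.00 J
```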

Pulling from Orchestrator

from epibench.io.orchestrator import OrchestratorClient

# SSH to Orchestrator Pi and query SQLite
client = OrchestratorClient(host="orchestrator.local", user="pi")

# Get power trace for a specific time window
trace = client.get_power_trace(
    start_utc="2026-05-15T10:00:00Z",
    end_utc="2026-05-15T10:30:00Z",
)

# Export to CSV
trace.to_csv("power_trace_001.csv")

6. Benchmark Runner

Triggers the standardized benchmark suite on the Pi cluster and collects results.

from epibench.benchmarks.runner import BenchmarkRunner

runner = BenchmarkRunner(
    host="orchestrator.local",
    cluster_nodes=4,
    inference_engine="distributed-llama",
)

# Run benchmark suite
result = runner.run(
    model="qwen3-30b-a3b",
    quantization="Q4_K_M",
    suites=["mmlu_5shot", "arc_challenge_25shot", "hellaswag_10shot"],
)

# Save results
result.to_json("benchmark_001.json")

Benchmark Suites

| Suite | Benchmark | Shots | Measures |
|---|---|---|---|
| mmlu_5shot | MMLU | 5 | Broad knowledge across 57 domains |
| arc_challenge_25shot | ARC-Challenge | 25 | Grade-school science reasoning |
| hellaswag_10shot | HellaSwag | 10 | Commonsense NLI |
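The composite accuracy in the EPI denominator is the weighted average of these suite scores (what `CompositeScorer` computes); a sketch with made-up scores and the default equal weights:

```python
# Illustrative composite-accuracy calculation -- scores are hypothetical
scores = {"mmlu": 0.58, "arc_challenge": 0.61, "hellaswag": 0.67}
weights = {"mmlu": 1 / 3, "arc_challenge": 1 / 3, "hellaswag": 1 / 3}

# Weighted average over the suites
composite = sum(weights[k] * scores[k] for k in scores)
print(f"{composite:.4f}")  # 0.6200
```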

7. Pareto Plotter

Generates publication-quality Pareto frontier charts showing the tradeoff between accuracy and energy cost.

from epibench.plotting.pareto import ParetoPlotter

plotter = ParetoPlotter()

# Load multiple EPI results
plotter.add_results_from_dir("./data/expert-pruning/")

# Generate Pareto plot
plotter.plot(
    x="accuracy_composite",
    y="joules_per_token",
    labels="surgery",
    title="Expert Pruning: Accuracy vs Energy Cost",
    output="figures/pareto_expert_pruning.png",
)

Plot Types

| Plot | Function | Use Case |
|---|---|---|
| ParetoPlotter | Accuracy vs J/Token scatter with Pareto frontier | Find optimal surgery configurations |
| EPIBarChart | EPI comparison bars | Compare models or surgery configs |
| PowerTimeline | Per-node watts over time | Visualize power draw during inference |
| SurgeryCurve | EPI vs surgery parameter | Find the knee in expert removal, head pruning, etc. |
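The frontier itself is the set of non-dominated points: no other result has both higher accuracy and lower J/token. A minimal sketch of that filter (the function name is illustrative, not the `ParetoPlotter` API):

```python
def pareto_frontier(points):
    """points: list of (accuracy, joules_per_token) tuples.

    Keeps a point unless some other point is at least as accurate AND at
    least as cheap, and strictly better on one axis.
    """
    frontier = []
    for acc, jpt in points:
        dominated = any(
            a >= acc and j <= jpt and (a > acc or j < jpt)
            for a, j in points
        )
        if not dominated:
            frontier.append((acc, jpt))
    return sorted(frontier)

# Hypothetical results: (0.50, 3.2) is dominated by (0.55, 3.0)
pts = [(0.60, 4.0), (0.55, 3.0), (0.62, 5.5), (0.50, 3.2)]
print(pareto_frontier(pts))
```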

YOSO-YAi Plot Style

All plots use the YOSO-YAi dark theme:

from epibench.plotting.style import apply_yosoyai_theme

# Colors
GOLD = "#C9A84C"
BLACK = "#0A0A0A"
DARK_BG = "#1A1A1A"
LIGHT_TEXT = "#E0E0E0"

# Applied automatically to all epi-bench plots
# Dark background, gold accents, white text, high contrast

8. Results Logger

Logs complete EPI results to a local SQLite database and exports them to the standardized JSON format used across all research repos.

from epibench.io.results import ResultsLogger

logger = ResultsLogger(database_path="results.db")

# Log a result
logger.log(result, tags=["baseline", "qwen3-30b"])

# Query results
baselines = logger.query(tags=["baseline"])
best = logger.best_epi(model="qwen3-30b-a3b")

# Export to JSON (for publication in paper repo data/ directory)
logger.export_json(result, "data/baseline/qwen3-30b-a3b_q4km/epi_001.json")

9. Data Formats

Power Trace CSV

| Column | Type | Unit | Description |
|---|---|---|---|
| timestamp_ms | int | ms | Milliseconds since recording start |
| node_id | int | - | Cluster node index (0–3) |
| watts_rms | float | W | True RMS real power |
| volts_rms | float | V | True RMS voltage |
| amps_rms | float | A | True RMS current |
| power_factor | float | - | Power factor (0.0–1.0) |

Benchmark Result JSON

{
  "model": "qwen3-30b-a3b",
  "quantization": "Q4_K_M",
  "surgery": "none (baseline)",
  "benchmarks": {
    "mmlu_5shot": 0.000,
    "arc_challenge_25shot": 0.000,
    "hellaswag_10shot": 0.000
  },
  "total_tokens": 0,
  "duration_seconds": 0.0,
  "timestamp_utc": "2026-00-00T00:00:00Z",
  "hardware": "4x Pi 5 16GB (distributed-llama)",
  "inference_engine": "distributed-llama",
  "inference_engine_version": "0.0.0"
}

EPI Result JSON

{
  "epi": 0.0000,
  "joules_per_token": 0.0000,
  "accuracy_composite": 0.0000,
  "energy": {
    "total_joules": 0.00,
    "total_tokens": 0,
    "avg_watts": 0.00,
    "peak_watts": 0.00,
    "duration_seconds": 0.0,
    "kwh": 0.0000
  },
  "accuracy": {
    "mmlu_5shot": 0.000,
    "arc_challenge_25shot": 0.000,
    "hellaswag_10shot": 0.000,
    "composite": 0.000,
    "weights": {"mmlu": 0.333, "arc_challenge": 0.333, "hellaswag": 0.333}
  },
  "model": "model-name",
  "quantization": "Q4_K_M",
  "surgery": "description",
  "hardware": "4x Pi 5 16GB (distributed-llama)",
  "instrument": "epi-meter v1.0",
  "measurement_point": "AC inlet per node",
  "environment": {
    "ambient_temp_c": 22.0,
    "frequency_governor": "performance"
  },
  "run_metadata": {
    "run_id": "uuid",
    "timestamp_utc": "2026-00-00T00:00:00Z",
    "repetition": 1,
    "epi_meter_firmware": "0.1.0",
    "epi_bench_version": "0.1.0"
  }
}

10. Community Submissions

Submitting Your Data

  1. Measure — Use an epi-meter or any calibrated AC power meter
  2. Benchmark — Run the standardized suite using epi-bench
  3. Calculate — Use epi-bench calculate to compute EPI
  4. Validate — Run epi-bench validate on your submission directory
  5. Submit — Open a PR to the relevant paper repo's data/community/ directory
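Part of what validation checks is internal consistency: the stored `epi` must actually equal `joules_per_token / accuracy_composite`. A sketch of that one check (field names follow the EPI Result JSON format above; the tolerance is an assumption, not the validator's actual threshold):

```python
def check_epi_consistency(result: dict, rel_tol: float = 1e-3) -> bool:
    """Does the stored EPI match J/T divided by composite accuracy?"""
    expected = result["joules_per_token"] / result["accuracy_composite"]
    return abs(result["epi"] - expected) <= rel_tol * expected

# Hypothetical result fragment: 3.75 / 0.62 = 6.0484, so this passes
sample = {"epi": 6.0484, "joules_per_token": 3.75, "accuracy_composite": 0.62}
print(check_epi_consistency(sample))  # True
```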

Validation

# Validates file structure, JSON schema, value ranges, consistency
epi-bench validate --submission ./my_data/

# Output:
# ✓ Power trace CSV: valid (14,400 samples, 4 channels)
# ✓ Benchmark JSON: valid (3 benchmarks, scores in range)
# ✓ EPI JSON: valid (EPI = J/T ÷ A checks out)
# ✓ Metadata: complete
# ✓ Ready for submission

Standardized CSV for Cross-Study Comparison

model,quantization,surgery,hardware,instrument,epi,joules_per_token,accuracy,total_joules,total_tokens,contributor
qwen3-30b-a3b,Q4_K_M,none,4xPi5-16GB,epi-meter-v1,0.0000,0.0000,0.000,0.00,0,Franzabner
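Once multiple submissions land in this format, they can be compared directly with pandas; a sketch with hypothetical rows and contributor names:

```python
import io
import pandas as pd

# Two hypothetical rows in the standardized cross-study CSV format
csv_text = """model,quantization,surgery,hardware,instrument,epi,joules_per_token,accuracy,total_joules,total_tokens,contributor
qwen3-30b-a3b,Q4_K_M,none,4xPi5-16GB,epi-meter-v1,6.05,3.75,0.620,45000.00,12000,alice
qwen3-30b-a3b,Q4_K_M,drop-32-experts,4xPi5-16GB,epi-meter-v1,5.10,3.01,0.590,36100.00,12000,bob
"""
df = pd.read_csv(io.StringIO(csv_text))

# Best (lowest) EPI per model across all contributors
best = df.sort_values("epi").groupby("model", as_index=False).first()
print(best[["model", "surgery", "epi", "contributor"]])
```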

11. CLI Reference

epi-bench — Tooling for Energy Per Intelligence Research

USAGE:
    epi-bench <command> [options]

COMMANDS:
    calculate       Calculate EPI from power trace and benchmark results
    pareto          Generate Pareto frontier plot from multiple results
    bars            Generate EPI comparison bar chart
    timeline        Plot per-node power draw over time
    curve           Plot EPI vs surgery parameter
    monitor         Real-time epi-meter monitoring (SSH or serial)
    validate        Validate a community submission directory
    export          Export results to standardized CSV

OPTIONS:
    --help          Show help for any command
    --version       Show epi-bench version
    --theme dark    Use YOSO-YAi dark theme (default)
    --theme light   Use light theme for print

Examples

# Calculate EPI
epi-bench calculate \
    --trace data/power_trace_001.csv \
    --benchmark data/benchmark_001.json \
    --model "qwen3-30b-a3b" \
    --quantization "Q4_K_M" \
    --surgery "none (baseline)" \
    --output results/epi_001.json

# Pareto plot from directory of results
epi-bench pareto \
    --results-dir ./results/ \
    --title "Expert Pruning: Accuracy vs Energy" \
    --output figures/pareto.png

# EPI bar chart
epi-bench bars \
    --results-dir ./results/ \
    --sort-by epi \
    --output figures/epi_bars.png

# Power timeline
epi-bench timeline \
    --trace data/power_trace_001.csv \
    --output figures/power_timeline.png

# Surgery curve (EPI vs experts dropped)
epi-bench curve \
    --results-dir ./results/ \
    --x-param "experts_dropped" \
    --output figures/surgery_curve.png

# Live monitor
epi-bench monitor --host orchestrator.local --user pi --live

# Validate submission
epi-bench validate --submission ./data/community/my_submission/

12. Configuration

Config File (epi-bench.toml)

[general]
default_hardware = "4x Pi 5 16GB (distributed-llama)"
default_instrument = "epi-meter v1.0"
measurement_point = "AC inlet per node"

[orchestrator]
host = "orchestrator.local"
user = "pi"
db_path = "/home/pi/factory/telemetry.db"
power_table = "epi_power_traces"

[benchmarks]
suites = ["mmlu_5shot", "arc_challenge_25shot", "hellaswag_10shot"]
default_weights = { mmlu = 0.333, arc_challenge = 0.333, hellaswag = 0.333 }

[plotting]
theme = "dark"
dpi = 300
format = "png"
figsize = [12, 8]

[plotting.colors]
gold = "#C9A84C"
background = "#0A0A0A"
panel = "#1A1A1A"
text = "#E0E0E0"
grid = "#333333"

13. Examples

See examples/ for complete worked examples:

| Example | Description |
|---|---|
| basic_epi.py | Calculate EPI from sample power trace and benchmark data |
| pareto_plot.py | Generate a Pareto frontier from multiple surgery results |
| compare_models.py | Bar chart comparing EPI across baseline models |
| power_analysis.py | Analyze per-node power distribution during inference |

Note: Examples use placeholder data. Real measurements will be added when the YOSO-YAi FACTORY is operational (May 2026).


14. Related Repos

| Repository | Role |
|---|---|
| energy-per-intelligence | Framework paper: defines EPI, baseline measurements |
| epi-meter | Open-source power measurement board (KiCad + firmware) |
| expert-pruning-epi | Paper: MoE expert removal measured by EPI |
| mixed-quant-epi | Paper: Per-layer quantization evaluated by EPI |
| attention-head-surgery-epi | Paper: Attention head removal and energy compensation |

15. License

MIT License


EPI = Joules per Token / Task Accuracy

Lower is better. Measure everything. Trust nothing the device tells you about itself.

YOSO-YAi

Francisco Abner — Electrical Engineer, CEO & Founder, YOSO-YAi LLC
