feat(benchmark): Add unified benchmark suite #166

Merged
m96-chan merged 5 commits into main from v0.2.18-benchmark-suite on Dec 30, 2025


@m96-chan m96-chan commented Dec 30, 2025

Summary

  • Add modular benchmark suite under src/pygpukit/benchmark/
  • Unified API for GEMM, GEMV, and attention benchmarks
  • JSON export and baseline comparison with regression detection
  • CLI interface: python -m pygpukit.benchmark

Features

BenchmarkSuite API

from pygpukit.benchmark import BenchmarkSuite

suite = BenchmarkSuite()
suite.add_gemm(sizes=[(4096, 4096, 4096)], dtypes=["bf16", "tf32"])
suite.add_gemv(dtypes=["bf16", "fp8", "nvf4"])
suite.add_attention(seq_lens=[512, 1024, 2048])

report = suite.run()
report.save("baseline.json")

# Compare with baseline
comparison = suite.compare("baseline.json", threshold=0.05)
if comparison.has_regression():
    raise RuntimeError("Regression detected!")
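
The threshold check above can be pictured as a relative-latency comparison between two sets of results. The following is only a minimal sketch of that idea, not the code in results.py: the `Delta` class, the `find_regressions` function, the flat name-to-microseconds dictionaries, and the benchmark names are all hypothetical stand-ins for whatever BenchmarkReport and ComparisonResult actually store.

```python
from dataclasses import dataclass


@dataclass
class Delta:
    """Hypothetical record pairing a baseline and a current latency."""
    name: str
    baseline_us: float
    current_us: float

    @property
    def slowdown(self) -> float:
        # Relative change in latency; positive means the current run is slower.
        return (self.current_us - self.baseline_us) / self.baseline_us


def find_regressions(baseline: dict, current: dict, threshold: float = 0.05) -> list:
    """Return a Delta for every benchmark whose latency grew by more than threshold."""
    regressions = []
    for name, base_us in baseline.items():
        if name not in current:
            continue  # absent from the current run, so there is nothing to compare
        delta = Delta(name, base_us, current[name])
        if delta.slowdown > threshold:
            regressions.append(delta)
    return regressions


# Made-up latencies in microseconds, for illustration only.
baseline = {"gemm_bf16_4096": 100.0, "gemv_fp8": 20.0}
current = {"gemm_bf16_4096": 103.0, "gemv_fp8": 22.0}

regressions = find_regressions(baseline, current, threshold=0.05)
for r in regressions:
    print(f"{r.name}: {r.slowdown:+.1%}")
```

With a 5% threshold, the 3% slowdown stays within tolerance and only the 10% slowdown is reported. Comparing relative rather than absolute change keeps one threshold meaningful across kernels with very different runtimes.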

CLI Usage

python -m pygpukit.benchmark --quick
python -m pygpukit.benchmark --gemm --sizes 4096,8192
python -m pygpukit.benchmark --save results.json
python -m pygpukit.benchmark --compare baseline.json --fail-on-regression
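
For readers wondering how flags like these are typically wired up, here is a rough argparse sketch. Only the flag names come from the commands above; the help strings, the `parse_sizes` helper, and the reading of `--sizes` as a comma-separated list of integers are assumptions, and the real cli.py may be organized differently.

```python
import argparse


def parse_sizes(text: str) -> list:
    # Assumed format: comma-separated integers such as "4096,8192".
    return [int(part) for part in text.split(",") if part]


def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the usage lines above; everything else is illustrative.
    parser = argparse.ArgumentParser(prog="python -m pygpukit.benchmark")
    parser.add_argument("--quick", action="store_true", help="run a reduced set")
    parser.add_argument("--gemm", action="store_true", help="run GEMM benchmarks")
    parser.add_argument("--sizes", type=parse_sizes, default=None,
                        help="comma-separated problem sizes")
    parser.add_argument("--save", metavar="PATH", help="write results as JSON")
    parser.add_argument("--compare", metavar="BASELINE",
                        help="compare against a saved baseline")
    parser.add_argument("--fail-on-regression", action="store_true",
                        help="exit non-zero when a regression is found")
    return parser


args = build_parser().parse_args(["--gemm", "--sizes", "4096,8192"])
print(args.gemm, args.sizes, args.fail_on_regression)
```

Passing a converter through `type=` keeps the parsing of `--sizes` in one place, so malformed input is rejected by argparse before any benchmark starts.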

Modules

File          Description
__init__.py   BenchmarkSuite class, exports
base.py       Base Benchmark class, measure_kernel()
results.py    BenchmarkResult, BenchmarkReport, ComparisonResult
gemm.py       GEMMBenchmark, FP8GEMMBenchmark
gemv.py       GEMVBenchmark, W8A8GEMVBenchmark
attention.py  SDPABenchmark, GQABenchmark
cli.py        CLI argument parsing and entry point
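
As a rough illustration of what a helper like measure_kernel() in base.py usually involves, the sketch below runs a callable a few times to warm up, times repeated runs, and reports the median. It is not the PR's implementation: `measure_kernel_sketch`, its parameters, and `gemm_tflops` are hypothetical names, and a host-side timer stands in for GPU timing. Because GPU kernels launch asynchronously, a real version would likely need device synchronization or CUDA events around each run.

```python
import statistics
import time


def measure_kernel_sketch(fn, warmup=3, iters=10):
    """Return the median wall-clock time of fn() in microseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as caching or compilation
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


def gemm_tflops(m, n, k, time_us):
    # An (m x k) by (k x n) matrix multiply performs 2*m*n*k floating-point operations.
    return 2.0 * m * n * k / (time_us * 1e-6) / 1e12


# CPU stand-in workload; a real benchmark would launch a GPU kernel here.
median_us = measure_kernel_sketch(lambda: sum(range(1000)))
print(f"median: {median_us:.1f} us")

# A 4096^3 GEMM finishing in a hypothetical 1000 us corresponds to about 137.4 TFLOPS.
print(f"{gemm_tflops(4096, 4096, 4096, 1000.0):.1f} TFLOPS")
```

The median is used instead of the mean so that a single slow outlier run does not skew the reported latency.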

Test plan

  • Build package and verify import
  • Run python -m pygpukit.benchmark --quick
  • Run python -m pygpukit.benchmark --save test.json
  • Run python -m pygpukit.benchmark --compare test.json

Closes #163

🤖 Generated with Claude Code

m96-chan and others added 5 commits December 30, 2025 23:41
Add modular benchmark suite with:
- BenchmarkSuite class for unified benchmark orchestration
- GEMM benchmarks (fp32, tf32, bf16, fp16, fp8)
- GEMV benchmarks (bf16, fp8, nvf4, int4, w8a8)
- Attention benchmarks (SDPA, GQA)
- JSON export and baseline comparison
- Regression detection with configurable threshold
- CLI interface: python -m pygpukit.benchmark

Usage:
  from pygpukit.benchmark import BenchmarkSuite
  suite = BenchmarkSuite()
  suite.add_gemm().add_gemv()
  report = suite.run()
  report.save("baseline.json")

  # Compare with baseline
  comparison = suite.compare("baseline.json")
  if comparison.has_regression(threshold=0.05):
      raise RuntimeError("Regression detected!")

Closes #163

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Deleted:
- scripts/benchmark.py
- benchmarks/benchmark_gemv_*.py
- benchmarks/benchmark_nvf4_*.py
- benchmarks/benchmark_w8a16_gemm.py
- examples/benchmark_*.py

Use 'python -m pygpukit.benchmark' instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add .claude/skills/benchmark/README.md
- Update CLAUDE.md to use 'python -m pygpukit.benchmark'
- Update PR checklist benchmark command

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@m96-chan m96-chan merged commit 71e6665 into main Dec 30, 2025
13 checks passed

Development

Linked issue: feat(core): Benchmark Suite and Regression Testing