feat(benchmark): Add unified benchmark suite by m96-chan · Pull Request #166 · m96-chan/PyGPUkit

m96-chan · 2025-12-30T14:42:12Z

Summary

- Add modular benchmark suite under `src/pygpukit/benchmark/`
- Unified API for GEMM, GEMV, and attention benchmarks
- JSON export and baseline comparison with regression detection
- CLI interface: `python -m pygpukit.benchmark`

Features

BenchmarkSuite API

from pygpukit.benchmark import BenchmarkSuite

suite = BenchmarkSuite()
suite.add_gemm(sizes=[(4096, 4096, 4096)], dtypes=["bf16", "tf32"])
suite.add_gemv(dtypes=["bf16", "fp8", "nvf4"])
suite.add_attention(seq_lens=[512, 1024, 2048])

report = suite.run()
report.save("baseline.json")

# Compare with baseline
comparison = suite.compare("baseline.json", threshold=0.05)
if comparison.has_regression():
    raise RuntimeError("Regression detected!")
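The regression check can be illustrated without the library. The following is a minimal sketch, assuming each benchmark reduces to a single throughput number (higher is better) and that `threshold=0.05` means a 5% relative drop. `find_regressions` and the benchmark names are hypothetical and are not part of PyGPUkit.

```python
def find_regressions(baseline, current, threshold=0.05):
    """Return {name: relative_change} for benchmarks that dropped by more than `threshold`.

    Assumption: values are throughputs (higher is better), keyed by benchmark name.
    """
    regressions = {}
    for name, base_value in baseline.items():
        if name not in current or base_value <= 0:
            continue  # nothing meaningful to compare against
        change = (current[name] - base_value) / base_value
        if change < -threshold:
            regressions[name] = change
    return regressions


# Hypothetical numbers: a 3% drop is tolerated, a 10% drop is flagged.
baseline = {"gemm_bf16_4096": 100.0, "gemv_fp8": 50.0}
current = {"gemm_bf16_4096": 97.0, "gemv_fp8": 45.0}
regressions = find_regressions(baseline, current, threshold=0.05)
print(regressions)
```

A relative threshold keeps the check meaningful across benchmarks whose absolute throughput differs by orders of magnitude.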

CLI Usage

python -m pygpukit.benchmark --quick
python -m pygpukit.benchmark --gemm --sizes 4096,8192
python -m pygpukit.benchmark --save results.json
python -m pygpukit.benchmark --compare baseline.json --fail-on-regression
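For readers unfamiliar with how such flags are typically wired up, here is a minimal `argparse` sketch that mirrors only the flags shown above. It is not the contents of `cli.py`; `build_parser` and `parse_sizes` are hypothetical names, and how the real CLI maps `--sizes 4096,8192` onto (M, N, K) shapes is not specified in this PR.

```python
import argparse


def parse_sizes(text):
    """Turn a comma-separated string such as "4096,8192" into [4096, 8192]."""
    return [int(part) for part in text.split(",") if part]


def build_parser():
    # Hypothetical parser covering only the flags that appear in the PR description.
    parser = argparse.ArgumentParser(prog="python -m pygpukit.benchmark")
    parser.add_argument("--quick", action="store_true")
    parser.add_argument("--gemm", action="store_true")
    parser.add_argument("--sizes", type=parse_sizes, default=None)
    parser.add_argument("--save", metavar="PATH")
    parser.add_argument("--compare", metavar="BASELINE")
    parser.add_argument("--fail-on-regression", action="store_true")
    return parser


args = build_parser().parse_args(["--gemm", "--sizes", "4096,8192"])
print(args.gemm, args.sizes)
```

With a flag like `--fail-on-regression`, a CI job would usually exit with a non-zero status when the comparison reports a regression, so the pipeline step fails.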

Modules

| File | Description |
| --- | --- |
| `__init__.py` | BenchmarkSuite class, exports |
| `base.py` | Base Benchmark class, measure_kernel() |
| `results.py` | BenchmarkResult, BenchmarkReport, ComparisonResult |
| `gemm.py` | GEMMBenchmark, FP8GEMMBenchmark |
| `gemv.py` | GEMVBenchmark, W8A8GEMVBenchmark |
| `attention.py` | SDPABenchmark, GQABenchmark |
| `cli.py` | CLI argument parsing and entry point |
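The PR does not show how `measure_kernel()` works. As a rough illustration of what a timing helper of this kind usually does, the sketch below runs a few warmup iterations, takes the median of several timed runs, and converts a GEMM time into TFLOPS using the standard 2·M·N·K operation count. `measure` and `gemm_tflops` are hypothetical names, and the timed workload is a CPU placeholder.

```python
import statistics
import time


def measure(fn, warmup=3, iters=10):
    """Return the median wall-clock time of fn() in seconds, after warmup runs."""
    for _ in range(warmup):
        fn()  # warmup: lets caches and lazy initialization settle
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def gemm_tflops(m, n, k, seconds):
    """Throughput of an M x N x K GEMM, counting 2*M*N*K floating-point operations."""
    return 2 * m * n * k / seconds / 1e12


# Placeholder CPU workload standing in for a kernel launch.
elapsed = measure(lambda: sum(range(10_000)))
print(f"median: {elapsed * 1e6:.1f} us")
print(f"{gemm_tflops(4096, 4096, 4096, 1e-3):.1f} TFLOPS at 1 ms")
```

For real GPU kernels, wall-clock timing is only valid if the timed callable waits for the device to finish before returning, since kernel launches are asynchronous.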

Test plan

- Build package and verify import
- Run `python -m pygpukit.benchmark --quick`
- Run `python -m pygpukit.benchmark --save test.json`
- Run `python -m pygpukit.benchmark --compare test.json`
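The last two steps amount to a round trip: results saved to `test.json` and then compared against a fresh run should report no regression unless performance actually changed. The sketch below shows that idea with plain `json`; the file schema and the helpers `save_report`, `load_report`, and `has_regression` are assumptions for illustration, not the format written by `BenchmarkReport`.

```python
import json
import os
import tempfile


def save_report(results, path):
    """Write {benchmark name: throughput} to a JSON file (hypothetical schema)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"results": results}, fh, indent=2)


def load_report(path):
    """Read back the mapping written by save_report."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["results"]


def has_regression(baseline, current, threshold=0.05):
    """True if any shared benchmark dropped by more than `threshold` (relative)."""
    return any(
        current[name] < base * (1 - threshold)
        for name, base in baseline.items()
        if name in current
    )


results = {"gemm_bf16": 100.0, "sdpa_1024": 42.0}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.json")
    save_report(results, path)
    reloaded = load_report(path)

# Comparing a run against its own saved copy should never flag a regression.
self_check = has_regression(reloaded, results)
# A 10% drop on one benchmark should be flagged at the 5% threshold.
slow_check = has_regression(reloaded, {"gemm_bf16": 90.0, "sdpa_1024": 42.0})
print(self_check, slow_check)
```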

Closes #163

🤖 Generated with Claude Code

Add modular benchmark suite with:
- BenchmarkSuite class for unified benchmark orchestration
- GEMM benchmarks (fp32, tf32, bf16, fp16, fp8)
- GEMV benchmarks (bf16, fp8, nvf4, int4, w8a8)
- Attention benchmarks (SDPA, GQA)
- JSON export and baseline comparison
- Regression detection with configurable threshold
- CLI interface: python -m pygpukit.benchmark

Usage:
    from pygpukit.benchmark import BenchmarkSuite

    suite = BenchmarkSuite()
    suite.add_gemm().add_gemv()
    report = suite.run()
    report.save("baseline.json")

    # Compare with baseline
    comparison = suite.compare("baseline.json")
    if comparison.has_regression(threshold=0.05):
        raise RuntimeError("Regression detected!")

Closes #163

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Deleted:
- scripts/benchmark.py
- benchmarks/benchmark_gemv_*.py
- benchmarks/benchmark_nvf4_*.py
- benchmarks/benchmark_w8a16_gemm.py
- examples/benchmark_*.py

Use 'python -m pygpukit.benchmark' instead.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add .claude/skills/benchmark/README.md
- Update CLAUDE.md to use 'python -m pygpukit.benchmark'
- Update PR checklist benchmark command

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

m96-chan and others added 5 commits December 30, 2025 23:41

fix(benchmark): correct GEMV B matrix layout [N,K]

4e71a98

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix(benchmark): fix lint errors - unused imports and ambiguous var names

1cb4342

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

m96-chan merged commit 71e6665 into main Dec 30, 2025
13 checks passed

m96-chan merged 5 commits into main from v0.2.18-benchmark-suite
