Summary:
- remove reduce_sum variant plumbing (opinionated harness)
- add reduce_sum kernel placeholder (sum only)
- add reduce op kernel/ref/ops + tests + benchmarks
- update benchmark runner + suites
- update minimal docs

Tests:
- uv run pytest -q tests/test_reduce_sum.py tests/test_reduce.py

Refs #23
Refs #29
Refs #20
- Modal remote benchmark execution on B200 GPUs
- Strict GPU matching, cost warnings, and kernelbot-inspired patterns
- bench/runner.py for serialization compatibility
- Updated docs with Modal usage and cost warnings
## Summary
- Adds `bench/modal_bench.py` for running benchmarks remotely on Modal GPUs
- Uses strict GPU matching (`!` suffix) to prevent auto-upgrades (e.g., H100 → H200)
- Adds runtime and documentation warnings about Modal GPU costs
- Supports 12 GPU types: `any`, `b200`, `h200`, `h100`, `a100`, `a100-40gb`, `a100-80gb`, `l40s`, `a10`, `a10g`, `l4`, `t4`

## Warning
> **This script incurs Modal GPU costs.** Review `bench/modal_bench.py` and verify timeout/GPU settings before running. Start with `--suite smoke` to validate your setup. You are responsible for any credits consumed.

## Changes
- `bench/modal_bench.py`: Modal benchmark runner with strict GPU matching and cost warnings
- `DEVELOPMENT.md`: Document supported GPUs, strict matching, and cost warning
- `CHANGELOG.md`: Add entry for strict GPU matching
- `README.md`: Modal setup instructions (in previous commits)

## Test plan
- [ ] Run `modal run bench/modal_bench.py --suite smoke --gpu t4` to verify warning appears
- [ ] Verify strict GPU matching prevents upgrades
- [ ] Confirm benchmark results are saved correctly with `--out`

Closes #28

🤖 Generated with [Claude Code](https://claude.com/claude-code)
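For reviewers unfamiliar with the `!` suffix, the strict-matching idea can be sketched roughly as below. This is only an illustration, not the code in `bench/modal_bench.py`: the helper name `to_strict_gpu_spec`, the uppercase normalization, and the choice to pass `any` through without a suffix are all assumptions made for the example.

```python
# Hypothetical sketch: map a --gpu value to a strict GPU spec string.
# The list mirrors the 12 GPU types named in this PR description.
SUPPORTED_GPUS = (
    "any", "b200", "h200", "h100", "a100", "a100-40gb",
    "a100-80gb", "l40s", "a10", "a10g", "l4", "t4",
)


def to_strict_gpu_spec(name: str) -> str:
    """Return a GPU spec with a trailing '!' so the request is not upgraded.

    Assumption: 'any' is passed through unchanged, since pinning an
    exact GPU model has no meaning when any model is acceptable.
    """
    key = name.strip().lower()
    if key not in SUPPORTED_GPUS:
        raise ValueError(
            f"unsupported GPU {name!r}; expected one of {', '.join(SUPPORTED_GPUS)}"
        )
    if key == "any":
        return "any"
    # Assumption: the remote side accepts the uppercase model name.
    return key.upper() + "!"
```

Under these assumptions, `to_strict_gpu_spec("h100")` yields `H100!`, which is the form that asks for exactly an H100 rather than allowing an H200 substitution.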
- Add _print_table() for human-readable benchmark output
- Add reduce op to smoke suite
- Add reduce_short and reduce_long shape-sweep suites
- Fix _estimate_bytes to handle reduce op

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit 9eb0b84)
## Summary
- add `FORGE_SOFTMAX_IMPL` mode selection to `softmax_online` (`auto`, `ref`, `kernel`)
- in `kernel` mode, fail fast with clear errors when kernel module/entrypoints are missing
- keep `auto` mode contributor-friendly by falling back to the reference implementation
- add tests for impl-mode behavior and invalid mode handling
- add `--impl` to `bench/benchmark_online_softmax.py` and remove the hard N divisibility assertion
- make `bench/run.py` skip softmax cases cleanly when strict kernel mode is unavailable
- document softmax impl mode usage in `DEVELOPMENT.md`

## Validation
- `uv run ruff check forge_cute_py/ops/softmax_online.py tests/test_softmax_online.py bench/benchmark_online_softmax.py bench/run.py`
- `uv run ruff format forge_cute_py/ops/softmax_online.py tests/test_softmax_online.py bench/benchmark_online_softmax.py bench/run.py`
- `uv run pytest tests/test_softmax_online.py -q`
- `uv run pytest -q`
- `uv run python bench/benchmark_online_softmax.py --m-sizes 64 --n-sizes 256 --dtypes float16 --warmup 2 --iterations 5 --impl auto`
- `uv run python bench/benchmark_online_softmax.py --m-sizes 64 --n-sizes 256 --dtypes float16 --warmup 2 --iterations 5 --impl kernel`
- `FORGE_SOFTMAX_IMPL=kernel uv run python bench/run.py --suite smoke --op softmax_online`

## Notes
- includes cherry-picked benchmark script commit authored by `jonah <jsamost@gmail.com>` (`a9d4983`)
- follow-up docs-only patch will be opened separately
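The mode semantics described in the summary (strict `kernel`, forgiving `auto`, explicit `ref`) could be sketched along these lines. This is a hedged illustration rather than the code in `forge_cute_py/ops/softmax_online.py`: the function names `resolve_impl_mode` and `select_impl`, the `load_kernel` callback, the exception types, and the default of `auto` when the variable is unset are all assumptions.

```python
import os

VALID_MODES = ("auto", "ref", "kernel")


def resolve_impl_mode(env=None):
    """Read FORGE_SOFTMAX_IMPL and validate it (assumed default: 'auto')."""
    env = os.environ if env is None else env
    mode = env.get("FORGE_SOFTMAX_IMPL", "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(
            f"invalid FORGE_SOFTMAX_IMPL={mode!r}; expected one of {VALID_MODES}"
        )
    return mode


def select_impl(mode, ref_fn, load_kernel):
    """Pick the callable to run for the given mode.

    load_kernel is a hypothetical zero-argument callable that returns the
    kernel entrypoint, or raises ImportError/AttributeError if it is missing.
    """
    if mode == "ref":
        return ref_fn
    try:
        return load_kernel()
    except (ImportError, AttributeError) as exc:
        if mode == "kernel":
            # Strict mode: surface the missing kernel instead of hiding it.
            raise RuntimeError(
                "FORGE_SOFTMAX_IMPL=kernel but the kernel is unavailable"
            ) from exc
        # auto mode: fall back to the reference implementation.
        return ref_fn
```

The point of splitting resolution from selection in this sketch is that invalid modes are rejected before any import is attempted, while a missing kernel only becomes an error when the caller explicitly asked for it.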
## Summary
- fix stale docs references to removed `CLAUDE.md`
- correct public API wording in `DEVELOPMENT.md` (`forge_cute_py.ops.<op>()`)
- update README benchmark quick reference with softmax benchmark commands and impl-mode notes
- clarify current release state in `CONTRIBUTING.md` (`v0.1.0-rc1` exists; final `v0.1.0` pending)
- add changelog bullets for softmax impl-mode selection and benchmark CLI updates

## Scope
- docs-only patch (`README.md`, `DEVELOPMENT.md`, `ROADMAP.md`, `CHANGELOG.md`, `CONTRIBUTING.md`)
- no runtime code changes

## Notes
- follow-up to #39
Sync main with the current init-harness baseline.