feat(api): add matmul(use_tf32=True) and DeviceCapabilities #46
Merged
## Python API
- Add `use_tf32` keyword argument to `matmul()` function
  - `None` (default): Use PYGPUKIT_ALLOW_TF32 env var
  - `True`: Force TF32 TensorCore path
  - `False`: Force FP32 path
- Add `get_device_capabilities()` function
- Export `DeviceCapabilities` and `KernelType` from Rust

## Rust (pygpukit-core)
- Add `device` module with:
  - `KernelType` enum (FP32_FMA, TF32_MMA, FP16_MMA, etc.)
  - `DeviceCapabilities` struct with tensorcore detection
- Add `best_matmul_kernel()` for kernel selection

## C++ Backend
- Add `matmul_tf32()` function with explicit TF32 control
- Proper error messages for unsupported configs

## Tests (TDD)
- Add `tests/test_tf32_api.py` with comprehensive tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Check TF32 dtype requirement before backend dispatch to ensure RuntimeError is raised even in CPU fallback mode (CI without GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
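The ordering described in this commit (validate, then dispatch) can be illustrated with a minimal sketch. This is not the PR's code: `FakeArray`, the `gpu_available` flag, the error wording, and the assumption that the required dtype is `float32` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeArray:
    """Hypothetical stand-in for a GPU array; only the dtype matters here."""
    dtype: str


def matmul(a, b, use_tf32=None, gpu_available=False):
    """Sketch of validate-then-dispatch; returns the name of the chosen backend."""
    # Validate first, so the error does not depend on which backend runs.
    # Assumption: forcing TF32 requires float32 inputs.
    if use_tf32 is True and (a.dtype != "float32" or b.dtype != "float32"):
        raise RuntimeError(
            f"use_tf32=True requires float32 inputs, got {a.dtype} and {b.dtype}"
        )
    # Dispatch happens only after validation has passed.
    return "gpu" if gpu_available else "cpu_fallback"
```

Because the check sits above the dispatch, a GPU-less CI run sees the same `RuntimeError` as a GPU machine would.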
## Summary
- Add `use_tf32` keyword argument to `matmul()` function for explicit TF32 control
- Export `DeviceCapabilities` and `KernelType` types from Rust backend
- Add `get_device_capabilities()` function for querying GPU capabilities

## Python API Changes
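The three-way behaviour of `use_tf32` (`None`, `True`, `False`) could be resolved roughly as in the sketch below. The helper name `resolve_use_tf32` is hypothetical, and treating the value `"1"` as the enabling setting of `PYGPUKIT_ALLOW_TF32` is an assumption; the PR does not state how the variable is parsed.

```python
import os


def resolve_use_tf32(use_tf32=None, environ=None):
    """Return True if the TF32 path should be used (hypothetical helper)."""
    env = os.environ if environ is None else environ
    if use_tf32 is None:
        # Default: defer to the environment variable.
        # Assumption: the value "1" enables TF32, anything else disables it.
        return env.get("PYGPUKIT_ALLOW_TF32", "0") == "1"
    # An explicit True or False overrides the environment.
    return bool(use_tf32)


# Usage: an explicit argument wins over the environment setting.
print(resolve_use_tf32(False, {"PYGPUKIT_ALLOW_TF32": "1"}))
```

Accepting `environ` as a parameter keeps the helper testable without mutating the real process environment.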
## New Rust Types
- `KernelType`: Enum with FP32_FMA, TF32_MMA, FP16_MMA, BF16_MMA, L2_NAIVE, TILED_SMEM
- `DeviceCapabilities`: Device info including tensorcore, sm_version, async_copy

## Test plan
- `tests/test_tf32_api.py`

🤖 Generated with Claude Code
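As a rough picture of how the `KernelType` and `DeviceCapabilities` types listed under New Rust Types might feed kernel selection, here is a Python mirror of the idea. The field types, the `best_matmul_kernel(caps, use_tf32)` signature, the integer encoding of `sm_version` (80 for SM 8.0), and the silent fallback to FP32 are assumptions for illustration; the actual Rust implementation may differ, and the PR mentions error messages for unsupported configurations rather than a fallback.

```python
from dataclasses import dataclass
from enum import Enum, auto


class KernelType(Enum):
    """Kernel kinds named in the PR description."""
    FP32_FMA = auto()
    TF32_MMA = auto()
    FP16_MMA = auto()
    BF16_MMA = auto()
    L2_NAIVE = auto()
    TILED_SMEM = auto()


@dataclass(frozen=True)
class DeviceCapabilities:
    """Assumed layout: the three fields named in the PR description."""
    tensorcore: bool
    sm_version: int  # assumption: 80 means SM 8.0
    async_copy: bool


def best_matmul_kernel(caps, use_tf32):
    """Pick a float32 matmul kernel (hypothetical signature and policy)."""
    # TF32 tensor cores were introduced with the Ampere generation (SM 8.0).
    if use_tf32 and caps.tensorcore and caps.sm_version >= 80:
        return KernelType.TF32_MMA
    # Otherwise use the plain FP32 fused multiply-add path.
    return KernelType.FP32_FMA


ampere = DeviceCapabilities(tensorcore=True, sm_version=80, async_copy=True)
print(best_matmul_kernel(ampere, use_tf32=True))
```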