
feat(api): add matmul(use_tf32=True) and DeviceCapabilities #46

Merged
m96-chan merged 2 commits into main from feature/v0.2.3-tf32-api
Dec 14, 2025

@m96-chan
Owner

Summary

  • Add use_tf32 keyword argument to matmul() function for explicit TF32 control
  • Add DeviceCapabilities and KernelType types from Rust backend
  • Add get_device_capabilities() function for querying GPU capabilities

Python API Changes

# Explicit TF32 control
C = gp.matmul(A, B, use_tf32=True)   # Force TF32 TensorCore
C = gp.matmul(A, B, use_tf32=False)  # Force FP32
C = gp.matmul(A, B)                   # Use PYGPUKIT_ALLOW_TF32 env var

# Query device capabilities
caps = gp.get_device_capabilities()
print(caps.tensorcore)      # True on SM >= 80
print(caps.sm_version)      # e.g., 86 for RTX 3090
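
The three-way behavior of `use_tf32` (explicit `True`, explicit `False`, or deferring to the environment) can be sketched as a small resolver. This is only an illustration, not PyGPUkit's code: the helper name `resolve_use_tf32` is hypothetical, and treating `"1"` as the enabling value of `PYGPUKIT_ALLOW_TF32` is an assumption, since the PR does not say which values the variable accepts.

```python
import os
from typing import Mapping, Optional


def resolve_use_tf32(
    use_tf32: Optional[bool] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Hypothetical helper: decide whether matmul should take the TF32 path."""
    if use_tf32 is not None:
        # An explicit True/False always wins over the environment.
        return use_tf32
    if env is None:
        env = os.environ
    # Assumption: "1" enables TF32; any other value (or unset) means FP32.
    return env.get("PYGPUKIT_ALLOW_TF32", "0") == "1"


if __name__ == "__main__":
    print(resolve_use_tf32(True))   # explicit request
    print(resolve_use_tf32(False))  # explicit opt-out
    print(resolve_use_tf32())       # falls back to the environment
```

Passing the environment as a parameter keeps the sketch easy to test without mutating `os.environ`.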

New Rust Types

  • KernelType: Enum with FP32_FMA, TF32_MMA, FP16_MMA, BF16_MMA, L2_NAIVE, TILED_SMEM
  • DeviceCapabilities: Device info including tensorcore, sm_version, async_copy
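
To make the relationship between the two types concrete, here is a rough Python mirror of them plus a toy selection rule. The variant and field names come from this PR; everything else is assumed: the field types, the function name `pick_matmul_kernel`, and the rule itself (silently falling back to FP32 when TensorCores are unavailable). The actual Rust selection logic may differ, for example by raising an error instead of falling back.

```python
from dataclasses import dataclass
from enum import Enum, auto


class KernelType(Enum):
    # Variant names are taken from the PR description.
    FP32_FMA = auto()
    TF32_MMA = auto()
    FP16_MMA = auto()
    BF16_MMA = auto()
    L2_NAIVE = auto()
    TILED_SMEM = auto()


@dataclass(frozen=True)
class DeviceCapabilities:
    # Field names are from the PR; the types are assumptions.
    tensorcore: bool
    sm_version: int
    async_copy: bool


def pick_matmul_kernel(caps: DeviceCapabilities, use_tf32: bool) -> KernelType:
    """Hypothetical rule: use TF32 MMA only when requested and supported."""
    if use_tf32 and caps.tensorcore:
        return KernelType.TF32_MMA
    # Assumed fallback; a real implementation might report an error instead.
    return KernelType.FP32_FMA


if __name__ == "__main__":
    ampere = DeviceCapabilities(tensorcore=True, sm_version=86, async_copy=True)
    print(pick_matmul_kernel(ampere, use_tf32=True).name)
```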

Test plan

  • TDD: Tests written first in tests/test_tf32_api.py
  • Rust tests pass (114 tests)
  • Python lint passes (ruff)
  • CI tests

🤖 Generated with Claude Code

m96-chan and others added 2 commits December 14, 2025 14:14
## Python API
- Add `use_tf32` keyword argument to `matmul()` function
  - `None` (default): Use PYGPUKIT_ALLOW_TF32 env var
  - `True`: Force TF32 TensorCore path
  - `False`: Force FP32 path
- Add `get_device_capabilities()` function
- Export `DeviceCapabilities` and `KernelType` from Rust

## Rust (pygpukit-core)
- Add `device` module with:
  - `KernelType` enum (FP32_FMA, TF32_MMA, FP16_MMA, etc.)
  - `DeviceCapabilities` struct with tensorcore detection
- Add `best_matmul_kernel()` for kernel selection

## C++ Backend
- Add `matmul_tf32()` function with explicit TF32 control
- Proper error messages for unsupported configs

## Tests (TDD)
- Add `tests/test_tf32_api.py` with comprehensive tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Check TF32 dtype requirement before backend dispatch to ensure
RuntimeError is raised even in CPU fallback mode (CI without GPU).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
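
The ordering described in the second commit (validate first, dispatch second) can be sketched as follows. All names here (`FakeArray`, `matmul_sketch`, the `gpu_available` flag) are hypothetical, and the assumption that the TF32 requirement means float32 inputs is mine; the commit message only says a dtype requirement is checked and that `RuntimeError` is raised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeArray:
    """Hypothetical stand-in for a GPU array; only the dtype matters here."""
    dtype: str


def matmul_sketch(
    a: FakeArray,
    b: FakeArray,
    use_tf32: Optional[bool] = None,
    gpu_available: bool = False,
) -> str:
    # Validate before choosing a backend, so the error is raised
    # regardless of whether a GPU is present.
    # Assumption: the TF32 path requires float32 inputs.
    if use_tf32 is True and (a.dtype != "float32" or b.dtype != "float32"):
        raise RuntimeError(
            f"use_tf32=True requires float32 inputs, got {a.dtype} and {b.dtype}"
        )
    # Backend dispatch happens only after validation has passed.
    return "gpu" if gpu_available else "cpu-fallback"


if __name__ == "__main__":
    try:
        matmul_sketch(FakeArray("float64"), FakeArray("float64"), use_tf32=True)
    except RuntimeError as exc:
        print(f"rejected before dispatch: {exc}")
```

Because the check runs ahead of dispatch, a test that expects `RuntimeError` behaves the same on a GPU-less CI runner as on a machine with a GPU.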
@m96-chan m96-chan merged commit cd85e30 into main Dec 14, 2025
13 checks passed
@m96-chan m96-chan deleted the feature/v0.2.3-tf32-api branch December 26, 2025 09:38