PyGPUkit v0.2.3 — Reliability Phase + TF32 TensorCore #42

@m96-chan

PyGPUkit v0.2.3 focuses on stability, reproducibility, and large-scale correctness. It also introduces the first version of Tensor Core acceleration, aimed at the 22 TFLOPS performance target on Ampere GPUs (RTX 3090 Ti / A100).

✔️ Core Reliability Features

1. Kernel Cache (LRU) – Completion

  • Persistent kernel selection cache
  • Architecture-aware kernel fingerprinting (SM, register file, shared mem)
  • LRU eviction policy with max-size limit
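
As a rough illustration of the cache described above, here is a minimal in-memory sketch. The names `ArchFingerprint` and `KernelCache` and their fields are assumptions made for this example, not PyGPUkit's actual API, and persistence to disk is left out.

```python
from collections import OrderedDict
from typing import Hashable, NamedTuple, Optional


class ArchFingerprint(NamedTuple):
    """Hypothetical fingerprint; the fields mirror the bullet list above."""
    sm: int              # compute capability as major*10 + minor, e.g. 86
    regs_per_sm: int     # register file size (32-bit registers per SM)
    smem_per_sm_kb: int  # shared memory per SM, in KB


class KernelCache:
    """LRU cache mapping (fingerprint, problem key) to a chosen kernel name."""

    def __init__(self, max_size: int = 128) -> None:
        if max_size < 1:
            raise ValueError("max_size must be >= 1")
        self._max_size = max_size
        self._entries: "OrderedDict[tuple, str]" = OrderedDict()

    def get(self, fp: ArchFingerprint, problem: Hashable) -> Optional[str]:
        key = (fp, problem)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, fp: ArchFingerprint, problem: Hashable, kernel: str) -> None:
        key = (fp, problem)
        self._entries[key] = kernel
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._entries)
```

Keying on the fingerprint rather than the device name means two GPUs with the same SM version and resource limits can share entries, while a cache file carried to a different architecture simply misses.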

2. Driver-only Mode Stabilization

  • NVRTC error handling & retry logic
  • JIT warm-up cache
  • Fallback path for mismatched PTX ISA
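
The retry and fallback items above could be structured roughly as follows. This is a sketch under stated assumptions: `compile_fn` stands in for whatever wraps NVRTC, `CompileError` is a hypothetical error type, and each entry of `option_sets` is an opaque tuple of compiler options ordered from preferred to most conservative.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

Options = Tuple[str, ...]


class CompileError(RuntimeError):
    """Hypothetical error type standing in for an NVRTC failure."""


def compile_with_retry(
    compile_fn: Callable[[str, Options], bytes],
    source: str,
    option_sets: Sequence[Options],
    attempts_per_set: int = 2,
    backoff_s: float = 0.0,
) -> bytes:
    """Try each option set in order, retrying each one a few times."""
    last_error: Optional[CompileError] = None
    for options in option_sets:
        for attempt in range(attempts_per_set):
            try:
                return compile_fn(source, options)
            except CompileError as exc:
                last_error = exc
                if backoff_s:
                    time.sleep(backoff_s * (attempt + 1))  # linear backoff
    raise CompileError(f"all compile attempts failed: {last_error}")
```

Placing the fallback (for example, a lower target architecture) later in `option_sets` keeps the preferred path first and makes the order of attempts explicit and testable.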

3. Cross-Platform Support (Windows + Linux)

  • Uniform CMake configuration
  • CUDA Driver API path resolver
  • os.add_dll_directory on Windows
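
A path resolver along these lines might look like the sketch below. It assumes the toolkit root is exposed through `CUDA_PATH` (set by the Windows CUDA installer) or `CUDA_HOME` (a common convention on Linux); the function names are made up for this example. `os.add_dll_directory` exists only on Windows (Python 3.8+), so the call is guarded.

```python
import os
import sys
from pathlib import Path
from typing import List, Mapping, Optional


def candidate_cuda_dirs(env: Mapping[str, str], platform: str) -> List[Path]:
    """Return directories that may hold CUDA libraries for the given platform."""
    root = env.get("CUDA_PATH") or env.get("CUDA_HOME")
    if not root:
        return []
    base = Path(root)
    if platform == "win32":
        return [base / "bin"]            # DLLs such as NVRTC live under bin
    return [base / "lib64", base / "lib"]


def register_cuda_dll_dirs(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """On Windows, add existing CUDA directories to the DLL search path."""
    env = os.environ if env is None else env
    registered: List[str] = []
    if sys.platform != "win32":
        return registered                # nothing to do on Linux
    for directory in candidate_cuda_dirs(env, "win32"):
        if directory.is_dir():           # add_dll_directory rejects missing paths
            os.add_dll_directory(str(directory))
            registered.append(str(directory))
    return registered
```

Calling `register_cuda_dll_dirs()` once at import time, before any native extension is loaded, is the usual place for this kind of hook.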

4. Large GPU Memory Stress Test

  • Continuous 16 GB alloc/free loop
  • Fragmentation measurement API
  • Memory pool corruption detection
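
To make the three items above concrete, here is a host-side toy model, not the real GPU allocator. `ToyPool` and `stress` are hypothetical names, and the fragmentation metric used here (1 minus largest free block over total free bytes) is one common definition chosen for illustration; the issue does not say which metric the API will report.

```python
import random
from typing import Dict, List, Tuple


class ToyPool:
    """First-fit allocator over [0, capacity); a stand-in for a GPU memory pool."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.free: List[Tuple[int, int]] = [(0, capacity)]  # sorted (offset, size)
        self.used: Dict[int, int] = {}                      # offset -> size

    def alloc(self, size: int) -> int:
        for i, (off, blk) in enumerate(self.free):
            if blk >= size:
                if blk == size:
                    del self.free[i]
                else:
                    self.free[i] = (off + size, blk - size)
                self.used[off] = size
                return off
        raise MemoryError(f"no free block of {size} bytes")

    def release(self, off: int) -> None:
        self.free.append((off, self.used.pop(off)))
        self.free.sort()
        merged = [self.free[0]]
        for o, s in self.free[1:]:          # coalesce adjacent free ranges
            po, ps = merged[-1]
            if po + ps == o:
                merged[-1] = (po, ps + s)
            else:
                merged.append((o, s))
        self.free = merged

    def fragmentation(self) -> float:
        """0.0 means all free memory is one contiguous block."""
        total = sum(s for _, s in self.free)
        if total == 0:
            return 0.0
        return 1.0 - max(s for _, s in self.free) / total

    def check_integrity(self) -> None:
        """Raise if free and used ranges do not tile [0, capacity) exactly."""
        cursor = 0
        for off, size in sorted(self.free + list(self.used.items())):
            if off != cursor:
                raise RuntimeError(f"gap or overlap at offset {off}")
            cursor = off + size
        if cursor != self.capacity:
            raise RuntimeError("accounted bytes do not match capacity")


def stress(pool: ToyPool, iterations: int, seed: int = 0) -> float:
    """Random alloc/free loop; returns the worst fragmentation observed."""
    rng = random.Random(seed)
    live: List[int] = []
    worst = 0.0
    for _ in range(iterations):
        if live and rng.random() < 0.5:
            pool.release(live.pop(rng.randrange(len(live))))
        else:
            try:
                live.append(pool.alloc(rng.choice((1, 2, 4, 8)) * 256))
            except MemoryError:
                pass                         # pool exhausted; keep going
        pool.check_integrity()
        worst = max(worst, pool.fragmentation())
    for off in live:
        pool.release(off)
    return worst
```

After a full run every block has been released, so the pool should collapse back to a single free range; a stress test can assert exactly that.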

⚡ New Additions for v0.2.3 (Tensor Core Roadmap)

5. TF32 TensorCore GEMM (Ampere+) — Phase 1

Goal: 22–30 TFLOPS on RTX 3090 Ti

Deliverables:

  • Tensor Core WMMA API (TF32 input → FP32 accumulate)
  • Kernel dispatcher:
    • FP32 FMA fallback
    • TF32 Tensor Core (if SM ≥ 80)
  • Unit tests (1e-3 tolerance)
  • Performance test (4096×4096, 8192×8192)
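
The dispatch rule and the test tolerance above can be sketched on the host side. `select_gemm_kernel` and `round_to_tf32` are hypothetical helpers written for this example. The rounding function emulates TF32 input precision (10 explicit mantissa bits instead of FP32's 23) with round-to-nearest, ties away from zero; it ignores NaN/Inf handling, and the exact hardware rounding behaviour should be checked against NVIDIA's documentation.

```python
import struct


def select_gemm_kernel(sm: int, allow_tf32: bool = True) -> str:
    """Pick a kernel family from the compute capability (major*10 + minor)."""
    if allow_tf32 and sm >= 80:
        return "tf32_tensor_core"
    return "fp32_fma"


def round_to_tf32(x: float) -> float:
    """Round an FP32 value to TF32 precision by dropping 13 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x1000) & 0xFFFFFFFF  # add half of the dropped range
    bits &= 0xFFFFE000                   # clear the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

Each rounded input carries a relative error of at most 2^-11 (about 4.9e-4), which is in line with a 1e-3 unit-test tolerance; how the error accumulates over a long dot product depends on the data, so the tolerance may need revisiting for large K.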

6. Architecture Scaling (3090 Ti → A100 → H100)

  • TF32 kernel parameter adaptation
  • Shared memory / register tuning per SM count
  • High-end GPU detection logic
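
One way to picture per-architecture tuning is to derive the software pipeline depth from the shared memory a thread block may use. The sketch below is illustrative only: the tile shape (128×128×32), the stage cap, and the function names are assumptions, and the per-block shared memory limits are hard-coded from NVIDIA's published figures, whereas a real implementation would query the device.

```python
# Maximum shared memory per thread block in KB, keyed by compute capability
# (8.0 = A100, 8.6 = RTX 3090 Ti, 9.0 = H100). Hard-coded for the sketch only.
SMEM_PER_BLOCK_KB = {80: 163, 86: 99, 90: 227}


def stage_bytes(tile_m: int, tile_n: int, tile_k: int, elem_bytes: int = 4) -> int:
    """Shared memory for one pipeline stage: an A tile plus a B tile."""
    return (tile_m * tile_k + tile_k * tile_n) * elem_bytes


def max_pipeline_stages(
    sm: int, tile_m: int = 128, tile_n: int = 128, tile_k: int = 32, cap: int = 4
) -> int:
    """Number of stages that fit in shared memory, limited by `cap`."""
    budget = SMEM_PER_BLOCK_KB[sm] * 1024
    stages = budget // stage_bytes(tile_m, tile_n, tile_k)
    if stages < 1:
        raise ValueError("tile does not fit in shared memory")
    return min(cap, stages)
```

With these assumed tiles, one stage costs 32 KB, so the 3090 Ti fits three stages while the A100 and H100 hit the cap of four; raising the cap lets the larger parts use deeper pipelines.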

🎯 Performance Target

| GPU | Target (TFLOPS) | Notes |
|---|---|---|
| RTX 3090 Ti | 22–30 | cuBLAS-equivalent |
| A100 | 40–60 | TF32 native |
| H100 | 80+ | BF16 path later |
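
For reference, the targets above translate into kernel time budgets using the standard GEMM operation count of 2·M·N·K floating-point operations. The helper names below are made up for this example, and timing itself (for instance with CUDA events) is assumed to happen elsewhere.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Floating-point operations in C = A @ B (one multiply and one add per term)."""
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOPS for a GEMM that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e12


def seconds_for_target(m: int, n: int, k: int, target_tflops: float) -> float:
    """Longest kernel time that still meets `target_tflops`."""
    return gemm_flops(m, n, k) / (target_tflops * 1e12)
```

By this count, a 4096×4096 GEMM has to finish in roughly 6.25 ms to reach 22 TFLOPS, and an 8192×8192 GEMM in roughly 50 ms.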

🧩 Final v0.2.3 Structure

v0.2.3
├─ Reliability Core
│  ├─ Kernel Cache LRU
│  ├─ Driver-only Stabilization
│  ├─ Cross-platform Support
│  └─ Large Memory Fragmentation Test
└─ Tensor Core Line
   ├─ TF32 TensorCore GEMM (22–30 TFLOPS)
   └─ Architecture Scaling (3090Ti → A100 → H100)
