73 changes: 57 additions & 16 deletions CLAUDE.md
@@ -35,34 +35,75 @@ The core scheduling, memory management, GPU coordination, and performance-critic
```
PyGPUkit/
├── src/pygpukit/ # Python API (NumPy-compatible)
│ ├── core/ # GPUArray, backend abstraction
│ ├── ops/ # GPU operations (matmul, nn, audio, etc.)
│ ├── llm/ # LLM inference (Qwen, LLaMA)
│ ├── core/ # Core abstractions
│ │ ├── array.py # GPUArray implementation
│ │ ├── backend.py # Backend detection/initialization
│ │ ├── memory.py # Memory utilities (copy, sync)
│ │ └── stream.py # CUDA Stream wrapper
│ ├── ops/ # GPU operations (modular packages)
│ │ ├── matmul/ # Matrix multiplication
│ │ │ ├── gemm/ # GEMM operations (M > 1)
│ │ │ └── gemv/ # GEMV operations (M = 1)
│ │ ├── nn/ # Neural network ops
│ │ │ ├── activation.py # GELU, SiLU, etc.
│ │ │ ├── attention.py # SDPA, paged attention
│ │ │ ├── norm.py # RMSNorm, LayerNorm
│ │ │ └── rope.py # Rotary position embedding
│ │ └── audio/ # Audio processing
│ │ ├── transforms/ # FFT, Mel spectrogram
│ │ └── analysis/ # Pitch, onset detection
│ ├── llm/ # LLM inference (modular)
│ │ ├── models/ # Model implementations
│ │ └── sampling/ # Token sampling strategies
│ └── asr/ # Speech recognition (Whisper)
│ ├── preprocessing.py # Audio preprocessing (mel, normalize)
│ └── whisper/ # Whisper model implementation
│ ├── config.py # WhisperConfig
│ ├── loader.py # SafeTensors loader
│ ├── encoder.py # Whisper encoder
│ ├── decoder.py # Whisper decoder
│ └── model.py # WhisperModel high-level API
│ │ │ └── causal_transformer.py
│ │ ├── layers/ # Layer types
│ │ │ ├── attention.py # Multi-head attention
│ │ │ ├── ffn.py # Feed-forward networks
│ │ │ ├── norm.py # Normalization layers
│ │ │ ├── embedding.py # Token/position embeddings
│ │ │ └── recurrent.py # LSTM, Mamba
│ │ ├── decode/ # Decoding strategies
│ │ ├── loader/ # Model loading
│ │ │ ├── safetensors.py # SafeTensors loader
│ │ │ └── tokenizer.py # Tokenizer wrapper
│ │ └── quantization/ # Quantization utilities
│ │ ├── config.py # Quant configs
│ │ └── repack.py # Weight repacking
│ ├── asr/ # Speech recognition (Whisper)
│ │ └── whisper/ # Whisper model implementation
│ └── tts/ # Text-to-speech (Kokoro)
│ └── kokoro/ # Kokoro TTS model
├── native/
│ ├── core/ # C++ (CUDA Runtime/Driver API)
│ ├── jit/ # C++ (NVRTC)
│ ├── ops/ # C++ (CUDA kernels)
│ │ └── matmul/ # MatMul kernels (see below)
│ └── bindings/ # pybind11
│ │ ├── matmul/ # MatMul kernels (see below)
│ │ │ ├── matmul.cu # Main dispatcher
│ │ │ ├── fused.cu # Fused ops (linear+bias+GELU)
│ │ │ └── batched.cu # Batched GEMM
│ │ ├── nn/ # Neural network ops
│ │ │ ├── activation/ # Activation functions
│ │ │ ├── attention/ # Attention kernels
│ │ │ ├── norm/ # Normalization kernels
│ │ │ ├── rope/ # RoPE kernels
│ │ │ └── recurrent/ # LSTM/Mamba kernels
│ │ └── audio/ # Audio processing kernels
│ └── bindings/ # pybind11 (modular)
│ ├── gemm/ # GEMM bindings by dtype
│ ├── gemv/ # GEMV bindings by dtype
│ └── nn/ # NN operation bindings
├── rust/
│ ├── pygpukit-core/ # Pure Rust GPU runtime
│ │ └── src/
│ │ ├── memory/ # MemoryPool, LRU, size-class allocator
│ │ ├── scheduler/ # Task state machine, QoS policies
│ │ └── device.rs # DeviceCapabilities, KernelType
│ └── pygpukit-python/ # PyO3 bindings
├── examples/
├── benchmarks/ # Performance benchmarks
├── examples/ # Example scripts (organized)
│ ├── benchmarks/ # Performance benchmarks
│ ├── chat/ # Chat CLI applications
│ ├── demos/ # Feature demos
│ │ └── archived/ # Version-specific demos (historical)
│ └── demo_*.py # Current feature demos
└── tests/
```
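
The tree above splits matrix multiplication into `gemm/` (M > 1) and `gemv/` (M = 1). As a minimal sketch of that routing rule, assuming the choice depends only on the row count M of the left operand, the helper below picks a sub-package name. `select_matmul_path` is a hypothetical name used for illustration; it is not part of the PyGPUkit API, and the real dispatcher may consider dtype, shape, or architecture as well.

```python
def select_matmul_path(m: int) -> str:
    """Return the ops/matmul sub-package an (M, K) x (K, N) product would use.

    Hypothetical helper: mirrors only the "M = 1 -> gemv, M > 1 -> gemm"
    comments in the directory tree, nothing else.
    """
    if m < 1:
        raise ValueError("M must be a positive integer")
    # A single-row left operand is a matrix-vector product.
    return "gemv" if m == 1 else "gemm"


if __name__ == "__main__":
    for rows in (1, 2, 512):
        print(rows, "->", select_matmul_path(rows))
```

Under this assumption, a single-row input (for example, one token per decode step) would land in `gemv/`, while multi-row inputs would land in `gemm/`.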

82 changes: 63 additions & 19 deletions examples/README.md
@@ -1,43 +1,87 @@
# PyGPUkit Examples

## Directory Structure

```
examples/
├── benchmarks/ # Performance benchmarks
├── chat/ # Chat CLI applications
├── demos/archived/ # Version-specific demos (historical)
├── demo_*.py # Current feature demos
├── tts.py # Text-to-speech example
└── whisper_realtime_stt.py # Speech-to-text example
```

## Requirements

- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.x
- NVIDIA GPU with SM >= 80 (Ampere or newer)
- CUDA Toolkit 12.x or 13.x
- Built native module (`_pygpukit_native`)
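
The SM >= 80 requirement can be checked before building. The sketch below is not part of the repository: it assumes `nvidia-smi` is on the `PATH` and that the installed driver supports the `compute_cap` query field, and the function names are illustrative. It converts a compute capability string such as `8.6` into an SM number such as `86` and compares it with the minimum.

```python
import subprocess
from typing import List

MIN_SM = 80  # taken from the requirement "SM >= 80 (Ampere or newer)"


def sm_from_compute_cap(cap: str) -> int:
    """Convert a compute capability string like '8.6' to an SM number like 86."""
    major, minor = cap.strip().split(".")
    return int(major) * 10 + int(minor)


def meets_requirement(cap: str, min_sm: int = MIN_SM) -> bool:
    """True if the given compute capability satisfies the minimum SM version."""
    return sm_from_compute_cap(cap) >= min_sm


def query_compute_caps() -> List[str]:
    """Return one compute capability string per visible GPU.

    Assumption: the driver's nvidia-smi supports the compute_cap field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for cap in query_compute_caps():
        status = "OK" if meets_requirement(cap) else "too old"
        print(f"compute capability {cap}: {status}")
```

Only `query_compute_caps` needs a GPU; the two parsing helpers can be used on their own, for instance to validate a value typed in by hand.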

## Examples
## Quick Start

### demo_gpu.py
Basic GPU operations demo using the native C++ backend directly.
### Chat CLI

```bash
# Standard chat (Qwen)
python examples/chat/chat_cli.py

# With Triton backend
python examples/chat/chat_cli_triton.py

# MoE models (Qwen3)
python examples/chat/chat_cli_moe.py

# Thinking mode (Qwen3-8B-Thinking)
python examples/chat/chat_cli_thinking.py
```

### Demos

```bash
# Basic GPU operations
python examples/demo_gpu.py

# CUDA Graph for LLM inference
python examples/demo_cuda_graph.py

# End-to-end LLM demo
python examples/demo_llm_e2e.py

# Qwen3 model demo
python examples/demo_qwen3.py
```

### demo_optimized.py
Performance comparison showing zero-copy optimizations.
### Benchmarks

```bash
python examples/demo_optimized.py
# Matrix multiplication benchmark
python examples/benchmarks/benchmark_matmul.py

# CUDA Graph LLM benchmark
python examples/benchmarks/bench_cuda_graph_llm.py

# Compare with cuBLAS
python examples/benchmarks/benchmark_compare.py
```

### demo_v01.py
Simple v0.1 feature demonstration (CPU simulation fallback).
### Speech/Audio

```bash
python examples/demo_v01.py
# Text-to-speech (Kokoro)
python examples/tts.py

# Real-time speech-to-text (Whisper)
python examples/whisper_realtime_stt.py
```

## Building Native Module

```bash
cd native
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
```
# From project root using build script
./build.sh 86 # RTX 3090 Ti
./build.sh 120a # RTX 5090

Copy the built module to `src/pygpukit/`:
- Linux: `_pygpukit_native.cpython-3xx-x86_64-linux-gnu.so`
- Windows: `_pygpukit_native.cp3xx-win_amd64.pyd`
# Or manually with pip
pip install -e . -v
```
File renamed without changes.