m96-chan · Dec 30, 2025
diff --git a/examples/README.md b/examples/README.md
@@ -1,43 +1,87 @@
 # PyGPUkit Examples
 
+## Directory Structure
+
+```
+examples/
+├── benchmarks/           # Performance benchmarks
+├── chat/                 # Chat CLI applications
+├── demos/archived/       # Version-specific demos (historical)
+├── demo_*.py             # Current feature demos
+├── tts.py                # Text-to-speech example
+└── whisper_realtime_stt.py  # Speech-to-text example
+```
+
 ## Requirements
 
-- NVIDIA GPU with CUDA support
-- CUDA Toolkit 12.x
+- NVIDIA GPU with compute capability 8.0 or higher (SM >= 80, Ampere or newer)
+- CUDA Toolkit 12.x or 13.x
 - Built native module (`_pygpukit_native`)
 
-## Examples
+## Quick Start
 
-### demo_gpu.py
-Basic GPU operations demo using the native C++ backend directly.
+### Chat CLI
 
 ```bash
+# Standard chat (Qwen)
+python examples/chat/chat_cli.py
+
+# With Triton backend
+python examples/chat/chat_cli_triton.py
+
+# MoE models (Qwen3)
+python examples/chat/chat_cli_moe.py
+
+# Thinking mode (Qwen3-8B-Thinking)
+python examples/chat/chat_cli_thinking.py
+```
+
+### Demos
+
+```bash
+# Basic GPU operations
 python examples/demo_gpu.py
+
+# CUDA Graph for LLM inference
+python examples/demo_cuda_graph.py
+
+# End-to-end LLM demo
+python examples/demo_llm_e2e.py
+
+# Qwen3 model demo
+python examples/demo_qwen3.py
 ```
 
-### demo_optimized.py
-Performance comparison showing zero-copy optimizations.
+### Benchmarks
 
 ```bash
-python examples/demo_optimized.py
+# Matrix multiplication benchmark
+python examples/benchmarks/benchmark_matmul.py
+
+# CUDA Graph LLM benchmark
+python examples/benchmarks/bench_cuda_graph_llm.py
+
+# Compare with cuBLAS
+python examples/benchmarks/benchmark_compare.py
 ```
 
-### demo_v01.py
-Simple v0.1 feature demonstration (CPU simulation fallback).
+### Speech/Audio
 
 ```bash
-python examples/demo_v01.py
+# Text-to-speech (Kokoro)
+python examples/tts.py
+
+# Real-time speech-to-text (Whisper)
+python examples/whisper_realtime_stt.py
 ```
 
 ## Building Native Module
 
 ```bash
-cd native
-mkdir build && cd build
-cmake .. -DCMAKE_BUILD_TYPE=Release
-cmake --build . --config Release
-```
+# From project root using build script
+./build.sh 86      # SM 86, e.g. RTX 3090 Ti
+./build.sh 120a    # SM 120a, e.g. RTX 5090
 
-Copy the built module to `src/pygpukit/`:
-- Linux: `_pygpukit_native.cpython-3xx-x86_64-linux-gnu.so`
-- Windows: `_pygpukit_native.cp3xx-win_amd64.pyd`
+# Or manually with pip
+pip install -e . -v
+```
diff --git a/examples/bench_cuda_graph_llm.py b/examples/benchmarks/bench_cuda_graph_llm.py
diff --git a/examples/benchmark_compare.py b/examples/benchmarks/benchmark_compare.py
diff --git a/examples/benchmark_large.py b/examples/benchmarks/benchmark_large.py
diff --git a/examples/benchmark_matmul.py b/examples/benchmarks/benchmark_matmul.py
diff --git a/examples/benchmark_tiled_matmul.py b/...ples/benchmarks/benchmark_tiled_matmul.py
diff --git a/examples/chat_cli.py b/examples/chat/chat_cli.py