Commit 5db8dc7

Update blog
1 parent cfe2582 commit 5db8dc7

1 file changed: +2 -1 lines changed

content/blog/2025-08-25-1756113601.md

Lines changed: 2 additions & 1 deletion
@@ -36,7 +36,7 @@ class TinyCNN(nn.Module):
 
 I ran this on a NVIDIA 4060 8 GB (Laptop) for 10K iterations, on Windows and WSL-with-Ubuntu, with float32 data.
 
-I ported this model to plain torch, torch.compile, TensorRT, TensorRT RTX, plain CUDA (fused operation), plain Vulkan (fused operation), ggml + CUDA, and ggml + Vulkan.
+I ported this model to plain torch, torch.compile, TensorRT, TensorRT RTX, plain CUDA (fused operation), plain Vulkan (fused operation), ggml + CUDA, ggml + Vulkan, and ONNX Runtime + CUDA.
 
 I've included the performance numbers below, but they shouldn't be taken very seriously since the model is too small to paint a true picture (in terms of computation complexity and data size). The intent is to verify that the different test setups are working somewhat sanely.
 
@@ -46,6 +46,7 @@ For 10k iterations:
 | 1.6s | plain torch | Ubuntu Linux (WSL) |
 | 1.6s | TensorRT | Windows |
 | 1.6s | fused CUDA kernel | Windows |
+| 1.6s | ONNX Runtime with CUDA | Windows |
 | 1.7s | TensorRT RTX | Windows |
 | 1.9s | plain torch | Windows |
 | 2.3s | ggml + CUDA | Windows |
