This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA/CUTLASS kernels, Triton spells, and PTX sorcery.
-
Updated
Nov 2, 2025 - HTML
This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA/CUTLASS kernels, Triton spells, and PTX sorcery.
Profiling with NVIDIA Nsight Tools Bootcamp
References content from the OLCF CUDA Training Series. (https://github.com/olcf/cuda-training-series)
University Project for "Computer Architecture" course (MSc Computer Engineering @ University of Pisa). Implementation of a Parallelized Nearest Neighbor Upscaler using CUDA.
This project demonstrates the integration of a CUDA kernel within an NVIDIA Holoscan application. It consists of two custom operators: one for memory allocation and data initialization, and another for executing the CUDA kernel. The application was profiled using Nsight systems and the kernel with Nsight compute
Roofline profiling for Deep Learning models
Add a description, image, and links to the nsight-compute topic page so that developers can more easily learn about it.
To associate your repository with the nsight-compute topic, visit your repo's landing page and select "manage topics."