A GPU-accelerated implementation of a neural network for the MNIST digit classification task using CUDA and C/C++.
This project demonstrates how to build a simple neural network for digit recognition on the MNIST dataset with GPU acceleration using CUDA. It includes custom CUDA kernels, data loading in C/C++ and Python, and training and inference logic.
- Custom CUDA kernels for forward and backward propagation
- Integration of host-side code (C/C++ and Python) with GPU execution
- Simple MLP (multi-layer perceptron) architecture for MNIST
- Demonstration of low-level GPU programming concepts (memory management, kernel launches, synchronization)
- Proof of concept (POC) intended as a starting point for more advanced projects
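
To make the MLP forward pass from the feature list concrete, here is a minimal CPU reference sketch in plain C++. It is not the project's actual code: the function names (`dense_forward`, `relu_inplace`, `softmax`, `mlp_forward`), the row-major weight layout, the single hidden layer, and the hidden width of 128 are all assumptions made for illustration. Only the 784 inputs (28x28 pixels) and 10 output classes follow from MNIST itself. A reference like this is typically used to check the output of a forward-propagation kernel, where each iteration of the outer loop in `dense_forward` would map to one GPU thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// 784 and 10 follow from MNIST; the hidden width is an assumed value.
constexpr std::size_t kInput = 784;
constexpr std::size_t kHidden = 128;  // assumption, not taken from the project
constexpr std::size_t kOutput = 10;

// Fully connected layer: y = W x + b, with W stored row-major as [out][in].
std::vector<float> dense_forward(const std::vector<float>& w,
                                 const std::vector<float>& b,
                                 const std::vector<float>& x) {
    const std::size_t out = b.size();
    const std::size_t in = x.size();
    assert(w.size() == out * in);
    std::vector<float> y(out);
    for (std::size_t o = 0; o < out; ++o) {  // in a kernel: one thread per output unit
        float acc = b[o];
        for (std::size_t i = 0; i < in; ++i) {
            acc += w[o * in + i] * x[i];
        }
        y[o] = acc;
    }
    return y;
}

// ReLU activation applied element-wise, in place.
void relu_inplace(std::vector<float>& v) {
    for (float& e : v) {
        e = std::max(e, 0.0f);
    }
}

// Numerically stable softmax: subtract the maximum before exponentiating.
std::vector<float> softmax(const std::vector<float>& z) {
    assert(!z.empty());
    const float m = *std::max_element(z.begin(), z.end());
    std::vector<float> p(z.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < z.size(); ++i) {
        p[i] = std::exp(z[i] - m);
        sum += p[i];
    }
    for (float& e : p) {
        e /= sum;
    }
    return p;
}

// Two-layer MLP: input -> hidden (ReLU) -> output (softmax probabilities).
std::vector<float> mlp_forward(const std::vector<float>& w1,
                               const std::vector<float>& b1,
                               const std::vector<float>& w2,
                               const std::vector<float>& b2,
                               const std::vector<float>& x) {
    std::vector<float> h = dense_forward(w1, b1, x);
    relu_inplace(h);
    return softmax(dense_forward(w2, b2, h));
}
```

With weights of size `kHidden * kInput` and `kOutput * kHidden`, calling `mlp_forward` on a flattened 784-element image returns 10 class probabilities that sum to 1; comparing them against the values copied back from the GPU is one simple way to validate a kernel.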