A from-scratch C++ deep learning library implementing reverse-mode automatic differentiation with a dynamic computation graph. Inspired by micrograd and PyTorch.
Built to understand the internals of deep learning frameworks by implementing everything from first principles:
- No dependencies — Pure C++17, header-only library
- Educational — Clear, readable implementations over heavy optimization
- Complete — From scalar autograd to CNNs, RNNs, and optimizers
- Automatic Differentiation — Reverse-mode autodiff with a dynamic tape
- Tensor Operations — N-dimensional arrays with broadcasting (see the sketch after this list)
- Computation Graph — Visualization in DOT, ASCII, and Mermaid formats
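A minimal sketch of how the dynamic tape and broadcasting fit together. Only randn and backward() appear verbatim in the quick-start snippet further down; the arithmetic operator overload and the sum() reduction are assumptions about the API in tensor.hpp.

```cpp
#include "tensor.hpp"

// Shapes {3, 1} and {1, 4} broadcast to {3, 4}; each op is recorded
// on the dynamic tape as it executes.
auto a = randn({3, 1});
auto b = randn({1, 4});
auto c = a + b;            // assumed operator overload on tensor handles

// Reduce to a scalar and run reverse-mode autodiff; gradients for a and b
// are accumulated by walking the recorded graph backwards.
auto s = sum(c);           // assumed reduction helper
s->backward();
```

The available layer modules are listed below.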
| Layer | Description |
|---|---|
| Linear | Fully connected layer |
| Conv2D | 2D convolution with padding/stride |
| MaxPool2D / AvgPool2D | Pooling layers |
| Dropout | Regularization |
| BatchNorm1d | Batch normalization |
| RNNCell / LSTMCell / GRUCell | Recurrent layers |
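Layers slot into nn::Sequential the same way Linear does in the quick start. A rough sketch of a small convolutional stack follows; the Conv2D and MaxPool2D constructor arguments are assumptions, so check conv.hpp for the real signatures.

```cpp
#include <memory>
#include "nn.hpp"
#include "conv.hpp"

// Assumed constructor signatures: Conv2D(inChannels, outChannels, kernelSize)
// and MaxPool2D(kernelSize) -- see conv.hpp for the actual parameters.
nn::Sequential features;
features.add(std::make_shared<nn::Conv2D>(1, 8, 3));      // 1 -> 8 channels, 3x3 kernel
features.add(std::make_shared<nn::ActivationLayer>(nn::Activation::ReLU));
features.add(std::make_shared<nn::MaxPool2D>(2));         // halve spatial resolution
features.add(std::make_shared<nn::Conv2D>(8, 16, 3));
features.add(std::make_shared<nn::ActivationLayer>(nn::Activation::ReLU));
```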
- Activations — ReLU · LeakyReLU · GELU · Swish · Tanh · Sigmoid · Softplus
- Optimizers — SGD (with momentum) · Adam · RMSprop · AdaGrad
- Losses — MSE · MAE · Huber · CrossEntropy · BCE · Hinge · KLDiv
- LR Schedulers — Step, Cosine, Warmup, Exponential, ReduceOnPlateau
- Gradient Clipping — By value or norm
- Weight Init — Xavier, He/Kaiming (uniform & normal)
- Serialization — Save/load models to binary files
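Roughly how the optimizers, schedulers, gradient clipping, and serialization listed above combine in a training loop. Apart from Adam, mse, step(), zeroGrad(), and parameters(), the names here (the Adam constructor arguments, StepLR, clipGradNorm, saveModel) are guesses at the API in optimizer.hpp and serialize.hpp; model, x, and target are the ones from the quick-start snippet further down.

```cpp
#include "nn.hpp"
#include "optimizer.hpp"
#include "serialize.hpp"

// Assumed names: StepLR, clipGradNorm, saveModel -- consult the headers.
Adam optimizer(model.parameters(), 1e-3);
StepLR scheduler(optimizer, /*stepSize=*/10, /*gamma=*/0.5);

for (int epoch = 0; epoch < 50; ++epoch) {
    auto loss = mse(model.forward(x), target);
    optimizer.zeroGrad();
    loss->backward();
    clipGradNorm(model.parameters(), 1.0);   // gradient clipping by norm
    optimizer.step();
    scheduler.step();                        // apply the LR schedule
}

saveModel(model, "model.bin");               // save weights to a binary file
```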
- Padding and packing for variable-length sequences
- Attention masks
- One-hot encoding and embedding lookup
- Sliding windows
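A hedged sketch of the sequence helpers; padSequences and oneHot are assumed names for what sequence.hpp exposes, and the real signatures may differ.

```cpp
#include <vector>
#include "sequence.hpp"

// Assumed helper names -- see sequence.hpp for the real API.
std::vector<std::vector<int>> batch = {{5, 2, 9}, {7, 1}, {4}};
auto padded  = padSequences(batch, /*padValue=*/0);  // pad every sequence to the longest length
auto encoded = oneHot(padded, /*numClasses=*/10);    // one-hot encode the padded token ids
```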
- SIMD — SSE/AVX vectorization for tensor math
- Thread Pool — Async operations and parallel for loops
- OpenMP — Parallel matrix multiply, convolutions, reductions
- Memory Pool — Cache-aligned allocations, arena allocator
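The SIMD, OpenMP, and memory-pool paths are used internally by the tensor ops, but the thread pool can also be driven directly. The ThreadPool constructor and parallelFor helper below are assumptions about threadpool.hpp, not documented API.

```cpp
#include <cstddef>
#include <vector>
#include "threadpool.hpp"

// Assumed API: a ThreadPool with a parallel-for helper over an index range.
ThreadPool pool(4);
std::vector<float> data(1'000'000, 1.0f);
pool.parallelFor(0, data.size(), [&](std::size_t i) {
    data[i] *= 2.0f;   // chunks of the index range run on worker threads
});
```

A minimal end-to-end example: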
#include "tensor.hpp"
#include "nn.hpp"
#include "optimizer.hpp"
// Create a simple network
nn::Sequential model;
model.add(std::make_shared<nn::Linear>(784, 128));
model.add(std::make_shared<nn::ActivationLayer>(nn::Activation::ReLU));
model.add(std::make_shared<nn::Linear>(128, 10));
// Forward pass
auto x = randn({784, 32}); // [features, batch]
auto y = model.forward(x);
// Compute loss and backprop
auto target = randn({10, 32});  // placeholder targets, [classes, batch]
auto loss = mse(y, target);
loss->backward();
// Update weights
SGD optimizer(model.parameters(), 0.01);
optimizer.step();
optimizer.zeroGrad();
```
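The quick start shows a single update step; in practice the same calls run inside a loop over epochs (and mini-batches):

```cpp
for (int epoch = 0; epoch < 10; ++epoch) {
    auto y = model.forward(x);      // forward pass
    auto loss = mse(y, target);     // compute loss
    optimizer.zeroGrad();           // clear gradients from the previous step
    loss->backward();               // backpropagate through the dynamic graph
    optimizer.step();               // apply the SGD update
}
```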
```sh
# Header-only, just include
clang++ -std=c++17 -O2 -I include your_code.cpp -o your_program
# With OpenMP (optional)
clang++ -std=c++17 -O2 -Xpreprocessor -fopenmp -I include your_code.cpp -lomp
# With threading
clang++ -std=c++17 -O2 -pthread -I include your_code.cpp
```

```
include/
├── value.hpp # Scalar autograd engine
├── tensor.hpp # N-dimensional tensor with autograd
├── nn.hpp # Neural network modules
├── conv.hpp # CNN layers (Conv2D, pooling)
├── rnn.hpp # RNN cells (LSTM, GRU)
├── optimizer.hpp # SGD, Adam, schedulers
├── loss.hpp # Loss functions
├── serialize.hpp # Model save/load
├── visualize.hpp # Graph visualization
├── sequence.hpp # Sequence utilities
├── simd.hpp # SIMD operations
├── threadpool.hpp # Thread pool
├── parallel.hpp # OpenMP parallelization
├── mempool.hpp # Memory management
└── benchmark.hpp # Profiling tools
src/
├── tensor_demo.cpp
├── nn_demo.cpp
├── conv_demo.cpp
├── rnn_demo.cpp
├── optimizer_demo.cpp
└── ...
```
On Apple M2 (single-threaded unless noted):
| Operation | Performance |
|---|---|
| SIMD matmul 128×128 | 14 GFLOPS, 489× vs naive |
| SIMD elementwise (100K elements) | 14× vs naive |
| Parallel sum (1M elements, 12 threads) | 4× vs sequential |
| Arena allocator | 70× vs malloc |
Licensed under the MIT License.