
torchlet

A from-scratch C++ deep learning library implementing reverse-mode automatic differentiation with a dynamic computation graph. Inspired by micrograd and PyTorch.

Motivation

Built to understand the internals of deep learning frameworks by implementing everything from first principles:

  • No dependencies — Pure C++17, header-only library
  • Educational — Clear, readable implementations over heavy optimization
  • Complete — From scalar autograd to CNNs, RNNs, and optimizers

Features

Core Engine

  • Automatic Differentiation — Reverse-mode autodiff with dynamic tape
  • Tensor Operations — N-dimensional arrays with broadcasting
  • Computation Graph — Visualization in DOT, ASCII, and Mermaid formats
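
To give a feel for the scalar engine, here is a minimal sketch of reverse-mode autodiff usage. The header name comes from the project layout, but the Value/grad/backward API shown is an assumption modeled on micrograd, not the library's confirmed interface.

#include "value.hpp"
#include <iostream>

int main() {
    // Hypothetical micrograd-style API: each operation records itself on the
    // dynamic tape, and backward() replays the tape in reverse to fill grads.
    auto a = Value(2.0);
    auto b = Value(-3.0);
    auto c = a * b + a;      // builds the computation graph dynamically
    c.backward();            // reverse-mode sweep from the output

    std::cout << a.grad() << "\n";  // dc/da = b + 1 = -2
    std::cout << b.grad() << "\n";  // dc/db = a = 2
}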

Neural Network Layers

Layer                           Description
Linear                          Fully connected layer
Conv2D                          2D convolution with padding/stride
MaxPool2D / AvgPool2D           Pooling layers
Dropout                         Regularization
BatchNorm1d                     Batch normalization
RNNCell / LSTMCell / GRUCell    Recurrent layers
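
As a rough sketch, these layers can be stacked with the same Sequential container used in the Quick Start below. The Conv2D/MaxPool2D constructor arguments and namespaces shown here are assumptions for illustration, not documented signatures; check conv.hpp for the real ones.

#include "nn.hpp"
#include "conv.hpp"

// NOTE: constructor arguments (in/out channels, kernel, stride, padding) are
// assumptions for illustration only; see conv.hpp for the actual signatures.
nn::Sequential cnn;
cnn.add(std::make_shared<nn::Conv2D>(/*in=*/1, /*out=*/8, /*kernel=*/3, /*stride=*/1, /*pad=*/1));
cnn.add(std::make_shared<nn::ActivationLayer>(nn::Activation::ReLU));
cnn.add(std::make_shared<nn::MaxPool2D>(/*kernel=*/2));
cnn.add(std::make_shared<nn::Linear>(8 * 14 * 14, 10));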

Activations

ReLU · LeakyReLU · GELU · Swish · Tanh · Sigmoid · Softplus

Optimizers

SGD (with momentum) · Adam · RMSprop · AdaGrad

Loss Functions

MSE · MAE · Huber · CrossEntropy · BCE · Hinge · KLDiv

Training Utilities

  • LR Schedulers — Step, Cosine, Warmup, Exponential, ReduceOnPlateau
  • Gradient Clipping — By value or norm
  • Weight Init — Xavier, He/Kaiming (uniform & normal)
  • Serialization — Save/load models to binary files
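
To make the gradient-clipping item concrete, here is a small sketch of clipping by global norm written against plain std::vector<double> gradients. It illustrates the technique only and does not use the library's own helpers, whose names are not shown in this README.

#include <cmath>
#include <vector>

// Scale all gradients so their global L2 norm does not exceed max_norm.
void clip_by_global_norm(std::vector<std::vector<double>>& grads, double max_norm) {
    double sq = 0.0;
    for (const auto& g : grads)
        for (double v : g) sq += v * v;
    const double norm = std::sqrt(sq);
    if (norm > max_norm) {
        const double scale = max_norm / norm;
        for (auto& g : grads)
            for (double& v : g) v *= scale;
    }
}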

Sequence Utilities

  • Padding and packing for variable-length sequences
  • Attention masks
  • One-hot encoding and embedding lookup
  • Sliding windows
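
As an illustration of the one-hot item, here is the operation in plain C++; this shows the general technique, not the sequence.hpp API.

#include <cstddef>
#include <vector>

// Turn class indices into one-hot rows: index 2 with num_classes 4 -> {0, 0, 1, 0}.
std::vector<std::vector<float>> one_hot(const std::vector<std::size_t>& indices,
                                        std::size_t num_classes) {
    std::vector<std::vector<float>> out(indices.size(),
                                        std::vector<float>(num_classes, 0.0f));
    for (std::size_t i = 0; i < indices.size(); ++i)
        out[i][indices[i]] = 1.0f;
    return out;
}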

Performance

  • SIMD — SSE/AVX vectorization for tensor math
  • Thread Pool — Async operations and parallel for loops
  • OpenMP — Parallel matrix multiply, convolutions, reductions
  • Memory Pool — Cache-aligned allocations, arena allocator
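
The OpenMP item corresponds to the classic parallel-outer-loop matmul pattern. The sketch below shows that pattern in standard C++ with OpenMP and is not taken from parallel.hpp.

#include <cstddef>
#include <vector>

// C (m x n) += A (m x k) * B (k x n), row-major; compile with -fopenmp to
// parallelize the outer loop across rows of A. C must be zero-initialized.
void matmul(const std::vector<double>& A, const std::vector<double>& B,
            std::vector<double>& C, std::size_t m, std::size_t k, std::size_t n) {
    #pragma omp parallel for
    for (long long i = 0; i < static_cast<long long>(m); ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const double a = A[i * k + p];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[p * n + j];
        }
    }
}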

Quick Start

#include "tensor.hpp"
#include "nn.hpp"
#include "optimizer.hpp"

// Create a simple network
nn::Sequential model;
model.add(std::make_shared<nn::Linear>(784, 128));
model.add(std::make_shared<nn::ActivationLayer>(nn::Activation::ReLU));
model.add(std::make_shared<nn::Linear>(128, 10));

// Forward pass
auto x = randn({784, 32});  // [features, batch]
auto y = model.forward(x);

// Compute loss and backprop
auto target = randn({10, 32});  // placeholder target, same shape as the output y
auto loss = mse(y, target);
loss->backward();

// Update weights
SGD optimizer(model.parameters(), 0.01);
optimizer.step();
optimizer.zeroGrad();
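
The snippet above can be wrapped into a small training loop using the same calls; the random batch here is only a stand-in for real data, and the loop structure is an illustration rather than something prescribed by the library.

// Minimal training loop built from the same calls as the snippet above.
SGD optimizer(model.parameters(), 0.01);
for (int epoch = 0; epoch < 10; ++epoch) {
    auto x = randn({784, 32});       // stand-in batch: [features, batch]
    auto target = randn({10, 32});   // stand-in targets matching the output shape
    auto y = model.forward(x);
    auto loss = mse(y, target);

    optimizer.zeroGrad();
    loss->backward();
    optimizer.step();
}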

Build

# Header-only, just include
clang++ -std=c++17 -O2 -I include your_code.cpp -o your_program

# With OpenMP (optional)
clang++ -std=c++17 -O2 -Xpreprocessor -fopenmp -I include your_code.cpp -lomp

# With threading
clang++ -std=c++17 -O2 -pthread -I include your_code.cpp

Project Structure

include/
├── value.hpp        # Scalar autograd engine
├── tensor.hpp       # N-dimensional tensor with autograd
├── nn.hpp           # Neural network modules
├── conv.hpp         # CNN layers (Conv2D, pooling)
├── rnn.hpp          # RNN cells (LSTM, GRU)
├── optimizer.hpp    # SGD, Adam, schedulers
├── loss.hpp         # Loss functions
├── serialize.hpp    # Model save/load
├── visualize.hpp    # Graph visualization
├── sequence.hpp     # Sequence utilities
├── simd.hpp         # SIMD operations
├── threadpool.hpp   # Thread pool
├── parallel.hpp     # OpenMP parallelization
├── mempool.hpp      # Memory management
└── benchmark.hpp    # Profiling tools

src/
├── tensor_demo.cpp
├── nn_demo.cpp
├── conv_demo.cpp
├── rnn_demo.cpp
├── optimizer_demo.cpp
└── ...

Benchmarks

On Apple M2 (single-threaded unless noted):

Operation                          Performance
SIMD matmul (128×128)              14 GFLOPS, 489× vs naive
SIMD elementwise (100K)            14× vs naive
Parallel sum (1M, 12 threads)      4× vs sequential
Arena allocator                    70× vs malloc
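
For context, a 128×128 multiply is 2·128³ ≈ 4.2 MFLOP, so 14 GFLOPS works out to roughly 0.3 ms per product.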

License

MIT
