
AutoFusion: Automated Kernel Fusion and Tuning Framework

AutoFusion automatically detects fusion opportunities in deep learning models, generates Triton kernels for each operation, then fuses and auto-tunes them to reduce memory traffic and improve GPU performance, similar to torch.compile.

This is done by the following steps:

  1. Extract the data dependency graph (via MaseGraph)
  2. Identify opportunities for kernel fusion
  3. Choose a tiling strategy for each fusion candidate
  4. Automatically generate the fused Triton kernel code
  5. Autotune the generated kernel configuration using the tiling strategy
  6. Reinsert the fused kernels, replacing the original graph nodes
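Step 2 above, finding fusion candidates in the dependency graph, can be sketched as follows. This is an illustrative, self-contained example, not the actual MaseGraph API: the `Node` class, `find_fusion_chains` function, and the set of fusable ops are all assumptions made for the sketch.

```python
# Hypothetical sketch of step 2: greedily grouping elementwise ops
# connected by a single producer-consumer edge into fusion chains.
# Names (Node, find_fusion_chains) are illustrative, not the real API.

from dataclasses import dataclass, field

ELEMENTWISE = {"add", "mul", "relu", "gelu"}  # ops assumed fusable here

@dataclass
class Node:
    name: str
    op: str
    inputs: list = field(default_factory=list)  # names of producer nodes

def find_fusion_chains(nodes):
    """Walk nodes in topological order and extend a chain while each
    consumer is elementwise, has one input, and is the sole consumer."""
    by_name = {n.name: n for n in nodes}
    consumers = {}
    for n in nodes:
        for p in n.inputs:
            consumers.setdefault(p, []).append(n.name)
    chains, visited = [], set()
    for n in nodes:
        if n.name in visited or n.op not in ELEMENTWISE:
            continue
        chain, cur = [n.name], n.name
        visited.add(n.name)
        # extend the chain while the sole consumer is also fusable
        while len(consumers.get(cur, [])) == 1:
            nxt = by_name[consumers[cur][0]]
            if nxt.op not in ELEMENTWISE or len(nxt.inputs) != 1 or nxt.name in visited:
                break
            chain.append(nxt.name)
            visited.add(nxt.name)
            cur = nxt.name
        if len(chain) > 1:
            chains.append(chain)
    return chains

graph = [
    Node("x", "input"),
    Node("a", "add", ["x"]),
    Node("r", "relu", ["a"]),
    Node("g", "gelu", ["r"]),
]
print(find_fusion_chains(graph))  # → [['a', 'r', 'g']]
```

The greedy single-consumer condition is one common fusion legality check: if an intermediate has more than one consumer, fusing it away would force recomputation or a materialized buffer.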

We target optimizations on Scientific Machine Learning models obtained from the Neural-Solver-Library. More information on the decision to work with these models can be found here.

Documentation and Testing


We have aimed to make each stage modular: each module contains its own documentation, unit tests, and benchmarking scripts.

The repo is structured as follows:

  • autofuser :: contains the code for the AutoFusion pipeline. The main entry point is autofuse.py, which treats the other modules as packages:

    1. neuralset :: Neural-Solver-Library models, data loading, and training utilities
    2. graph :: utilities for graph preparation, finding fusion chains, and graph rewriting
    3. tiling :: selects backend-specific tiling strategies and autotune search spaces
    4. fuser :: generates the fused Triton code from a fusion spec and tiling strategy
    5. autotune :: Triton kernel autotuning for a given tiling strategy
  • experiments :: contains scripts to automatically run the fusion pipeline for each model on a given dataset. The model configuration is the best reported configuration from each paper.
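How the entry point wires these modules together can be illustrated with the following self-contained sketch. Every function here is a stub standing in for the corresponding module; the real autofuse.py API almost certainly differs.

```python
# Illustrative end-to-end driver mirroring the module layout above.
# All names and signatures are stand-ins, not the repo's actual API.

def extract_graph(model):            # graph: dependency-graph extraction
    return {"nodes": model["ops"]}

def find_chains(graph):              # graph: fusion-chain search
    return [graph["nodes"]]          # pretend every op fuses into one chain

def pick_tiling(chain):              # tiling: backend-specific strategy
    return {"BLOCK": 128}

def generate_kernel(chain, tiling):  # fuser: emit fused Triton source
    return f"# fused kernel (BLOCK={tiling['BLOCK']}): " + "; ".join(chain)

def autotune(kernel, tiling):        # autotune: search over tile sizes
    return kernel, tiling["BLOCK"]   # a real search would benchmark configs

def autofuse(model):
    graph = extract_graph(model)
    kernels = []
    for chain in find_chains(graph):
        tiling = pick_tiling(chain)
        kernel, best_block = autotune(generate_kernel(chain, tiling), tiling)
        kernels.append((kernel, best_block))
    return kernels  # step 6 would reinsert these into the graph

result = autofuse({"ops": ["add", "relu"]})
```

The point of the sketch is the data flow: graph extraction feeds chain search, each chain picks a tiling, the fuser emits code, and the autotuner selects the final configuration before reinsertion.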

As mentioned, each module has its own unit test suite; additionally, autofuse/test/ contains tests for the overall pipeline.

Unit tests use pytest and can be run with pytest PATH/TO/TEST/FOLDER/
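A minimal test file in the pytest convention looks like this; the helper under test is defined inline for illustration and is not a real repo function.

```python
# test_example.py -- illustrative pytest-style unit test.
# In the repo, each module ships its own tests alongside its code;
# fuse_names below is a toy helper invented for this example.

def fuse_names(chain):
    """Build a kernel name from an ordered fusion chain."""
    return "fused_" + "_".join(chain)

def test_fuse_names():
    assert fuse_names(["add", "relu"]) == "fused_add_relu"
```

pytest discovers any `test_*.py` file and runs every `test_*` function inside it, so `pytest PATH/TO/TEST/FOLDER/` picks this up automatically.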

Setup

All dependencies can be installed via setup_env.sh.

If running on Google Colab, please open this notebook, which contains scripts to clone the repository, set up the environment, and load all the data.

Data was obtained from PDEBench [NeurIPS 2022 Datasets and Benchmarks Track] for benchmarking autoregressive tasks. We have tested our models and developed scripts for the following datasets:

  • airfoil, download and move this data to data/airfoil
  • Navier-Stokes, download NavierStokes_V1e-5_N1200_T20 and move this data to data/ns

To test AutoFusion for a given model on a given dataset, run

# For example, for FNO model on airfoil dataset
bash experiments/scripts/airfoil/FNO.sh

and to test all models on a given dataset, run

# For example, run all models on airfoil dataset
bash experiments/scripts/run_all_experiments.sh scripts/airfoil

About

AutoFuser: Automatic Triton Kernel Fusion for Advanced Deep Learning Systems
