The idea is to automatically detect fusion opportunities in deep learning models, generate Triton kernels for each operation, and fuse and auto-tune them to reduce memory traffic and improve GPU performance, similar to torch.compile.
This is done by the following steps:
- Extract the data dependency graph (via MaseGraph)
- Identify opportunities for kernel fusion
- Choose a tiling strategy for each fusion candidate
- Automatically generate the fused kernel Triton code
- Autotune the generated kernel configuration using the tiling strategy
- Reinsert the fused kernels, replacing the original graph nodes
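The "identify opportunities for kernel fusion" step can be illustrated with a toy sketch: walk a data dependency graph and group chains of consecutive element-wise ops, the classic fusion candidates. The graph format, op names, and `find_fusion_chains` function below are illustrative only; the real pipeline operates on a MaseGraph.

```python
# Toy sketch of fusion-chain detection. The op set, graph encoding, and
# function name are illustrative -- the real pipeline works on a MaseGraph.
ELEMENTWISE = {"relu", "add", "mul", "gelu"}

def find_fusion_chains(nodes, edges):
    """nodes: {name: op} in topological order; edges: list of (src, dst).
    Returns maximal chains of consecutive element-wise ops where each link
    has a single consumer, so fusing never duplicates work."""
    consumers = {}
    for src, dst in edges:
        consumers.setdefault(src, []).append(dst)
    chains, chain = [], []
    for name in nodes:  # assume insertion order is topological
        fusible = nodes[name] in ELEMENTWISE
        links_to_prev = (chain
                         and name in consumers.get(chain[-1], [])
                         and len(consumers.get(chain[-1], [])) == 1)
        if fusible and links_to_prev:
            chain.append(name)
        else:
            if len(chain) > 1:
                chains.append(chain)
            chain = [name] if fusible else []
    if len(chain) > 1:
        chains.append(chain)
    return chains

# matmul -> add -> relu -> matmul : only (add, relu) forms a fusible chain
nodes = {"mm1": "matmul", "b": "add", "act": "relu", "mm2": "matmul"}
edges = [("mm1", "b"), ("b", "act"), ("act", "mm2")]
print(find_fusion_chains(nodes, edges))  # [['b', 'act']]
```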
We target optimizations for Scientific Machine Learning models obtained from the Neural-Solver-Library. More information on the decision to work with these models can be found here
We have aimed to keep each stage modular: every module contains its own documentation, unit tests, and benchmarking scripts.
The repo is structured as follows:
- autofuser :: contains the code for the AutoFusion pipeline. The main entry point is autofuse.py, which treats the other modules as packages:
  - neuralset :: Neural-Solver-Library model, data loading, and training utils
  - graph :: utils for graph preparation, finding fusion chains, and graph rewriting
  - tiling :: selects backend-specific tiling strategies and autotune search spaces
  - fuser :: generates the fused Triton code from a fusion spec and tiling strategy
  - autotune :: Triton kernel autotuning for a given tiling strategy
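The way these modules hand off to one another can be sketched as a simple driver. Every function below is a stand-in stub with an illustrative name; the real package APIs differ.

```python
# Illustrative sketch of how autofuse.py could chain the packages together.
# All functions are hypothetical stubs, not the real module APIs.
def extract_graph(model):            # graph: build the dependency graph
    return {"nodes": model["ops"]}

def find_chains(graph):              # graph: locate fusion candidates
    return [graph["nodes"]]

def pick_tiling(chain):              # tiling: strategy + autotune search space
    return {"BLOCK": [64, 128, 256]}

def generate_kernel(chain, tiling):  # fuser: emit fused Triton source
    return f"fused_{'_'.join(chain)}"

def autotune(kernel, tiling):        # autotune: pick the best config
    return kernel, max(tiling["BLOCK"])

def autofuse(model):
    graph = extract_graph(model)
    # In the real pipeline, the tuned kernels replace the original graph nodes.
    return [autotune(generate_kernel(c, pick_tiling(c)), pick_tiling(c))
            for c in find_chains(graph)]

print(autofuse({"ops": ["add", "relu"]}))  # [('fused_add_relu', 256)]
```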
- experiments :: contains scripts to automatically run the fusion pipeline for each model on a given dataset. The model configuration is the reported best configuration from each paper.
As mentioned, each module has its own unit test suite, but additionally autofuse/test/ contains tests for the general pipeline.
Unit tests use pytest and can be run with `pytest PATH/TO/TEST/FOLDER/`
All dependencies can be installed via setup_env.sh.
If running on Google Colab, please open this notebook, which contains scripts to clone the repository, set up the environment, and load all the data.
Data was obtained from the PDEBench [NeurIPS 2022 Track Datasets and Benchmarks] for benchmarking autoregressive tasks. We have tested our models, and developed scripts, for the following data:
- airfoil, install and move this data to data/airfoil
- Navier-Stokes, install NavierStokes_V1e-5_N1200_T20 and move this data to data/ns
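The expected on-disk layout can be sketched as follows; the download steps are elided, and the moved filename shown is the Navier-Stokes dataset named above.

```shell
# Create the data layout the experiment scripts expect.
mkdir -p data/airfoil data/ns
# After downloading from PDEBench, move the files into place, e.g.:
#   mv NavierStokes_V1e-5_N1200_T20 data/ns/
ls data
```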
To test autofusion for a given model on a given dataset, run

```shell
# For example, for FNO model on airfoil dataset
bash experiments/scripts/airfoil/FNO.sh
```

and to test all models on a given dataset, run

```shell
# For example, run all models on airfoil dataset
bash experiments/scripts/run_all_experiments.sh scripts/airfoil
```