Add torch.compile support for 2x faster inference #64
base: main
Conversation
May also need these dependency package updates, and this added as well.
I tried triton and it will not run. I am on a Windows 11 system. I tried versions 2.1.0 and 3.0 found at https://huggingface.co/madbuda/triton-windows-builds. I do have a 5070 Ti, which sometimes causes problems with PyTorch and other installations.
For Windows, make sure you are using a triton version compatible with your PyTorch:

```
pip uninstall triton triton-windows -y
pip install "triton-windows>=3.2,<3.3"
```

Version compatibility from triton-windows:
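As a quick sanity check that the installed triton/PyTorch pair actually works (a minimal snippet, not part of the original comment; the tiny compiled function is just an assumed test case):

```python
# Smoke test: confirm triton imports and torch.compile can run end to end.
import torch
import triton

print("torch", torch.__version__, "| triton", triton.__version__)

@torch.compile
def f(x):
    return torch.sin(x) + torch.cos(x)

# Triton kernels are only exercised on CUDA; on CPU, inductor uses a C++
# compiler instead, so run this on the GPU you intend to use.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f(torch.randn(8, device=device)))  # an error here suggests an incompatible pair
```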
I've updated the warning and README with the recommended triton-windows version for Windows users.
Tried this out with AMD and WSL. It does reduce my memory usage significantly, and appears to speed it up to 11 it/s on a 7900 XTX.
Adds `--compile` and `--compile_mode` flags to use torch.compile.

This is a massive performance improvement (2x) to inference speed (16 it/s to 32 it/s), tested on RTX 4090, RTX PRO 6000, and A100, taking it from 1:1 real-time inference speed to 2x real-time.
This should auto-detect triton/inductor availability and fall back with a warning.
Windows users will need to install triton-windows separately to use this.
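The PR's actual wiring isn't shown here; as a rough sketch, the flags could look something like the following, assuming an argparse-based CLI and using a plain triton import check as a stand-in for the PR's availability detection (the `nn.Linear` model is a placeholder, not the project's model):

```python
import argparse
import warnings

import torch

def triton_available() -> bool:
    # torch.compile's default inductor backend needs triton for CUDA kernels.
    try:
        import triton  # noqa: F401
        return True
    except ImportError:
        return False

parser = argparse.ArgumentParser()
parser.add_argument("--compile", action="store_true",
                    help="Wrap the model in torch.compile for faster inference.")
parser.add_argument("--compile_mode", default="default",
                    choices=["default", "reduce-overhead", "max-autotune"],
                    help="Mode forwarded to torch.compile.")
args = parser.parse_args()

model = torch.nn.Linear(16, 16).eval()  # placeholder for the real model

if args.compile:
    if triton_available():
        model = torch.compile(model, mode=args.compile_mode)
    else:
        warnings.warn(
            "--compile requested but triton is unavailable; running in eager "
            "mode. Windows users can install the triton-windows package."
        )
```

With this shape, `--compile` is purely opt-in and the script keeps working unchanged when triton is missing, which matches the fallback-with-a-warning behavior the description promises.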