
Conversation

@devdaniel commented Jan 24, 2026

Adds --compile and --compile_mode flags to use torch.compile.

This is a massive performance improvement: inference speed roughly doubles (16 it/s to 32 it/s), tested on an RTX 4090, an RTX PRO 6000, and an A100, taking it from 1:1 real-time inference to 2x real-time.
The change should auto-detect triton/inductor availability and fall back, with a warning, when they are unavailable.
Windows users will need to install triton-windows separately to use this.
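
For anyone curious how the fallback behaves, here is a minimal sketch of the detection logic, assuming a helper like maybe_compile wraps the model at load time (the helper name and structure are illustrative, not the PR's actual code):

    import warnings
    import torch

    def maybe_compile(model, enable: bool, mode: str = "default"):
        # Return a torch.compile-wrapped model, or the original model when
        # compilation is disabled or the toolchain is missing.
        if not enable:
            return model
        try:
            # torch.compile's GPU inductor backend needs a working triton;
            # probe for it up front instead of failing deep inside generation.
            import triton  # noqa: F401
        except ImportError:
            warnings.warn(
                "--compile requested but triton/inductor is unavailable; "
                "falling back to eager execution."
            )
            return model
        return torch.compile(model, mode=mode)

In use, this would be called once after the model is loaded, e.g. model = maybe_compile(model, args.compile, args.compile_mode), where args.compile / args.compile_mode stand in for the new CLI flags (also an assumption about the surrounding code).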

@devdaniel marked this pull request as ready for review January 24, 2026 08:26
@devdaniel (Author)

These dependency versions may also need updating:

    "numpy>=2.0.2",
    "torch>=2.10.0",
    "torchaudio>=2.10.0",
    "torchtune>=0.6.1",
    "torchao>=0.15.0",
    "torchvision>=0.25.0",

And this dependency added:

    "bitsandbytes>=0.49.0",

@hectic-droid

> Adds --compile and --compile_mode flags to use torch.compile.
>
> This is a massive performance improvement: inference speed roughly doubles (16 it/s to 32 it/s), tested on an RTX 4090, an RTX PRO 6000, and an A100, taking it from 1:1 real-time inference to 2x real-time. The change should auto-detect triton/inductor availability and fall back, with a warning, when they are unavailable. Windows users will need to install triton-windows separately to use this.

I tried Triton and it will not run. I am on a Windows 11 system. I tried versions 2.10 and 3.0 found at https://huggingface.co/madbuda/triton-windows-builds. I do have a 5070 Ti, which sometimes causes problems with PyTorch and other installations.

@devdaniel (Author)

> I tried Triton and it will not run. I am on a Windows 11 system. I tried versions 2.10 and 3.0 found at https://huggingface.co/madbuda/triton-windows-builds. I do have a 5070 Ti, which sometimes causes problems with PyTorch and other installations.

For Windows, make sure you are using a triton-windows version that is compatible with your PyTorch version.

pip uninstall triton triton-windows -y
pip install "triton-windows>=3.2,<3.3"

Version compatibility from triton-windows:

| PyTorch | Triton |
|---------|--------|
| 2.6     | 3.2    |
| 2.7     | 3.3    |
| 2.8     | 3.4    |
| 2.9     | 3.5    |
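
As a quick way to pick the matching pin (an illustrative check, not part of this PR), print the installed PyTorch version and read off the row above:

    import torch
    # e.g. "2.9.x" -> pip install "triton-windows>=3.5,<3.6" per the table above
    print(torch.__version__)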

@devdaniel (Author)

I've updated the warning and the README with the recommended triton-windows version for Windows users.

@jdluzen commented Jan 25, 2026

Tried this out with AMD and WSL. It reduces my memory usage significantly and appears to speed things up to 11 it/s on a 7900 XTX.
