feat: support applying LoRA at runtime #969
Conversation
leejet commented Nov 11, 2025
Applying LoRA at runtime will slow down inference. This is a known issue, and I'll optimize it later when time permits.
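(Editor's note: a minimal plain-C++ sketch of why the runtime path costs extra work — this is illustrative, not the PR's actual code, and all names and layouts are made up. Merging bakes the low-rank delta into W once at load time; applying at runtime keeps W untouched but recomputes the delta on every forward pass.)

```cpp
// Hypothetical sketch: merged vs. runtime LoRA application.
#include <vector>
#include <cstddef>

// Merge once at load time: W' = W + scale * B*A.
// Afterwards inference runs at the original speed.
void merge_lora(std::vector<float>& W,       // d_out x d_in, row-major
                const std::vector<float>& A, // r x d_in
                const std::vector<float>& B, // d_out x r
                size_t d_out, size_t d_in, size_t r, float scale) {
    for (size_t i = 0; i < d_out; ++i)
        for (size_t j = 0; j < d_in; ++j) {
            float delta = 0.0f;
            for (size_t k = 0; k < r; ++k)
                delta += B[i * r + k] * A[k * d_in + j];
            W[i * d_in + j] += scale * delta;
        }
}

// Runtime application: W stays untouched (so the LoRA can be toggled,
// swapped, or re-scaled without reloading the checkpoint), but the
// delta is recomputed inside the forward pass, which is the slowdown.
std::vector<float> forward_with_lora(const std::vector<float>& W,
                                     const std::vector<float>& A,
                                     const std::vector<float>& B,
                                     const std::vector<float>& x, // d_in
                                     size_t d_out, size_t d_in, size_t r,
                                     float scale) {
    std::vector<float> y(d_out, 0.0f);
    for (size_t i = 0; i < d_out; ++i)
        for (size_t j = 0; j < d_in; ++j) {
            float delta = 0.0f;
            for (size_t k = 0; k < r; ++k)                // extra O(r) work
                delta += B[i * r + k] * A[k * d_in + j];  // per weight element
            y[i] += (W[i * d_in + j] + scale * delta) * x[j];
        }
    return y;
}
```

The trade-off: the runtime path wins on flexibility, the merged path wins on raw speed.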
---
Nice, also thank you for making it optional. 🚀
---
Nice! I guess the next step for optimization would be to do …
---
Yes, this should optimize the inference speed.
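(Editor's note: the suggestion above was cut off during page extraction, so the exact wording is lost. A common optimization it may refer to is computing the low-rank branch on activations, y = W·x + scale·B·(A·x), instead of materializing the full delta W + B·A. A back-of-the-envelope FLOP comparison with invented dimensions:)

```cpp
// Illustrative FLOP count: materialized delta vs. factored low-rank path.
// Dimensions are made up for illustration, not measured from this PR.
#include <cstdio>
#include <cstdint>

int main() {
    const uint64_t d_in = 4096, d_out = 4096, r = 16, tokens = 1;

    // Materializing delta_W = B*A costs a (d_out x r) x (r x d_in) GEMM,
    // before the adjusted weight is even used in the normal matmul.
    uint64_t materialize = 2 * d_out * r * d_in;

    // Factored path: ax = A*x, then B*ax, per token.
    uint64_t factored = tokens * (2 * r * d_in + 2 * d_out * r);

    std::printf("materialize delta_W: %llu FLOPs\n",
                (unsigned long long)materialize);
    std::printf("factored B*(A*x):    %llu FLOPs per batch\n",
                (unsigned long long)factored);
    return 0;
}
```

With these numbers the factored path is roughly three orders of magnitude cheaper, since r is tiny compared with d_in and d_out.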
---
How much more memory is expected? Testing on Vulkan with SDXL (…), the resulting images are byte-identical to the ones with …
---
It sure does:
(edit: for sd1.5, it looks like it's ever so slightly slower, but it saves some memory)
---
This also very much depends on the LoRA; they come in wildly different sizes too.
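(Editor's note: as a ballpark for the size question above — a LoRA keeps r·(d_in + d_out) extra parameters per adapted layer, so the footprint scales linearly with rank. A toy estimate, with all numbers invented for illustration:)

```cpp
// Rough LoRA memory footprint per rank. Illustrative only.
#include <cstdio>
#include <initializer_list>

int main() {
    const double d_in = 4096, d_out = 4096, layers = 100;
    const double bytes_per_param = 2.0; // f16

    for (int r : {4, 16, 64, 128}) {
        double mib = layers * r * (d_in + d_out) * bytes_per_param
                   / (1024.0 * 1024.0);
        std::printf("rank %3d -> ~%.0f MiB of LoRA weights\n", r, mib);
    }
    return 0;
}
```

This spans roughly 6 MiB to 200 MiB for the same model, which matches the observation that LoRAs come in wildly different sizes.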
---
With Qwen on Vulkan, I get an assertion failure at …
---
@wbruna Could you paste the full command line?
---
Currently, applying LoRA at runtime should not consume additional compute buffers, at least in most cases.
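(Editor's note: a sketch of how one could check the compute-buffer claim with ggml's graph allocator. This assumes current ggml APIs — header names and locations vary between ggml versions — and all shapes are made up; it is not code from this PR.)

```cpp
// Measure the compute buffer needed for y = W*x + scale * B*(A*x).
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h" // ggml_backend_cpu_buffer_type (newer ggml: ggml-cpu.h)
#include <cstdio>

int main() {
    ggml_init_params params = {
        /*mem_size*/   16 * 1024 * 1024,
        /*mem_buffer*/ nullptr,
        /*no_alloc*/   true, // metadata only; data sized by the allocator
    };
    ggml_context * ctx = ggml_init(params);

    // Made-up shapes: W 4096x4096, x one column, LoRA rank 16.
    ggml_tensor * W = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4096, 4096);
    ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4096, 1);
    ggml_tensor * A = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4096, 16);
    ggml_tensor * B = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 16, 4096);

    // The low-rank intermediates A*x and B*(A*x) are tiny compared with
    // the main activations, so the peak buffer should barely grow.
    ggml_tensor * y = ggml_add(ctx,
        ggml_mul_mat(ctx, W, x),
        ggml_scale(ctx, ggml_mul_mat(ctx, B, ggml_mul_mat(ctx, A, x)), 1.0f));

    ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);

    ggml_gallocr_t galloc = ggml_gallocr_new(ggml_backend_cpu_buffer_type());
    ggml_gallocr_reserve(galloc, gf);
    std::printf("compute buffer: %zu bytes\n",
                ggml_gallocr_get_buffer_size(galloc, 0));

    ggml_gallocr_free(galloc);
    ggml_free(ctx);
    return 0;
}
```

Comparing this against the same graph without the LoRA branch should show a near-identical buffer size, consistent with the comment above.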
---
Full output (this was with 9a35003; I'll test again with the most recent commit). EDIT: working now on 8850157 🙂