fix: offloading model layers to gpu and using flash_attn for gpu #75

Open
HeisenberG2575 wants to merge 1 commit into neuphonic:main from HeisenberG2575:main

Conversation

@HeisenberG2575

Context:

The _load_backbone function loads the neutts-air backbone for inference. While it works as intended for CPU inference, GPU inference is slowed down due to faults in the code.

Problem:

Since backbone_device is used to move the model to the appropriate device, and "gpu" is not a valid string for .to(), users resort to the standard "cuda" or strings prefixed with "cuda", such as "cuda:0". The current conditions

n_gpu_layers=-1 if backbone_device == "gpu" else 0
and
flash_attn=True if backbone_device == "gpu" else False
do not have the intended effect of offloading the model to the GPU and enabling flash attention, because the string is only compared against "gpu".
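
For instance, a quick interactive check (the values shown are what Python evaluates these expressions to) shows that a "cuda:0" device string never triggers the GPU branch:

    >>> backbone_device = "cuda:0"
    >>> -1 if backbone_device == "gpu" else 0
    0
    >>> True if backbone_device == "gpu" else False
    False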

Fix:

  • Check the string for "cuda" and "gpu" using Python's built-in startswith string method, so that passing "cuda" as backbone_device leads to the intended behaviour
  • Checking for "gpu" alongside "cuda" ensures the change is not breaking for existing users (see the sketch after this list)
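
A minimal sketch of the revised check, assuming the backbone is loaded through llama-cpp-python's Llama API (consistent with the n_gpu_layers, flash_attn, and GGUF references above); the function and argument names other than those two flags are illustrative, not the exact ones in the repository:

    from llama_cpp import Llama

    def _load_backbone(backbone_repo_id: str, backbone_device: str) -> Llama:
        # Treat legacy "gpu" and any "cuda"/"cuda:N" string as GPU devices.
        use_gpu = backbone_device.startswith(("cuda", "gpu"))
        return Llama.from_pretrained(
            repo_id=backbone_repo_id,
            filename="*.gguf",                   # GGUF backbone weights (illustrative pattern)
            n_gpu_layers=-1 if use_gpu else 0,   # offload all layers when on GPU
            flash_attn=use_gpu,                  # enable flash attention only on GPU
        )

With this check, both backbone_device="cuda:0" and the existing backbone_device="gpu" offload all layers, while "cpu" keeps the previous behaviour.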

TODOs:

  • Include other device strings that may be used for GPU inference, and support flash_attn and n_gpu_layers with GGUF for those devices as well
