branch: https://github.com/rsxdalv/chatterbox/tree/faster
Running on: NVIDIA A100(80G) GPU
Python environment: 3.11.5, Ubuntu 22.04, CUDA 12.8.0
Package name | Version
accelerate | 1.10.1
aiofiles | 24.1.0
aiohappyeyeballs | 2.4.3
aiohttp | 3.10.8
aiosignal | 1.3.1
annotated-types | 0.7.0
antlr4-python3-runtime | 4.9.3
anyio | 4.11.0
attrs | 24.2.0
audiolab | 0.3.6
audioread | 3.0.1
auto-editor | 29.1.0
av | 15.1.0
brotli | 1.1.0
certifi | 2024.8.30
cffi | 2.0.0
cfgv | 3.4.0
charset-normalizer | 3.4.3
click | 8.3.0
coloredlogs | 15.0.1
conformer | 0.3.2
contourpy | 1.3.3
ctranslate2 | 4.6.0
cycler | 0.12.1
decorator | 5.2.1
diffusers | 0.35.1
distlib | 0.4.0
einops | 0.8.1
fastapi | 0.118.0
faster-whisper | 1.2.0
ffmpeg-python | 0.2.0
ffmpy | 0.6.1
filelock | 3.19.1
flatbuffers | 25.9.23
fonttools | 4.60.1
frozenlist | 1.4.1
fsspec | 2025.9.0
future | 1.0.0
gradio | 5.49.0
gradio-client | 1.13.3
groovy | 0.1.2
grpclib | 0.4.7
h11 | 0.16.0
h2 | 4.1.0
hf-transfer | 0.1.9
hf-xet | 1.1.10
hpack | 4.0.0
httpcore | 1.0.9
httpx | 0.28.1
huggingface-hub | 0.35.3
humanfriendly | 10.0
humanize | 4.13.0
hyperframe | 6.0.1
identify | 2.6.15
idna | 3.10
importlib-metadata | 8.7.0
inquirerpy | 0.3.4
jinja2 | 3.1.6
joblib | 1.5.2
kiwisolver | 1.4.9
lazy-loader | 0.4
librosa | 0.11.0
llvmlite | 0.45.1
markdown-it-py | 4.0.0
markupsafe | 3.0.3
matplotlib | 3.10.6
mdurl | 0.1.2
ml-dtypes | 0.5.3
more-itertools | 10.8.0
mpmath | 1.3.0
msgpack | 1.1.1
multidict | 6.1.0
networkx | 3.5
nltk | 3.9.2
nodeenv | 1.9.1
numba | 0.62.1
numpy | 2.3.3
nvidia-cublas-cu12 | 12.8.4.1
nvidia-cuda-cupti-cu12 | 12.8.90
nvidia-cuda-nvrtc-cu12 | 12.8.93
nvidia-cuda-runtime-cu12 | 12.8.90
nvidia-cudnn-cu12 | 9.10.2.21
nvidia-cufft-cu12 | 11.3.3.83
nvidia-cufile-cu12 | 1.13.1.3
nvidia-curand-cu12 | 10.3.9.90
nvidia-cusolver-cu12 | 11.7.3.90
nvidia-cusparse-cu12 | 12.5.8.93
nvidia-cusparselt-cu12 | 0.7.1
nvidia-nccl-cu12 | 2.27.3
nvidia-nvjitlink-cu12 | 12.8.93
nvidia-nvtx-cu12 | 12.8.90
omegaconf | 2.3.0
onnx | 1.19.0
onnxruntime | 1.23.0
openai-whisper | 20250625
orjson | 3.11.3
packaging | 25.0
pandas | 2.3.3
peft | 0.17.1
pfzy | 0.3.4
pillow | 11.3.0
pip | 25.2
platformdirs | 4.4.0
pooch | 1.8.2
pre-commit | 4.3.0
prompt-toolkit | 3.0.52
protobuf | 5.29.5
psutil | 5.9.8
pycparser | 2.23
pydantic | 2.11.10
pydantic-core | 2.33.2
pydub | 0.25.1
pygments | 2.19.2
pyparsing | 3.2.5
pyrnnoise | 0.3.8
python-dateutil | 2.9.0.post0
python-multipart | 0.0.20
pytz | 2025.2
pyyaml | 6.0.3
regex | 2025.9.18
requests | 2.32.5
resampy | 0.4.3
resemble-perth | 1.0.1
rich | 14.1.0
ruff | 0.13.3
s3tokenizer | 0.2.0
safehttpx | 0.1.6
safetensors | 0.6.2
scikit-learn | 1.7.2
scipy | 1.16.2
semantic-version | 2.10.0
sentencepiece | 0.2.1
setuptools | 68.1.2
shellingham | 1.5.4
silero-vad | 6.0.0
six | 1.17.0
smart-open | 7.3.1
sniffio | 1.3.1
soundfile | 0.13.1
soxr | 1.0.0
spaces | 0.42.1
starlette | 0.48.0
sympy | 1.14.0
threadpoolctl | 3.6.0
tiktoken | 0.11.0
tokenizers | 0.22.1
tomlkit | 0.13.3
torch | 2.8.0
torchaudio | 2.8.0
tqdm | 4.67.1
transformers | 4.57.0
triton | 3.4.0
typer | 0.19.2
typing-extensions | 4.12.2
typing-inspection | 0.4.2
tzdata | 2025.2
urllib3 | 2.5.0
uv | 0.8.23
uvicorn | 0.37.0
virtualenv | 20.34.0
wcwidth | 0.2.14
websockets | 15.0.1
wheel | 0.45.1
wrapt | 1.17.3
yarl | 1.13.1
zipp | 3.23.0
All I called was:
model.generate(
sentence,
language_id='en',
audio_prompt_path=audio_prompt_path_input
)
CUDA error: operation failed due to a previous error during capture
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Traceback (most recent call last):
File "/root/xtts/chatterbox/chatterbox/src/chatterbox/models/t3/t3_cuda_graphs.py", line 109, in _capture_graph_for_bucket
static_tensors["out_1"], static_tensors["out_2"] = self.generate_token(
^^^^^^^^^^^^^^^^^^^^
File "/root/xtts/chatterbox/chatterbox/src/chatterbox/models/t3/t3.py", line 616, in generate_t3_token
logits = repetition_penalty_processor(generated_ids, logits)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/transformers/generation/logits_process.py", line 405, in __call__
scores_processed = scores.scatter(1, input_ids, score)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: operation not permitted when stream is capturing
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/xtts/chatterbox/Chatter.py", line 854, in process_one_chunk_deterministic
wav = model.generate(
^^^^^^^^^^^^^^^
File "/root/xtts/chatterbox/chatterbox/src/chatterbox/tts.py", line 253, in generate
speech_tokens = self.t3.inference(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/xtts/chatterbox/chatterbox/src/chatterbox/models/t3/t3.py", line 522, in inference
outputs = generate_token(
^^^^^^^^^^^^^^^
File "/root/xtts/chatterbox/chatterbox/src/chatterbox/models/t3/t3_cuda_graphs.py", line 156, in __call__
self._capture_graph_for_bucket(
File "/root/xtts/chatterbox/chatterbox/src/chatterbox/models/t3/t3_cuda_graphs.py", line 108, in _capture_graph_for_bucket
with torch.cuda.graph(self._bucket_graphs[bucket_key]):
File "/usr/local/lib/python3.11/site-packages/torch/cuda/graphs.py", line 222, in __exit__
self.cuda_graph.capture_end()
File "/usr/local/lib/python3.11/site-packages/torch/cuda/graphs.py", line 104, in capture_end
super().capture_end()
torch.AcceleratorError: CUDA error: operation failed due to a previous error during capture
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
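For context on where this traceback points: the inner failure is `RepetitionPenaltyLogitsProcessor.__call__` running a `scatter` on `generated_ids` inside `torch.cuda.graph` capture. During capture, every tensor shape must stay fixed across replays, but `generated_ids` grows by one token per step, so the op is not capture-safe as called. The sketch below is my guess at a capture-safe variant (not the actual code on either branch): it applies the same gather/where/scatter penalty, but assumes the caller keeps `generated_ids` in a fixed-size buffer padded with a real pad token, so shapes never change between steps.

```python
import torch

def repetition_penalty_static(logits: torch.Tensor,
                              generated_ids: torch.Tensor,
                              penalty: float) -> torch.Tensor:
    """Capture-safe repetition penalty (sketch).

    `generated_ids` must be a fixed-size (batch, max_len) buffer padded
    with a valid pad-token id, so every gather/scatter here has a static
    shape -- the property CUDA graph capture requires. Pad positions just
    penalize the pad token's logit, which is harmless if that token is
    never sampled.
    """
    # Current scores of every (padded) generated token.
    score = torch.gather(logits, 1, generated_ids)
    # Standard HF-style penalty: divide positive scores, multiply negative.
    score = torch.where(score < 0, score * penalty, score / penalty)
    # Write the penalized scores back at the same fixed-shape positions.
    return logits.scatter(1, generated_ids, score)
```

Under that assumption the processor could be swapped in before graph capture; whether the `faster` branch intends the buffer to be static is exactly what this report is asking about.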
For comparison, the `fast` branch works fine in the same environment; only `faster` fails. Can anyone help me figure this out?
Warm regards.