Here are the numbers I've been getting:
```
padding: torch.Size([2, 115, 1024])
Sampling: 13%|███▉ | 132/1000 [00:01<00:10, 86.10it/s]Capturing CUDA graph for bucket 500 (max_position: 500)
Sampling: 18%|█████▍ | 180/1000 [00:02<00:10, 75.00it/s]
S3Gen inference time: 0.69 seconds
[English TTS]
Generation time: 3.154 s
Audio duration: 6.400 s
RTF: 0.493
Estimated token count: 172
Input embeds shape before padding: torch.Size([2, 122, 1024])
Sampling: 12%|███▌ | 120/1000 [00:01<00:09, 88.38it/s]Capturing CUDA graph for bucket 500 (max_position: 500)
Sampling: 18%|█████▍ | 180/1000 [00:02<00:11, 73.90it/s]
[Multilingual TTS - FR]
Generation time: 3.184 s
Audio duration: 5.800 s
RTF: 0.549
```
Is this right? What RTF should this branch provide?
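
For reference, the RTF values in the log match generation time divided by audio duration. A minimal sketch to reproduce them, assuming that definition (the helper name `rtf` is hypothetical and not taken from the repo):

```python
def rtf(generation_time_s: float, audio_duration_s: float) -> float:
    """Real-time factor: seconds of compute per second of audio (lower is faster)."""
    if audio_duration_s <= 0:
        raise ValueError("audio duration must be positive")
    return generation_time_s / audio_duration_s


# Timings copied from the log in this post.
english = rtf(3.154, 6.400)
french = rtf(3.184, 5.800)

print(f"English RTF: {english:.3f}")
print(f"French RTF:  {french:.3f}")
```

Both runs print `Capturing CUDA graph` partway through sampling, so these timings may include a one-time capture cost; a second, warmed-up generation might give a lower RTF.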