This repository was archived by the owner on Oct 31, 2023. It is now read-only.
Hi, authors! Thanks for your great work! I have a question about the FPS at larger batch sizes. We tested the latency at batch size 1 on high-end GPUs, and the results match the speedups reported in Table 1. However, when we increase the batch size to 32 (or to smaller values such as 4 or 16), as Table 1 does, the latency of both dense and sparse inference is higher than cuDNN's, which contradicts the results reported in Table 1. The memory overhead is also much larger than cuDNN's. The experiment uses YOLOv5s on MOT16, tested on a Tesla V100 GPU, with the input size set to (1088, 608); we also tested an input size of (640, 640) with similar results. I'd greatly appreciate an explanation!
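For context, our measurement follows the usual warmup-then-time pattern. Below is a minimal sketch of the harness; `run_batch` is a placeholder for one forward pass at a given batch size, not the repo's actual API, and on a GPU backend a device synchronization (e.g. `torch.cuda.synchronize()`) would be needed before reading the clock:

```python
import time

def measure_latency(run_batch, batch_sizes=(1, 4, 16, 32), warmup=10, iters=50):
    """Return average per-iteration latency in ms for each batch size.

    run_batch(bs): placeholder callable performing one forward pass at
    batch size bs. With asynchronous GPU execution, synchronize the
    device before each perf_counter() read, or the timings are invalid.
    """
    results = {}
    for bs in batch_sizes:
        # Warmup iterations are discarded (kernel autotuning, caching).
        for _ in range(warmup):
            run_batch(bs)
        start = time.perf_counter()
        for _ in range(iters):
            run_batch(bs)
        elapsed = time.perf_counter() - start
        results[bs] = elapsed / iters * 1e3  # ms per batch
    return results
```

If the missing synchronization were the issue it would usually make latencies look too *small*, not larger than cuDNN's, so we don't believe our harness explains the gap; we include it here so you can check whether our methodology matches yours.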