This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Lower speed after using large batch size #14

@HankYe

Description

Hi, authors! Thanks for your great work! I have a question about the FPS at large batch sizes. We tested the latency at batch size 1 on high-end GPUs, and the results align with the speedup reported in Table 1. However, when we increase the batch size to 32 (or to smaller values such as 4 or 16), as Table 1 does, the latency of both dense and sparse inference is higher than cuDNN's, which contradicts the results reported in Table 1. The memory overhead is also much larger than cuDNN's. The experiment was conducted with YOLOv5s on MOT16 on a Tesla V100 GPU. The input size was set to (1088, 608); we also tested an input size of (640, 640) with similar results. I'd greatly appreciate it if you could offer an explanation!
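For context, a minimal sketch of the kind of latency measurement described above, assuming PyTorch with a YOLOv5s model loaded via torch.hub (the model source, input layout of (height=608, width=1088), and warm-up/iteration counts are my assumptions, not details from the original setup):

```python
import torch

# Assumption: YOLOv5s loaded from the ultralytics hub; swap in the actual
# model/checkpoint used in the experiment as needed.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s').cuda().eval()

def measure_latency(batch_size, h=608, w=1088, warmup=10, iters=100):
    """Return the average per-batch forward latency in milliseconds."""
    x = torch.randn(batch_size, 3, h, w, device='cuda')
    with torch.no_grad():
        for _ in range(warmup):       # warm-up to exclude one-off costs
            model(x)
        torch.cuda.synchronize()      # drain queued kernels before timing
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()      # wait for the timed kernels to finish
    return start.elapsed_time(end) / iters

for bs in (1, 4, 16, 32):
    print(f'batch {bs}: {measure_latency(bs):.2f} ms/batch')
```

Without the `torch.cuda.synchronize()` calls, CPU-side timing only measures kernel-launch overhead, which can make batched GPU inference look misleadingly fast or slow.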
