Depthwise Conv2d performance degrades at non-mod-16 spatial dimensions #3324

@thechriswebb

Description

Depthwise Conv2d throughput degrades significantly when spatial dimensions are not divisible by 16. This surfaces in multi-stage vision encoders where repeated 2x downsampling produces non-mod-16 intermediate feature maps.

Repro

repro_depthwise_mod16.py

Results

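| Channels | Mod-16 res | Other res | Mod-16 ms | Other ms | Gap | Other res mod 16 |
|---|---|---|---|---|---|---|
| 96 | 256x256 | 240x240 | 0.52 | 0.50 | -5% | 0 |
| 192 | 128x128 | 120x120 | 0.47 | 0.45 | -5% | 8 |
| 384 | 64x64 | 60x60 | 0.43 | 0.83 | +94% | 12 |
| 768 | 32x32 | 30x30 | 0.37 | 0.55 | +47% | 14 |
| 1536 | 16x16 | 15x15 | 0.37 | 0.47 | +27% | 15 |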

When the alternate resolution is 240 (still divisible by 16), there is no penalty. At 120 (remainder 8) the measurements above also show no penalty. The gap appears at 60 and below, where latency is 27% to 94% higher than at the next-larger mod-16 resolution, and it compounds across stages to produce ~53% end-to-end throughput loss through a 5-stage encoder.

Expected behavior

Depthwise Conv2d latency should scale roughly in proportion to pixel count, regardless of whether spatial dimensions are divisible by 16.
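To make the expectation concrete, here is a rough sketch that reads it as "latency scales linearly with pixel count at a fixed channel count" and compares that prediction with the measured numbers. The rows are transcribed from the Results section. The function names (`expected_other_ms`, `summarize`) are hypothetical helpers, not part of the repro script, and the linear model ignores any fixed per-call overhead, which is an assumption.

```python
# Rows transcribed from the Results section:
# (channels, mod16_side, other_side, mod16_ms, other_ms)
ROWS = [
    (96, 256, 240, 0.52, 0.50),
    (192, 128, 120, 0.47, 0.45),
    (384, 64, 60, 0.43, 0.83),
    (768, 32, 30, 0.37, 0.55),
    (1536, 16, 15, 0.37, 0.47),
]


def expected_other_ms(mod16_side, other_side, mod16_ms):
    """Latency predicted for the alternate resolution, ASSUMING time is
    proportional to pixel count (no fixed overhead)."""
    pixel_ratio = (other_side / mod16_side) ** 2
    return mod16_ms * pixel_ratio


def summarize(rows=ROWS):
    """Return one dict per row with the remainder and the measured/expected ratio."""
    out = []
    for channels, mod16_side, other_side, mod16_ms, other_ms in rows:
        expected = expected_other_ms(mod16_side, other_side, mod16_ms)
        out.append({
            "channels": channels,
            "remainder": other_side % 16,
            "expected_ms": expected,
            "measured_over_expected": other_ms / expected,
        })
    return out


if __name__ == "__main__":
    for row in summarize():
        print(
            f"C={row['channels']:>4}  rem={row['remainder']:>2}  "
            f"expected={row['expected_ms']:.3f} ms  "
            f"measured/expected={row['measured_over_expected']:.2f}x"
        )
```

Every pair in the benchmark has a side ratio of 15/16, so the pixel-count ratio is (15/16)^2 ≈ 0.879 and the model predicts about 12% lower latency at the alternate resolution. A `measured_over_expected` value well above 1 flags the rows where the penalty shows up.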
