Description
Expose RetinaNet detections_per_img and topk_candidates to user config
Problem
The torchvision RetinaNet model has two important parameters that control how many detections are returned per image:
- detections_per_img (default: 300): the maximum number of detections returned after NMS
- topk_candidates (default: 1000): the number of top-scoring candidates considered before NMS
Neither of these is currently exposed in the DeepForest config. They are hardcoded at the torchvision defaults. For dense scenes — such as large bird colonies or very dense tree canopies — 300 detections per image may be insufficient, leading to missed objects that the model would otherwise detect.
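To make the order of the filters concrete, here is a minimal pure-Python sketch of the post-processing steps described above. It is a simplified single-level, single-class stand-in, not torchvision's actual implementation, and the function names, grid of boxes, and scores are made up for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, score_thresh, topk_candidates, nms_thresh,
                detections_per_img):
    """Return indices of the boxes that survive all four filters."""
    # 1. Drop candidates that do not exceed the score threshold.
    order = [i for i, s in enumerate(scores) if s > score_thresh]
    # 2. Keep only the top-k highest-scoring candidates before NMS.
    order.sort(key=lambda i: scores[i], reverse=True)
    order = order[:topk_candidates]
    # 3. Greedy NMS: keep a box unless it overlaps an already kept box.
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    # 4. Cap the number of returned detections (the lowest scores are dropped).
    return keep[:detections_per_img]


# Illustrative dense scene: 400 non-overlapping boxes, all above score_thresh.
boxes = [(10.0 * c, 10.0 * r, 10.0 * c + 8.0, 10.0 * r + 8.0)
         for r in range(20) for c in range(20)]
scores = [0.9 - 0.001 * i for i in range(400)]

default_cap = postprocess(boxes, scores, 0.1, 1000, 0.05, detections_per_img=300)
raised_cap = postprocess(boxes, scores, 0.1, 1000, 0.05, detections_per_img=1000)
print(len(default_cap), len(raised_cap))
```

With the cap at 300, the 100 lowest-scoring boxes are discarded even though every one of them passed the score threshold and NMS.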
Example
While running predictions on dense bird imagery (file C2_L2_F52_T20230910_134201_350.jpg from flight JPG_20230910_133900), we appear to be hitting the detection cap in very dense regions. The model scores objects above the score_thresh but they get silently dropped because they exceed detections_per_img.
We also see this in the Bird Fine-Tuning example notebook, which already warns:
UserWarning: Encountered more than 100 detections in a single image. This means that certain detections with the lowest scores will be ignored...
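One rough way to check whether predictions are hitting the cap is to count boxes per image and flag any image whose count reaches detections_per_img. The helper below is a sketch under the assumption that each predicted box carries an image identifier (for example an image path column in a predictions table); the function name and file names are hypothetical.

```python
from collections import Counter


def images_at_cap(image_ids, detections_per_img=300):
    """Return the images whose number of predicted boxes reaches the cap.

    image_ids: one entry per predicted box, naming the image it belongs to.
    """
    counts = Counter(image_ids)
    return sorted(name for name, n in counts.items() if n >= detections_per_img)


# Illustrative input: one image returns exactly 300 boxes, another only 42.
image_ids = ["dense.jpg"] * 300 + ["sparse.jpg"] * 42
flagged = images_at_cap(image_ids)
print(flagged)
```

An image that returns exactly detections_per_img boxes is not proof that objects were dropped, but it is a strong hint that the cap was the limiting factor.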
Where the limit is applied
In src/deepforest/models/retinanet.py, RetinaNetHub.__init__ passes **kwargs through to torchvision.models.detection.retinanet.RetinaNet, which accepts detections_per_img and topk_candidates. However, Model.create_model() never passes these parameters, so the torchvision defaults are always used:
# current code in create_model: no detections_per_img or topk_candidates
model = RetinaNetHub(
    backbone_weights="COCO_V1",
    num_classes=self.config.num_classes,
    nms_thresh=self.config.nms_thresh,
    score_thresh=self.config.score_thresh,
    label_dict=label_dict,
)
Proposed solution
- Add detections_per_img (default 300) and topk_candidates (default 1000) to the DeepForest config schema and config.yaml, so users can override them.
- Pass them through from Model.create_model() → RetinaNetHub() → RetinaNet.__init__().
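The pass-through could look roughly like the sketch below. It assumes an attribute-style config object and falls back to the torchvision defaults when a key is missing, so existing configs keep their current behaviour. The helper name retinanet_kwargs and the example config values are hypothetical and not part of DeepForest.

```python
from types import SimpleNamespace

# Defaults used by torchvision's RetinaNet, as stated in this issue.
TORCHVISION_DEFAULTS = {"detections_per_img": 300, "topk_candidates": 1000}


def retinanet_kwargs(config):
    """Collect keyword arguments that create_model() could forward to RetinaNetHub."""
    kwargs = {
        "num_classes": config.num_classes,
        "nms_thresh": config.nms_thresh,
        "score_thresh": config.score_thresh,
    }
    # New, optional keys: fall back to the torchvision default if unset.
    for name, default in TORCHVISION_DEFAULTS.items():
        kwargs[name] = getattr(config, name, default)
    return kwargs


# A config without the new keys, and one tuned for dense scenes (values illustrative).
old_config = SimpleNamespace(num_classes=1, nms_thresh=0.05, score_thresh=0.1)
dense_config = SimpleNamespace(num_classes=1, nms_thresh=0.05, score_thresh=0.1,
                               detections_per_img=1000, topk_candidates=3000)

print(retinanet_kwargs(old_config))
print(retinanet_kwargs(dense_config))
```

Because RetinaNetHub.__init__ already forwards **kwargs to the torchvision RetinaNet constructor, the resulting dictionary could be unpacked into the existing call alongside backbone_weights and label_dict.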
Discussion: Can we just increase the default?
Since we already have score_thresh filtering low-confidence detections, one might ask: why not just set detections_per_img very high and rely on score_thresh alone?
Things to consider:
- Performance: detections_per_img is applied after NMS, so raising it mainly means more boxes are returned and handled downstream; the NMS workload itself is governed by topk_candidates. The extra cost is likely modest for most use cases.
- Memory: More detections means larger output tensors, which could matter on GPU-constrained setups or with very large batch sizes.
- Interaction with topk_candidates: topk_candidates acts as a pre-NMS filter (torchvision applies it per feature-pyramid level). If detections_per_img is raised but topk_candidates is left at 1000, the improvement may plateau. Both should be tunable together.
- Backwards compatibility: Changing the default could alter existing users' results. Exposing the parameter without changing the default preserves backwards compatibility while giving power users control.
The safest first step is to expose both parameters in the config with their current torchvision defaults.
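The plateau mentioned above follows from a simple upper bound: the model cannot return more boxes than enter NMS. The sketch below assumes topk_candidates is applied once per feature-pyramid level; the function name is hypothetical, and num_levels depends on the backbone, so it is left as a parameter.

```python
def max_returned(detections_per_img, topk_candidates, num_levels):
    """Upper bound on detections per image, before NMS removes any overlaps."""
    candidates_entering_nms = topk_candidates * num_levels
    return min(detections_per_img, candidates_entering_nms)


# If nearly all objects fall on a single pyramid level, raising
# detections_per_img past topk_candidates cannot add detections.
print(max_returned(detections_per_img=2000, topk_candidates=1000, num_levels=1))
print(max_returned(detections_per_img=2000, topk_candidates=3000, num_levels=1))
```

This is only a bound: NMS and the score threshold usually reduce the count further, which is why the two parameters are best tuned together on representative imagery.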
Questions
- @jveitchmichaelis does DETR have similar limitations?
- What are the downsides? This threshold must exist for a reason; does raising it make inference slower?