Feature request: allow users to change retinanet candidate images. #1309

@bw4sz

Expose RetinaNet detections_per_img and topk_candidates to user config

Problem

The torchvision RetinaNet model has two important parameters that control how many detections are returned per image:

  • detections_per_img (default: 300): the maximum number of detections returned per image after NMS
  • topk_candidates (default: 1000): the number of top-scoring candidates kept before NMS; torchvision applies this limit at each feature-pyramid level

Neither of these is currently exposed in the DeepForest config. They are hardcoded at the torchvision defaults. For dense scenes — such as large bird colonies or very dense tree canopies — 300 detections per image may be insufficient, leading to missed objects that the model would otherwise detect.
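
To make the effect of the cap concrete, here is a minimal pure-Python sketch of the final selection step. It is not torchvision's implementation: NMS is omitted, the scores are synthetic, `keep_detections` is a hypothetical helper, and the 0.1 score threshold is an arbitrary value chosen for illustration.

```python
import random


def keep_detections(scores, score_thresh, detections_per_img):
    """Hypothetical helper: filter by score, then keep the top-N by score.

    Mimics only the final cap; NMS is deliberately left out.
    """
    above = sorted((s for s in scores if s > score_thresh), reverse=True)
    kept = above[:detections_per_img]
    dropped = above[detections_per_img:]
    return kept, dropped


random.seed(0)
# Synthetic dense scene: 800 candidate boxes with uniformly random scores.
scores = [random.random() for _ in range(800)]

kept, dropped = keep_detections(scores, score_thresh=0.1, detections_per_img=300)
print(f"kept={len(kept)} dropped_above_threshold={len(dropped)}")
```

Every score in `dropped` passed the threshold, yet none of those boxes would reach the user. That is the silent loss described above.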

Example

While running predictions on dense bird imagery (file C2_L2_F52_T20230910_134201_350.jpg from flight JPG_20230910_133900), we appear to be hitting the detection cap in very dense regions. The model scores objects above the score_thresh but they get silently dropped because they exceed detections_per_img.

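A related warning appears in the Bird Fine-Tuning example notebook. Note that it cites a limit of 100 rather than 300, so it likely comes from a separate cap (possibly in the evaluation metric) rather than from detections_per_img itself: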

UserWarning: Encountered more than 100 detections in a single image. This means that certain detections with the lowest scores will be ignored...

Where the limit is applied

In src/deepforest/models/retinanet.py, RetinaNetHub.__init__ passes **kwargs through to torchvision.models.detection.retinanet.RetinaNet, which accepts detections_per_img and topk_candidates. However, Model.create_model() never passes these parameters, so the torchvision defaults are always used:

# current code in create_model — no detections_per_img or topk_candidates
model = RetinaNetHub(
    backbone_weights="COCO_V1",
    num_classes=self.config.num_classes,
    nms_thresh=self.config.nms_thresh,
    score_thresh=self.config.score_thresh,
    label_dict=label_dict,
)

Proposed solution

  1. Add detections_per_img (default 300) and topk_candidates (default 1000) to the DeepForest config schema and config.yaml, so users can override them.
  2. Pass them through from Model.create_model() → RetinaNetHub() → RetinaNet.__init__().
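
The two steps could look roughly like the sketch below. `DetectionConfig` and `retinanet_kwargs` are hypothetical names standing in for the real DeepForest config object and wiring, and the defaults shown for `num_classes`, `nms_thresh`, and `score_thresh` are placeholders. Only the two new fields and their defaults (300 and 1000) come from the proposal.

```python
from dataclasses import dataclass


@dataclass
class DetectionConfig:
    """Hypothetical stand-in for the DeepForest config; not the real schema."""
    num_classes: int = 1            # placeholder value
    nms_thresh: float = 0.05        # placeholder value
    score_thresh: float = 0.1       # placeholder value
    detections_per_img: int = 300   # proposed field, torchvision default
    topk_candidates: int = 1000     # proposed field, torchvision default


def retinanet_kwargs(config: DetectionConfig) -> dict:
    """Collect the keyword arguments create_model() would forward."""
    if config.detections_per_img < 1 or config.topk_candidates < 1:
        raise ValueError("detections_per_img and topk_candidates must be >= 1")
    return {
        "num_classes": config.num_classes,
        "nms_thresh": config.nms_thresh,
        "score_thresh": config.score_thresh,
        "detections_per_img": config.detections_per_img,
        "topk_candidates": config.topk_candidates,
    }


# A user override for a dense colony scene.
kwargs = retinanet_kwargs(DetectionConfig(detections_per_img=1000, topk_candidates=3000))
print(kwargs)
# In create_model() this would become something like:
#   RetinaNetHub(backbone_weights="COCO_V1", label_dict=label_dict, **kwargs)
```

Leaving the defaults at 300 and 1000 means a config that omits both fields behaves exactly as it does today.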

Discussion: Can we just increase the default?

Since we already have score_thresh filtering low-confidence detections, one might ask: why not just set detections_per_img very high and rely on score_thresh alone?

Things to consider:

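  • Performance: detections_per_img is applied after NMS, so raising it mainly means more boxes are returned and post-processed downstream. It is a higher topk_candidates that sends more boxes into NMS and increases its compute cost, though this is likely modest for most use cases.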
  • Memory: More detections means larger output tensors, which could matter on GPU-constrained setups or with very large batch sizes.
  • Interaction with topk_candidates: topk_candidates acts as a pre-NMS filter. If detections_per_img is raised but topk_candidates is left at 1000, the improvement may plateau. Both should be tunable together.
  • Backwards compatibility: Changing the default could alter existing users' results. Exposing the parameter without changing the default preserves backwards compatibility while giving power users control.
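
The interaction between the two parameters can be reasoned about with a simple upper bound. The sketch below assumes that `topk_candidates` is applied separately at each feature-pyramid level and that the model has five levels (an assumption about the backbone; adjust it for your model). `max_detections` is a hypothetical helper, and the bound ignores NMS, which only lowers the real count.

```python
def max_detections(detections_per_img, topk_candidates, num_levels=5):
    """Upper bound on boxes returned per image (ignores NMS suppression)."""
    # Assumed: at most topk_candidates boxes per level reach NMS.
    pre_nms_candidates = topk_candidates * num_levels
    return min(detections_per_img, pre_nms_candidates)


# Raise the post-NMS cap while leaving topk_candidates at 1000.
for cap in (300, 1000, 5000, 20000):
    print(cap, max_detections(cap, topk_candidates=1000))
```

Under these assumptions the bound stops growing once detections_per_img exceeds the pre-NMS candidate pool, which is why the two values are worth tuning together.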

The safest first step is to expose both parameters in the config with their current torchvision defaults.

Questions

  1. @jveitchmichaelis does DETR have similar limitations?
  2. What are the downsides of raising the limit? This threshold must exist for a reason. Is inference slower with a higher value?

Metadata

Labels: API, High Priority
