TL;DR: this is a bit more complex than fixing the training size. The analysis conflates two things: one is the requirement that models have identical tensor shapes for the forward pass (which already happens under the hood); the more subtle issue is how models handle differing image scales at train/test time (which we should care about, a lot). Batched training and validation are identical: they use the same dataset class, except that validation has augmentations turned off. In both cases the models cannot deal with variable-size images themselves; they have a preprocessing step that fixes it. The next question is what that size should be. I have opinions...
RetinaNet just hides the details, but it should be doing that batching internally. Both models do the same thing: anything between 800 and 1333px is untouched, and anything resized has its aspect ratio preserved. So for a default of 800px the model doesn't do anything to an 800px image, but we're assuming it's appropriate to resize other images to 800px without the scale being off. I came up against this recently: I trained a model on LIDAR with 1024px images. When I tried to transfer to NeonTreeEval, it predicted garbage small trees everywhere until I realised it was scaling the 400px images up to 800px internally.
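The resizing rule described above can be sketched as follows. This mirrors the min/max-size logic of torchvision's `GeneralizedRCNNTransform` (defaults `min_size=800`, `max_size=1333`); DeepForest may configure it differently, so treat this as an illustration rather than its exact behaviour:

```python
def detection_resize(height, width, min_size=800, max_size=1333):
    """Sketch of the torchvision-style resize rule: scale so the short
    side reaches min_size, unless that would push the long side past
    max_size. Aspect ratio is always preserved."""
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)

# A 400px NeonTreeEval tile is silently doubled to 800px...
print(detection_resize(400, 400))    # (800, 800)
# ...while a 1024px LIDAR training tile is shrunk to 800px.
print(detection_resize(1024, 1024))  # (800, 800)
```

This is why scale mismatch is easy to miss: the model never errors, it just resizes, and objects end up at a different apparent scale than they had at training time.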
For semantic segmentation/pixel-regression models, turning off auto-scaling lets you train on 1024px and predict on 2048px, or as large as your GPU will allow, in one go. This is theoretically true for convolutional models at least, and the main benefit is less stitching. I'm less sure what the behaviour is for RetinaNet/DETR if you input a huge image, but remember these models are benchmarked on datasets like COCO, where 100 objects is a lot.

Recommendations?
If possible, avoid having to manually figure out what the prediction size should be.
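The earlier point about convolutional models accepting larger inputs at prediction time is easy to demonstrate with a toy fully-convolutional model (a two-layer sketch, not a real segmentation head): with no fully-connected layers, the same weights run at any spatial size.

```python
import torch
import torch.nn as nn

# Toy fully-convolutional "segmentation" model: only conv layers,
# so any input spatial size works with the same weights.
fcn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)

small = fcn(torch.rand(1, 3, 128, 128))
large = fcn(torch.rand(1, 3, 256, 256))
print(small.shape, large.shape)  # output spatial size tracks input size
```

Sizes are kept small here for speed; the same property is what lets you train at 1024px and predict at 2048px, GPU memory permitting.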
There are two factors that continue to bother me.

Understanding Variable Image Sizes in DeepForest: Training vs Evaluation
Training needs no `size` config; evaluation needs `validation.size` to resize images to a common size.

The Question
The Answer
Training with Variable-Sized Images
Training in DeepForest automatically handles images of different dimensions without requiring a `size` configuration parameter. This works because:

1. List-Based Batching
The training dataset's `collate_fn` returns images as a list of tensors (not a single stacked tensor), so each image can have different dimensions:

`[(3, 400, 600), (3, 500, 700), (3, 450, 650)]`

2. Model Support
The underlying detection models (RetinaNet, DeformableDETR) are designed to accept lists of variable-sized images:
Modern detection models from torchvision (like RetinaNet) and transformers (like DeformableDETR) natively support this pattern. The model processes each image independently and computes per-image losses.
Evaluation/Prediction with Same-Sized Images
Evaluation and prediction require all images to have the same dimensions. This is enforced by:
1. Tensor-Based Batching
The prediction dataset's `collate_fn` uses `default_collate`, which attempts to stack images into a single tensor. This requires all images to have identical dimensions to create a proper batch tensor.
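`default_collate` ultimately stacks the images along a new batch dimension; the constraint is the same as stacking NumPy arrays, so it can be illustrated without torch:

```python
import numpy as np

# Identical shapes stack into one batch tensor.
same = [np.zeros((3, 400, 400)), np.zeros((3, 400, 400))]
batch = np.stack(same)
print(batch.shape)  # (2, 3, 400, 400)

# Mismatched shapes cannot be stacked -- hence validation.size.
mixed = [np.zeros((3, 400, 400)), np.zeros((3, 500, 500))]
try:
    np.stack(mixed)
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)  # True
```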
2. Why the Difference?
The different batching strategies exist for good reasons:
Training (List-based): tolerates variable image sizes, at the cost of processing images one at a time.
Evaluation/Prediction (Tensor-based): a single stacked tensor enables fast batched inference, but requires uniform image dimensions.
Configuration
For Training
No `train.size` parameter exists in the config.

For Evaluation/Prediction
The `validation.size` parameter must be set, then used in your code.
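The original code snippets were not recoverable from this page. As a rough, unverified sketch of the pattern being described (the exact config keys and access style may differ across DeepForest versions, so check your version's config schema):

```python
# Hypothetical usage sketch -- not verified against a specific release.
from deepforest import main

m = main.deepforest()
m.config["validation"]["size"] = 800  # resize all eval images to a common size
```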
Questions
Does this have model performance implications? It's faster to batch, so should we resize and batch in training as well? This seems like a clear tradeoff between speed and accuracy.
@jveitchmichaelis can you comment here and let's think about whether we need a change.