Questions about encoder proposal selection

Hi @ZhuYuChenNO1 @ChengShiest,

Thanks for this excellent work! I have a few questions about the model design:

1. As shown in [this line](https://github.com/SooLab/Plain-Det/blob/ca4bda1e51d99d1ef07230ed1616fd4c377f1a9e/projects/deformable_detr/modeling/deformable_transformer.py#L434), the object proposals are selected by their similarity to the first class only (e.g., "person" in the COCO dataset). I assume this class-agnostic selection follows the design of the original Deformable-DETR++, but in PlainDet the class_embed for proposals is not learnable: it is initialized with text embeddings and then frozen. Could you explain the reasoning here? Have you tried selecting proposals by the maximum score across all classes, as suggested by the comment on the next line?
2. The procedure for building query embeddings described in the paper (e.g., Equation 7) differs from the code. Could you clarify which one is intended?
3. How does PlainDet differ from typical open-vocabulary object detection in terms of the training paradigm?
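
To make question 1 concrete, here is a minimal sketch of the two selection strategies as I understand them. It is plain Python rather than the repository's PyTorch code, and the function names and toy logits are made up for illustration. It only assumes that proposals are chosen by a top-k over one score per proposal.

```python
# Illustrative sketch only: the helper names and the toy logits below are
# hypothetical and are not taken from the Plain-Det repository.

def topk_indices(scores, k):
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

def select_by_first_class(logits, k):
    """Rank proposals by the logit of class 0 only."""
    return topk_indices([row[0] for row in logits], k)

def select_by_max_class(logits, k):
    """Rank proposals by their best logit over all classes."""
    return topk_indices([max(row) for row in logits], k)

# Toy example: 4 proposals, 3 classes (class 0 plays the role of "person").
logits = [
    [2.0, 0.1, 0.0],   # strongly matches class 0
    [0.5, 3.0, 0.2],   # strongly matches class 1
    [1.0, 0.0, 0.1],   # weakly matches class 0
    [-1.0, 0.3, 4.0],  # strongly matches class 2
]

first_class_choice = select_by_first_class(logits, k=2)
max_class_choice = select_by_max_class(logits, k=2)
print(first_class_choice, max_class_choice)
```

In this toy case the first-class rule keeps proposals 0 and 2, while the max-over-classes rule keeps proposals 3 and 1. With a frozen, text-initialized class_embed, I would expect the first rule to favor regions that look like the first category, which is what motivates the question.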

Thanks! Looking forward to your reply!
