Hi @ZhuYuChenNO1 @ChengShiest,
Thanks for this excellent work! I have some questions about the model design as follows:
- As shown in this line, the object proposals are selected according to their similarity to the first class (e.g., "person" in the COCO dataset). I think these class-agnostic proposals follow the design of the original deformable-detr++, but in PlainDet the class_embed for proposals is not learnable (it is initialized with text embeddings and frozen). Could you explain this design choice? Have you tried selecting proposals by the maximum score among all classes, as commented on the next line?
- The procedure for building query embeddings described in the paper (e.g., Equation 7) differs from the code.
- What is the difference between PlainDet and typical open-vocabulary object detection regarding the training paradigm?
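
To make the first question concrete, here is a minimal sketch of the two selection strategies I am comparing. It is not the repository's code: plain Python lists stand in for the logits tensor, and the function names and score values are hypothetical.

```python
# Illustrative sketch only: plain Python lists stand in for the logits tensor.
# scores[i][c] = similarity of proposal i to class c (hypothetical values).

def topk_indices(values, k):
    """Return the indices of the k largest values, highest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]

def select_by_first_class(scores, k):
    # Rank proposals by their score for class 0 only (e.g. "person" in COCO).
    return topk_indices([row[0] for row in scores], k)

def select_by_max_class(scores, k):
    # Rank proposals by their best score over all classes.
    return topk_indices([max(row) for row in scores], k)

scores = [
    [0.9, 0.1, 0.2],   # proposal 0: strong response for class 0
    [0.1, 0.8, 0.3],   # proposal 1: strong response for class 1
    [0.2, 0.1, 0.95],  # proposal 2: strong response for class 2
    [0.3, 0.2, 0.1],   # proposal 3: weak response everywhere
]

first = select_by_first_class(scores, 2)
best = select_by_max_class(scores, 2)
print(first, best)
```

In this toy case the first-class ranking keeps the weak proposal 3 and drops proposals 1 and 2, which is the behavior I would expect to hurt recall for non-"person" objects when class_embed is frozen.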
Thanks! Looking forward to your reply!