Training Errors and Results Mismatch

Hi,

Thank you for releasing the code. Your work is amazing and it aligns with my research. I am new to this area and tried to reproduce the results on 4 H100 GPUs, but I am facing the following issues:

1. Traceback (most recent call last):
     File "/home/jowusu1/.local/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
       fn(i, *args)
     File "/home/jowusu1/.local/lib/python3.12/site-packages/detectron2/engine/launch.py", line 123, in _distributed_worker
       main_func(*args)
     File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/train_net.py", line 429, in main
       trainer = Trainer(cfg)
                 ^^^^^^^^^^^^
     File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/train_net.py", line 75, in __init__
       model = self.build_model(cfg)
               ^^^^^^^^^^^^^^^^^^^^^
     File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/train_net.py", line 114, in build_model
       model = build_sam2_video_query_iou_predictor(model_cfg, sam2_checkpoint,mode='train',apply_postprocessing=False, mask_decoder_depth=cfg.MODEL.MASK_DECODER_DEPTH)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/sam2/build_sam.py", line 168, in build_sam2_video_query_iou_predictor
       model = instantiate(cfg.model, _recursive_=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/jowusu1/.local/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 226, in instantiate
       return instantiate_node(
              ^^^^^^^^^^^^^^^^^
     File "/home/jowusu1/.local/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 347, in instantiate_node
       return _call_target(_target_, partial, args, kwargs, full_key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/home/jowusu1/.local/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 97, in _call_target
       raise InstantiationException(msg) from e
   hydra.errors.InstantiationException: Error in call to target 'sam2.sam2_video_query_iou_predictor.SAM2VideoQueryIoUPredictor': ModuleNotFoundError("No module named 'dinov2.hub'")


So I stopped using torch.hub and instead used timm's DINOv2 to load the DINOv2 backbone with the right weights. Training and evaluation then ran, which leads to point 2 below.

2. Evaluation results for VIPEntitySeg (Table 2 in the paper) are down by 2% across all metrics.
 
Are there any additional adjustments or parameter tuning to be made? Is there any reason why I am getting the error in point 1? I made sure I followed all the installation instructions.
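
For point 1, a minimal diagnostic sketch like the one below could help narrow down whether the `dinov2` package (and its `hub` submodule) is resolvable in the environment that launches training. The helper name `module_available` is hypothetical and not part of EntitySAM. The sketch only assumes that the predictor imports `dinov2.hub`, as the traceback indicates. It does not show where that package is expected to come from.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be resolved on the current sys.path.

    Hypothetical helper for diagnosis only; not part of EntitySAM.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports the parent package first, so a missing or
        # non-package parent (e.g. `dinov2`) raises instead of returning None.
        return False


if __name__ == "__main__":
    # Module names taken from the error message in point 1.
    for name in ("dinov2", "dinov2.hub"):
        status = "found" if module_available(name) else "MISSING"
        print(f"{name}: {status}")
```

If `dinov2` is found but `dinov2.hub` is not, a different package with the same top-level name may be shadowing the expected one. If both are missing, the package is probably not installed or not on `PYTHONPATH` for the spawned worker processes.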

