Training Errors and Results Mismatch #3

@Joan947

Hi,

Thank you for releasing the code. Your work is amazing, and it aligns with my research. I am new to this area and tried to reproduce the results on 4 H100 GPUs, but I am facing the following issues:

  1. Traceback (most recent call last):
       File "/home/jowusu1/.local/lib/python3.12/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
         fn(i, *args)
       File "/home/jowusu1/.local/lib/python3.12/site-packages/detectron2/engine/launch.py", line 123, in _distributed_worker
         main_func(*args)
       File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/train_net.py", line 429, in main
         trainer = Trainer(cfg)
                   ^^^^^^^^^^^^
       File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/train_net.py", line 75, in __init__
         model = self.build_model(cfg)
                 ^^^^^^^^^^^^^^^^^^^^^
       File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/train_net.py", line 114, in build_model
         model = build_sam2_video_query_iou_predictor(model_cfg, sam2_checkpoint,mode='train',apply_postprocessing=False, mask_decoder_depth=cfg.MODEL.MASK_DECODER_DEPTH)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File "/cluster/medbow/project/advdls25/jowusu1/EntitySAM/entitysam/sam2/build_sam.py", line 168, in build_sam2_video_query_iou_predictor
         model = instantiate(cfg.model, _recursive_=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File "/home/jowusu1/.local/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 226, in instantiate
         return instantiate_node(
                ^^^^^^^^^^^^^^^^^
       File "/home/jowusu1/.local/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 347, in instantiate_node
         return _call_target(_target_, partial, args, kwargs, full_key)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File "/home/jowusu1/.local/lib/python3.12/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 97, in _call_target
         raise InstantiationException(msg) from e
     hydra.errors.InstantiationException: Error in call to target 'sam2.sam2_video_query_iou_predictor.SAM2VideoQueryIoUPredictor':
     ModuleNotFoundError("No module named 'dinov2.hub'")

So I stopped using torch.hub and instead loaded the DINOv2 backbone, with its pretrained weights, through timm. Training and evaluation then ran, but that leads to point 2 below.

  2. Evaluation results on VIPEntitySeg (Table 2 in the paper) are down by 2% across all metrics.

Are there any additional adjustments or parameter tuning that I should apply? Is there a reason why I am getting the error in point 1? I made sure to follow all of the installation instructions.
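
For anyone trying to narrow down the `ModuleNotFoundError` in point 1, a quick check is to ask Python where `dinov2` and `dinov2.hub` resolve from inside the training environment. The snippet below is only a diagnostic sketch using the standard library; the helper name `describe_module` is hypothetical and not part of EntitySAM, SAM 2, or DINOv2.

```python
import importlib.util


def describe_module(name: str) -> str:
    """Report where `name` would be imported from, or why it cannot be found."""
    try:
        # For a dotted name, find_spec imports the parent package first,
        # so a missing or non-package parent raises ImportError here.
        spec = importlib.util.find_spec(name)
    except ImportError as exc:
        return f"{name}: parent not importable ({exc})"
    if spec is None:
        return f"{name}: not found"
    return f"{name}: found at {spec.origin}"


if __name__ == "__main__":
    # Run this with the same interpreter and working directory as train_net.py.
    for module_name in ("dinov2", "dinov2.hub"):
        print(describe_module(module_name))
```

If `dinov2` is found but `dinov2.hub` is not, one possible explanation (an assumption, not something confirmed here) is that a different package or directory named `dinov2` is earlier on `sys.path` than the one the code expects.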
