Getting very low confidence score for certain obvious text prompts #17

@rajeshgangireddy

Hi, I have been trying out different things with EfficientSAM3, and most of the time it works great. Thanks for the great work.

However, I have noticed that when using the distilled image encoder + text encoder, the masks sometimes get very low confidence scores, even for simple prompts and images.

Using the sam3/efficientsam3_examples/efficientsam3_image_predictor_example.ipynb notebook
with the provided example image and prompt ("a shoe"), the results are quite good.

Image

But with a different image of a dog and text prompt "dog", I had to lower the confidence threshold by a lot to get any meaningful masks.

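from PIL import Image

# `model`, `Sam3Processor`, and `plot_results` are assumed to be defined
# earlier in the example notebook.
dog_image = "dog6/01.jpg"
image = Image.open(dog_image)
width, height = image.size

# The threshold had to be lowered to 0.02 to get any meaningful masks.
processor = Sam3Processor(model, confidence_threshold=0.02)
inference_state = processor.set_image(image)

# Clear any earlier prompts, then run the text prompt.
processor.reset_all_prompts(inference_state)
inference_state = processor.set_text_prompt(state=inference_state, prompt="dog")

img0 = Image.open(dog_image)
plot_results(img0, inference_state)

Image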

After taking a closer look, I see that this is mostly due to the presence_logit_dec being too low, even for simple cases like this.

Prompt: 'dog'
Presence score (single value): 0.0320
Top-5 classification probs: [0.7695, 0.4902, 0.3438, 0.332, 0.2734]
Top-5 final probs (class × presence): [0.0247, 0.0156, 0.011, 0.0106, 0.0087]
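
The numbers above are consistent with each final score being the product of a query's classification probability and the single presence probability. A minimal sketch in plain Python, using only the values printed above (the variable names are illustrative and are not taken from the EfficientSAM3 code), reproduces the final scores to within rounding and recovers the logit that a sigmoid would need to produce a presence score of 0.0320:

```python
import math

# Values copied from the printout above.
presence = 0.0320
class_probs = [0.7695, 0.4902, 0.3438, 0.332, 0.2734]
reported_final = [0.0247, 0.0156, 0.011, 0.0106, 0.0087]

# Final score per query = classification probability * presence probability.
final = [p * presence for p in class_probs]
max_abs_diff = max(abs(f - r) for f, r in zip(final, reported_final))

# Inverse sigmoid: the presence logit implied by a presence score of 0.0320.
implied_logit = math.log(presence / (1.0 - presence))

print([round(f, 4) for f in final])
print(f"max |computed - reported| = {max_abs_diff:.5f}")
print(f"implied presence logit = {implied_logit:.2f}")
```

Because the presence term multiplies every query, even the best query (classification probability 0.7695) cannot score above roughly 0.025 here, which is why the threshold had to drop to 0.02.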

Questions/Discussion:

  1. Is this expected, and is it an effect of distillation or of the dataset used for distillation?
  2. Perhaps EfficientSAM3 needs a less aggressive function applied to presence_logit_dec. Currently it is a sigmoid, which pushes the presence probability toward either extreme.
    Have you already considered this or run experiments around it?
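
To make question 2 concrete, one possible "less aggressive" mapping is a temperature-scaled sigmoid. The sketch below is purely hypothetical: `soft_presence` and `temperature` are names invented for illustration, nothing like this is known to exist in the EfficientSAM3 code, and the logit is simply the value implied by the presence score of 0.0320 reported above.

```python
import math


def soft_presence(presence_logit: float, temperature: float = 1.0) -> float:
    """Sigmoid of the logit divided by a temperature (hypothetical helper).

    temperature == 1.0 is the plain sigmoid; larger values pull the
    output toward 0.5, so extreme logits are squashed less aggressively.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return 1.0 / (1.0 + math.exp(-presence_logit / temperature))


# Logit implied by the reported presence score of 0.0320 (inverse sigmoid).
logit = math.log(0.0320 / (1.0 - 0.0320))
top_class_prob = 0.7695  # best classification probability from the printout

for t in (1.0, 2.0, 4.0):
    p = soft_presence(logit, t)
    print(f"T={t}: presence={p:.4f}, top final score={top_class_prob * p:.4f}")
```

The trade-off is that a temperature above 1 also raises the presence score for images where the object really is absent, so it would need to be checked against false positives rather than adopted blindly.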
