I have been testing SAM 3 on the official demo/codebase and noticed that the model seems insensitive to spatial or directional descriptors in text prompts.
When I provide a text prompt that includes a specific location (e.g., "the button in the bottom-left corner"), SAM 3 tends to segment every instance of that object class in the image rather than only the one at the described position.
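
To make the expected behavior concrete, here is a minimal sketch of the kind of post-filter one would currently have to apply to recover the positional constraint. It does not call any SAM 3 API. It assumes the per-instance masks are already available as a boolean NumPy array of shape `(N, H, W)`. The function name `pick_mask_nearest_corner` and the centroid-distance heuristic are hypothetical, chosen only for illustration.

```python
import numpy as np


def pick_mask_nearest_corner(masks, corner="bottom-left"):
    """Return the index of the mask whose centroid is closest to an image corner.

    masks: boolean array of shape (N, H, W), one binary mask per instance.
           (Assumed input format; not tied to any SAM 3 output structure.)
    Returns None if every mask is empty.
    """
    masks = np.asarray(masks, dtype=bool)
    n, h, w = masks.shape

    # Corner coordinates as (row, col); row 0 is the top of the image.
    corners = {
        "top-left": (0.0, 0.0),
        "top-right": (0.0, w - 1.0),
        "bottom-left": (h - 1.0, 0.0),
        "bottom-right": (h - 1.0, w - 1.0),
    }
    target = np.array(corners[corner])

    best_idx, best_dist = None, np.inf
    for i in range(n):
        ys, xs = np.nonzero(masks[i])
        if ys.size == 0:
            continue  # skip empty masks, they have no centroid
        centroid = np.array([ys.mean(), xs.mean()])
        dist = np.linalg.norm(centroid - target)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx
```

With a prompt like "the button in the bottom-left corner", the expectation is that the model returns only the instance this filter would select, without needing a separate geometric step.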