Hi — thanks for your work on EfficientSAM3 (great project).
I’m trying to export the SAM3 vision encoder from the official Meta implementation to ONNX, mainly for TensorRT deployment. However, the vanilla SAM3 code fails ONNX export with errors like:
UnsupportedOperatorError: aten::view_as_complex
and several dynamic-shape / SDPA-related issues.
Where the vanilla SAM3 export currently fails
- `view_as_complex` in rotary embeddings (not supported in ONNX)
- dynamic asserts on shape (`assert size * size == xy_num`)
- conditional padding logic (`if pad_h > 0 or pad_w > 0`)
- SDPA → requires replacing with matmul attention
- dynamic cumsum / dynamic position encoding
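As an illustration of the conditional padding item, here is a minimal sketch of a window-partition step with the `if pad_h > 0 or pad_w > 0` branch removed. It assumes a SAM-style `(B, H, W, C)` layout. The name `window_partition_static` is hypothetical and this is not the actual SAM3 code. The idea is that `F.pad` with zero-sized padding leaves the values unchanged, so the pad can be applied unconditionally and the tracer never sees a data-dependent branch.

```python
import torch
import torch.nn.functional as F


def window_partition_static(x: torch.Tensor, window_size: int):
    """Split a (B, H, W, C) feature map into non-overlapping windows.

    Hypothetical helper modeled on SAM-style window partitioning.
    Padding is applied unconditionally: when pad_h == pad_w == 0 the
    call pads by nothing, so no `if` branch is needed.
    """
    B, H, W, C = x.shape
    pad_h = (window_size - H % window_size) % window_size
    pad_w = (window_size - W % window_size) % window_size
    # F.pad reads pairs from the last dimension backwards:
    # (C_before, C_after, W_before, W_after, H_before, H_after).
    x = F.pad(x, (0, 0, 0, pad_w, 0, pad_h))
    Hp, Wp = H + pad_h, W + pad_w
    x = x.reshape(B, Hp // window_size, window_size, Wp // window_size, window_size, C)
    # Bring the two window-grid axes together, then flatten them into the batch.
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)
    return windows, (Hp, Wp)
```

With a fixed export resolution, `pad_h` and `pad_w` are plain Python integers at trace time, so the padded size is baked into the graph as a constant.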
My Question to the EfficientSAM3 team
Since your repo deals deeply with SAM models and modifications, could you guide me on how to make SAM3’s vision encoder ONNX-exportable?
Specifically:
What is the correct approach to handle these?
- How to remove/replace `view_as_complex`? Should rotary embedding be rewritten using real-valued sin/cos?
- Should SDPA be replaced with a manual matmul attention block?
- How to replace dynamic cumsum → static arange?
- How to precompute/register positional embeddings as buffers?
- Do FPN layers or padding need to be rewritten?
- Any tips from your experience exporting SAM/SAM2/EfficientSAM variants?
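To make the rotary question concrete, the sketch below compares a complex-valued rotation with a real-valued sin/cos rewrite. It assumes a simple 1D rotary layout with adjacent channel pairs, which may differ from the 2D axial scheme SAM3 actually uses. All function names here are hypothetical. The real-valued version relies only on the identity (a + bi)(cos θ + i sin θ) = (a cos θ − b sin θ) + i(a sin θ + b cos θ), so it needs no complex ops.

```python
import torch


def rotary_angles(num_tokens: int, head_dim: int, theta: float = 10000.0) -> torch.Tensor:
    """Return (num_tokens, head_dim // 2) rotation angles (assumed 1D layout)."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(num_tokens).float()
    return torch.outer(positions, inv_freq)


def apply_rotary_complex(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Reference version using view_as_complex (the op that blocks export)."""
    freqs_cis = torch.polar(torch.ones_like(angles), angles)
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2)


def apply_rotary_real(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Same rotation with real tensors only; cos/sin can be precomputed buffers."""
    x_pairs = x.float().reshape(*x.shape[:-1], -1, 2)
    x_re, x_im = x_pairs[..., 0], x_pairs[..., 1]
    out_re = x_re * cos - x_im * sin
    out_im = x_re * sin + x_im * cos
    # Re-interleave (re, im) pairs back into the original channel order.
    return torch.stack((out_re, out_im), dim=-1).flatten(-2)
```

For a fixed input resolution, `angles.cos()` and `angles.sin()` could be computed once and stored with `register_buffer`, so the exported graph would contain only multiplies and adds.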
Even a high-level guide or a minimal patch would be extremely helpful.
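For the SDPA question, a minimal sketch of an explicit matmul attention is shown below. It assumes `(B, heads, N, d)` tensors with no mask and no dropout. `matmul_attention` and `sdpa_reference` are hypothetical names, not part of SAM3 or EfficientSAM3. Whether the replacement is needed at all likely depends on the PyTorch version and exporter in use.

```python
import math

import torch
import torch.nn.functional as F


def matmul_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Compute softmax(Q K^T / sqrt(d)) V with plain matmul and softmax ops."""
    scale = 1.0 / math.sqrt(q.shape[-1])
    attn = torch.matmul(q * scale, k.transpose(-2, -1))
    attn = attn.softmax(dim=-1)
    return torch.matmul(attn, v)


def sdpa_reference(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Fused PyTorch attention, used here only to check numerical agreement."""
    return F.scaled_dot_product_attention(q, k, v)
```

Comparing `matmul_attention(q, k, v)` against `sdpa_reference(q, k, v)` on random inputs with `torch.allclose` is a quick way to confirm the swap does not change the encoder output before attempting the export.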
Thanks!