Request for help: exporting the SAM3 vision encoder to ONNX (view_as_complex, SDPA, dynamic ops issues) #7

@Aravind-Sreenivas

Hi, thanks for your work on EfficientSAM3 (great project).
I’m trying to export the SAM3 vision encoder from the official Meta implementation into ONNX, mainly for TensorRT deployment. However, the vanilla SAM3 code fails ONNX export with errors like:

UnsupportedOperatorError: aten::view_as_complex

and several dynamic-shape / SDPA-related issues.

Where the vanilla SAM3 export currently fails

  • view_as_complex in rotary embeddings (not supported in ONNX)
  • dynamic asserts on shape (assert size * size == xy_num)
  • conditional padding logic (if pad_h > 0 or pad_w > 0)
  • SDPA (scaled_dot_product_attention), which seems to require replacing with matmul attention
  • dynamic cumsum / dynamic position encoding
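
To make the SDPA item concrete, here is a minimal sketch of the matmul replacement, assuming no attention mask and no dropout. `matmul_attention` is a hypothetical helper name, not something taken from the SAM3 code base, and the tensor shapes are placeholders. It only checks numerical agreement with `F.scaled_dot_product_attention`; it does not show that the swap is sufficient for the export to succeed.

```python
import math

import torch
import torch.nn.functional as F


def matmul_attention(q, k, v):
    """Plain softmax(Q K^T / sqrt(d)) V with no mask and no dropout.

    Hypothetical helper: uses only matmul and softmax, both of which
    map to standard ONNX operators.
    """
    scale = 1.0 / math.sqrt(q.shape[-1])
    attn = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    return attn @ v


torch.manual_seed(0)
# Placeholder shapes: (batch, heads, tokens, head_dim)
q, k, v = (torch.randn(1, 2, 16, 8) for _ in range(3))

ref = F.scaled_dot_product_attention(q, k, v)
out = matmul_attention(q, k, v)
sdpa_max_err = (out - ref).abs().max().item()
print(f"max abs difference vs SDPA: {sdpa_max_err:.2e}")
```

If the attention modules pass a mask or a dropout probability, the helper would need matching arguments before it can stand in for the original call.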

My Question to the EfficientSAM3 team

Since your repo deals deeply with SAM models and modifications, could you guide me on how to make SAM3’s vision encoder ONNX-exportable?

Specifically:

What is the correct approach to handle these?

  1. How to remove/replace view_as_complex?
    Should rotary embedding be rewritten using real-valued sin/cos?
  2. Should SDPA be replaced with a manual matmul attention block?
  3. How to replace the dynamic cumsum with a static arange?
  4. How to precompute/register positional embeddings as buffers?
  5. Do FPN layers or padding need to be rewritten?
  6. Any tips from your experience exporting SAM/SAM2/EfficientSAM variants?
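
For question 1, this is a minimal sketch of the real-valued sin/cos rewrite being asked about, compared against a `view_as_complex` reference. It assumes the rotary embedding pairs adjacent channels as (real, imaginary), which is what `torch.view_as_complex` does on a trailing dimension of size 2. The 1D `angles` table is a placeholder only: SAM3 builds its own frequencies, and the names `rope_complex` and `rope_real` are hypothetical.

```python
import torch


def rope_complex(x, angles):
    """Reference path: rotate channel pairs via complex multiplication."""
    xc = torch.view_as_complex(x.reshape(*x.shape[:-1], -1, 2))
    rot = torch.polar(torch.ones_like(angles), angles)  # e^{i * angle}
    return torch.view_as_real(xc * rot).flatten(-2)


def rope_real(x, cos, sin):
    """Same rotation using only real-valued mul/add/sub.

    (a + ib) * (cos + i sin) = (a cos - b sin) + i (a sin + b cos)
    """
    x_re, x_im = x[..., 0::2], x[..., 1::2]
    out_re = x_re * cos - x_im * sin
    out_im = x_re * sin + x_im * cos
    # Interleave back to (re0, im0, re1, im1, ...)
    return torch.stack((out_re, out_im), dim=-1).flatten(-2)


torch.manual_seed(0)
n_tokens, head_dim = 16, 8  # placeholder sizes
x = torch.randn(1, 2, n_tokens, head_dim)

# Placeholder 1D frequency table, shape (n_tokens, head_dim / 2)
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(n_tokens).float(), inv_freq)

ref = rope_complex(x, angles)
out = rope_real(x, angles.cos(), angles.sin())
rope_max_err = (out - ref).abs().max().item()
print(f"max abs difference vs complex path: {rope_max_err:.2e}")
```

For a fixed input resolution, the `cos` and `sin` tables could be computed once and attached to the module with `register_buffer`, which also touches on question 4.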

Even a high-level guide or a minimal patch would be extremely helpful.

Thanks!
