Add MPS support for image inference on macOS (Apple Silicon) #400
Shreesh-Coder wants to merge 3 commits into facebookresearch:main
This commit adds CPU/MPS compatibility fixes to enable SAM3 to run on macOS without CUDA. These changes address device mismatch errors and hardcoded CUDA references that prevent the model from working on non-GPU systems.

Key Changes:
- geometry_encoders.py: Conditional pin_memory() to avoid MPS device errors
- position_encoding.py: CPU-aware device selection for position encoding
- decoder.py: CPU-aware coordinate caching and autocast device handling
- edt.py: OpenCV fallback for distance transform when Triton unavailable
- sam3_tracking_predictor.py: CPU-compatible autocast configuration
- model_builder.py: Improved device setup with better error handling

Fixes:
- Resolves 'pin_memory()' errors on MPS (similar to SAM2 PR #495)
- Fixes device mismatch: 'Attempted to set storage on cpu to mps:0'
- Enables CPU fallback for GPU-specific operations (Triton → OpenCV)
- Prevents segmentation faults from hardcoded CUDA operations

Testing:
- Successfully tested on macOS with PyTorch 2.9.1 (CPU mode)
- Model loads and runs inference without CUDA
- Compatible with existing CUDA workflows (backward compatible)

---

AI-GENERATED CODE DISCLAIMER: This code was developed with assistance from AI (Claude/Cursor). The modifications are compatibility patches for macOS/CPU usage and have been tested, but users should:
1. Review all changes before deploying to production
2. Test thoroughly in their specific environment
3. Be aware that CPU performance is slower than GPU
4. Understand that some operations use fallback implementations
5. Note that these are compatibility patches and do not alter core model architecture

The original SAM3 codebase is from Facebook Research: https://github.com/facebookresearch/sam3

These modifications maintain backward compatibility with CUDA while enabling CPU/MPS support for development and testing on macOS systems.
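The conditional pin_memory() change listed for geometry_encoders.py might look roughly like the sketch below. This is not the code from the commit: the helper name to_device and its signature are hypothetical. It assumes pinned (page-locked) host memory is only useful for asynchronous host-to-CUDA copies, so pinning is skipped for MPS and CPU targets.

```python
import torch


def to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Move a CPU tensor to `device`, pinning host memory only for CUDA.

    Hypothetical helper for illustration; not taken from the SAM3 codebase.
    """
    if device.type == "cuda" and torch.cuda.is_available():
        # Pinned host memory allows an asynchronous host-to-GPU copy.
        return tensor.pin_memory().to(device, non_blocking=True)
    # On MPS or CPU, skip pinning and do a plain transfer.
    return tensor.to(device)


points = torch.tensor([[0.25, 0.75]])
moved = to_device(points, torch.device("cpu"))
```

Keeping the branch inside one helper means call sites do not need to know which backend is active.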
- Add centralized device selection helper (get_device) with auto-detection (CUDA → MPS → CPU)
- Thread device parameter through model construction (position encoders, geometry encoders, decoder caches)
- Add MPS compatibility fixes:
  - Disable bfloat16 autocast on MPS (not well supported)
  - Add CPU round-trip fallback for grid_sample on MPS
  - Fix position encoding and decoder coordinate cache device placement
  - Replace _assert_async with regular assertions on MPS
- Add OpenCV fallback for EDT on non-CUDA devices
- Make decord optional with clear error messages
- Add graceful error handling for video/tracking on non-CUDA (raises NotImplementedError)
- Add smoke test script for macOS validation

Video/tracking remains CUDA-only. Image inference works on CPU and MPS.

AI-Generated Code Disclaimer: This PR contains code changes that were generated with the assistance of AI tools (Cursor AI). All changes have been reviewed, tested, and validated.

Tested on:
- macOS 26.1 (Build 25B78)
- Apple Silicon (arm64)
- PyTorch 2.9.1
- Apple M2

All smoke tests pass on CPU and MPS.
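The CUDA → MPS → CPU auto-detection and the video/tracking guard described in this commit can be sketched as follows. The function names pick_device, detect_device, and require_cuda_for_video are hypothetical stand-ins, not the PR's get_device implementation, and the error text is illustrative. The priority logic is kept separate from the PyTorch query so it can be checked without a GPU.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name in the order CUDA -> MPS -> CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are available (torch is imported lazily)."""
    import torch

    # torch.backends.mps may be missing on older PyTorch builds.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


def require_cuda_for_video(device: str) -> None:
    """Fail early, with guidance, when video/tracking is requested off CUDA."""
    if not device.startswith("cuda"):
        raise NotImplementedError(
            f"Video/tracking currently requires CUDA (got device={device!r}). "
            "Use image inference on CPU or MPS instead."
        )
```

Under these assumptions, a caller would run detect_device() once at model build time and call require_cuda_for_video() before constructing any video predictor.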
Hi @Shreesh-Coder!

Thank you for your pull request and welcome to our community.

Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

Is this going to be added to main?

Confirmed working on M4

Works for me on my M1 Max too. Would be nice to see this merged!
Add MPS support for image inference on macOS (Apple Silicon)
AI-Generated Code Disclaimer
Note: This PR contains code changes that were generated with the assistance of AI tools (Cursor AI). All changes have been reviewed, tested, and validated. The implementation follows PyTorch best practices and patterns similar to those used in SAM2 for MPS compatibility.
Summary
This PR enables running SAM3 image inference on macOS using the PyTorch MPS backend (Apple Silicon GPU) or CPU, with automatic device selection (CUDA → MPS → CPU). CUDA behavior is unchanged. Video/tracking remains CUDA-only and raises a clear NotImplementedError on non-CUDA devices.
Key Changes
Core Device Support
- Added get_device() helper in model_builder.py that auto-detects device (CUDA → MPS → CPU)
- Updated build_sam3_image_model() to support MPS device selection
MPS Compatibility Fixes
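One of the MPS fixes named in the commit message is a CPU round-trip fallback for grid_sample. A minimal sketch of that idea follows; the wrapper name grid_sample_mps_safe is hypothetical and the PR's actual implementation may differ. It assumes the only requirement is to run the op on CPU when the input lives on MPS and to return the result on the original device.

```python
import torch
import torch.nn.functional as F


def grid_sample_mps_safe(inp: torch.Tensor, grid: torch.Tensor, **kwargs) -> torch.Tensor:
    """Run F.grid_sample, round-tripping through CPU when the input is on MPS.

    Hypothetical wrapper for illustration; not taken from the SAM3 codebase.
    """
    if inp.device.type == "mps":
        # Compute on CPU, then move the result back to the MPS device.
        out = F.grid_sample(inp.cpu(), grid.cpu(), **kwargs)
        return out.to(inp.device)
    return F.grid_sample(inp, grid, **kwargs)


# A 2x2 single-channel image, sampled once at its center (x=0, y=0).
image = torch.tensor([[[[1.0, 2.0], [3.0, 4.0]]]])  # shape (N=1, C=1, H=2, W=2)
center = torch.zeros(1, 1, 1, 2)  # shape (N=1, H_out=1, W_out=1, 2)
value = grid_sample_mps_safe(image, center, mode="bilinear", align_corners=False)
```

The round-trip costs two device transfers per call, which is the price of correctness on a backend where the op is incomplete.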
Optional Dependencies
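The commit message says decord was made optional with clear error messages. A generic way to do that is a deferred import that fails with an explanatory message only when the feature is used. The sketch below is an assumption about the pattern, not the PR's code; require_optional and load_video_reader are hypothetical names.

```python
import importlib


def require_optional(module_name: str, purpose: str):
    """Import an optional dependency, or raise ImportError explaining why it is needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for {purpose} but is not installed. "
            "Install it, or avoid this code path."
        ) from exc


def load_video_reader():
    """Import decord only when video decoding is actually requested."""
    return require_optional("decord", "video decoding")
```

With this shape, importing the package never fails on machines without decord; only a call to load_video_reader() does.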
Graceful Error Handling
- Raises NotImplementedError with helpful messages when video/tracking is attempted on non-CUDA devices
Testing
- Added scripts/smoke_macos.py for lightweight validation on macOS
Files Modified
Core Library (sam3/sam3/)
- model_builder.py: Added device selection helpers, MPS support in device setup
- model/position_encoding.py: Added device parameter, fixed cache device placement
- model/decoder.py: Fixed coordinate cache device detection, improved autocast device detection
- model/geometry_encoders.py: Added MPS-safe grid_sample fallback, fixed _assert_async
- model/edt.py: Improved OpenCV fallback for non-CUDA devices
- model/sam3_tracking_predictor.py: Disabled bfloat16 autocast on MPS
- model/sam3_tracker_base.py: Added device check in device property (raises error for non-CUDA)
- model/sam3_video_predictor.py: Added CUDA check before model construction
- model/sam3_image.py: Fixed _assert_async for MPS compatibility
- model/utils/sam2_utils.py: Added decord import error handling
- train/data/sam3_image_dataset.py: Added decord availability check
- train/loss/mask_sampling.py: Added MPS-safe grid_sample fallback
- train/loss/loss_fns.py: Fixed _assert_async for MPS compatibility
Testing
- scripts/smoke_macos.py: New smoke test script for macOS validation
Validation
Test Environment
Test Results
Performance Notes
Limitations
- Video/Tracking: Currently requires CUDA. Attempting to use video/tracking on CPU or MPS will raise a clear NotImplementedError with guidance to use image inference instead.
- MPS Operation Coverage: Some operations (like grid_sample) have incomplete MPS implementations and require a CPU fallback. This is handled automatically via CPU round-trips. For additional unsupported operations, users may need to set the PYTORCH_ENABLE_MPS_FALLBACK=1 environment variable (per the PyTorch MPS documentation). For example, aten::grid_sampler_3d is not implemented on MPS, and PyTorch suggests using PYTORCH_ENABLE_MPS_FALLBACK=1 as a temporary fix.
- Autocast: bfloat16 autocast is disabled on MPS (not well supported). This may result in slightly different numerical outputs compared to CUDA, but results are still accurate.
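The autocast limitation can be illustrated with a small context-manager selector. This is a sketch under assumptions, not the PR's code: autocast_context is a hypothetical name, and treating CPU the same as MPS (no autocast) is an assumption, since the description only states the MPS behavior.

```python
import contextlib


def autocast_context(device_type: str):
    """Return a bfloat16 autocast context on CUDA, and a no-op context elsewhere.

    Hypothetical helper for illustration; not taken from the SAM3 codebase.
    """
    if device_type == "cuda":
        import torch  # imported lazily so the non-CUDA path has no torch dependency

        return torch.autocast(device_type="cuda", dtype=torch.bfloat16)
    # MPS (and, by assumption, CPU) run in the model's native precision.
    return contextlib.nullcontext()


with autocast_context("mps"):
    pass  # the model forward pass would go here
```

Because the non-CUDA path runs at full precision, small numerical differences from CUDA bfloat16 results are expected.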
Backward Compatibility
Testing Checklist
Usage Example
References