# MLXDINOv3

Swift port of Meta's DINOv3 on MLX Swift.
DINOv3 is a self-supervised vision model from Meta that produces dense visual features useful for classification, segmentation, retrieval, and other downstream tasks without fine-tuning. This package implements the architecture in MLX and validates its outputs against a PyTorch reference.
## Installation

Add to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/vincentamato/MLXDINOv3.git", from: "1.0.0")
]
```

Then import the module where you use it:

```swift
import MLXDINOv3
```

## Converting models

The Convert target downloads a Hugging Face checkpoint and converts it to MLX format. Only ViT models are supported for now.
```bash
xcodebuild build -scheme Convert -destination platform=macOS -derivedDataPath .build/xcode && \
  .build/xcode/Build/Products/Release/Convert \
    facebook/dinov3-vits16-pretrain-lvd1689m \
    ./Models/dinov3-vits16-mlx
```

## Usage

```swift
import AppKit
import MLX
import MLXDINOv3

let model = try loadPretrained(modelPath: "Models/dinov3-vits16-mlx")
let image = NSImage(contentsOfFile: "image.jpg")!

let processor = ImageProcessor()
let inputs = try processor(image)
let outputs = model(inputs)

print("Pooler output shape:", outputs.poolerOutput.shape)
print("Last hidden state shape:", outputs.lastHiddenState.shape)
```

## Testing

Tests use xcodebuild because MLX depends on the Metal backend (`swift test` won't work). Before running the tests, convert the model into the test resources directory.
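The pooled features work well for retrieval. As an illustration only (this helper is not part of the package's API), comparing two images reduces to cosine similarity between their feature vectors, assuming you have extracted `poolerOutput` into a plain `[Float]` array:

```swift
import Foundation

/// Cosine similarity between two equal-length feature vectors.
/// Hypothetical retrieval helper; not part of MLXDINOv3.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // Small epsilon guards against division by zero for all-zero vectors.
    return dot / (sqrt(normA) * sqrt(normB) + 1e-12)
}
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, so ranking candidates by this value gives a simple nearest-neighbor search.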
```bash
# Convert the test model (skip if already done)
xcodebuild build -scheme Convert -destination platform=macOS -derivedDataPath .build/xcode && \
  .build/xcode/Build/Products/Release/Convert \
    facebook/dinov3-vits16-pretrain-lvd1689m \
    Tests/MLXDINOv3Tests/Resources/Model

# Run tests
xcodebuild test -scheme MLXDINOv3Tests -destination platform=macOS
```

The tests download PyTorch reference outputs from Hugging Face and compare against them.
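Comparing against a PyTorch reference comes down to an elementwise tolerance check. A minimal sketch of such a check (the tolerance values here are assumptions for illustration, not the test suite's actual settings):

```swift
import Foundation

/// Elementwise tolerance check in the spirit of numpy.allclose.
/// Illustrative only; rtol/atol defaults are assumed, not taken from the tests.
func allClose(_ a: [Float], _ b: [Float],
              rtol: Float = 1e-4, atol: Float = 1e-5) -> Bool {
    guard a.count == b.count else { return false }
    // Each element must be within atol plus a relative margin of the reference.
    return zip(a, b).allSatisfy { x, y in
        abs(x - y) <= atol + rtol * abs(y)
    }
}
```

A combined relative/absolute tolerance is the usual choice here: relative tolerance handles large activations, while absolute tolerance keeps near-zero values from failing on float noise.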
## License

MIT. See LICENSE.

Pretrained weights are released under Meta's DINOv3 License.