A lightweight, on-device iOS toolkit for intelligent photo processing, featuring high-performance background removal and advanced face detection.
Tired of slow, expensive server-side image processing APIs? So were we.
MMYPhotoKit is a powerful and lightweight iOS framework designed to perform complex image processing tasks directly on the user's device, with no need for an internet connection. It leverages the speed of the NCNN deep learning framework and the robustness of OpenCV to provide a seamless and private experience for your users.
Think of it as your on-device Swiss Army knife for common but complex photo tasks. At its core, it offers two main capabilities:
- Background Removal: Using a pre-trained MobileNetV2 model, it can accurately segment a person from an image and replace the background with a solid color of your choice.
- Face Analysis: Using Apple's native Vision framework, it can detect faces, count them, and even determine whether the subject is facing the camera.
All of this happens in the blink of an eye, right on the iPhone or iPad, ensuring user data remains private and your app works flawlessly offline.
(A quick demo of the on-device background removal.)
- 🚀 On-Device Background Removal: Employs a MobileNetV2 model via NCNN to accurately segment portraits and replace the background. No API calls, no latency.
- 👨👩👧 Advanced Face Detection: Utilizes Apple's Vision framework to not only find faces but also count them and verify a frontal pose, perfect for profile picture validation.
- ⚡️ High-Performance C++ Core: The heavy lifting is done in a C++ backend powered by NCNN and OpenCV, ensuring maximum performance and efficiency.
- 🔒 Privacy-First: All processing happens locally. No images or user data ever leave the device.
- 🎨 Customizable Backgrounds: Easily specify any solid `UIColor` to be applied as the new background after segmentation.
- 🧩 Simple Objective-C API: A clean and straightforward Objective-C wrapper makes integration into any iOS project a breeze (see the sketch below).
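Integration can be as simple as the sketch below. The `MMYPhotoKitManager` class and its `isLoaded` property appear in the implementation notes further down; the exact `photoProcess:` / `detectFace:` selectors and argument labels here are assumptions, not verbatim from the header:

```objc
#import <MMYPhotoKit/MMYPhotoKit.h>

// Hypothetical selectors; see the hedging note above.
MMYPhotoKitManager *manager = [[MMYPhotoKitManager alloc] init];
UIImage *portrait = [UIImage imageNamed:@"portrait"];

if (manager.isLoaded) {
    // Replace the background with solid white (e.g., for an ID photo).
    UIImage *result = [manager photoProcess:portrait
                            backgroundColor:UIColor.whiteColor];

    // Check that the photo contains exactly one frontal face.
    [manager detectFace:portrait completion:^(MMYFaceStatus status) {
        if (status == MMY_FS_ONE) {
            // Good to go.
        }
    }];
}
```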
This project is built upon a foundation of powerful, industry-standard technologies:
- NCNN - A high-performance neural network inference framework optimized for mobile platforms.
- OpenCV - The leading open-source library for computer vision and image processing.
- Vision Framework - Apple's native framework for powerful computer vision tasks.
- Objective-C / C++

To build and run it, you will need:

- Xcode 12.0 or later
- iOS 11.0 or later
MMYPhotoKit's architecture bridges Objective-C with high-performance C++ components, integrating OpenCV and NCNN.
The C++ core houses the high-performance image processing logic built on C++, OpenCV, and NCNN.
`MMYCVPhoto.hpp` declares the `MMYCVPhoto` C++ class, which serves as the interface for the underlying image processing operations. It includes the essential NCNN headers (e.g., `net.h`, `benchmark.h`) and OpenCV headers (`opencv2/core.hpp`, `opencv2/imgproc.hpp`), signifying its reliance on these frameworks. `MMYCVPhoto.cpp` contains the implementation details:

- Neural Network Object: A `static ncnn::Net bgnet;` object is declared, representing the neural network responsible for background segmentation.
- Model Initialization (`modelInit`): This method configures and loads the pre-trained `mobilenetv2` model. It sets `ncnn::Option` parameters for optimization, such as `opt.lightmode = true` and `opt.num_threads = 4`, indicating a configuration tuned for mobile execution. It then loads the model's structure (`.param` file) and weights (`.bin` file) into `bgnet`. A return value of `false` indicates a failure loading either component.
- Photo Processing (`photoProcess`): This method executes the background removal pipeline, sketched after this list:
  - It accepts raw pixel data (`uint8_t* pixel`), image dimensions (`width`, `height`), and an array of integers representing the desired `background` color.
  - The input pixel data (RGBA format) is converted into an `ncnn::Mat` in RGB format and resized to 512x512 pixels using `ncnn::Mat::from_pixels_resize`.
  - Normalization: the resized image is normalized via `in_resize.substract_mean_normalize(meanVals, normVals)` with a mean of `127.5f` and a norm of `0.0078431f` (approximately 1/127.5), mapping each channel from [0, 255] to roughly [-1, 1] so the input matches the network's training parameters.
  - An `ncnn::Extractor` is instantiated from `bgnet`, the normalized input is fed via `ex.input("input", in_resize)`, and the segmentation mask is retrieved via `ex.extract("output", out)`.
  - The resulting segmentation mask (`out`) is resized back to the original image dimensions using bilinear interpolation (`ncnn::resize_bilinear`) to generate the `alpha` mask.
  - Pixel Blending: the core blending operation iterates over every pixel. For each one, a new color is computed by blending the original color (`rgb.at<cv::Vec3b>(i, j)`) with the new `background` color according to the corresponding `alpha_` value from the mask: `original_pixel * alpha_ + (1 - alpha_) * background_color`.
  - Finally, the method releases all temporary `ncnn::Mat` and `cv::Mat` objects and returns the blended `cv::Mat`.
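To make the flow concrete, here is a compact sketch of the two methods described above. The blob names (`"input"`/`"output"`), the 512x512 input size, the normalization constants, and the blend formula come straight from the implementation notes; the rest (helper signatures, error handling) is a plausible reconstruction, not the verbatim `MMYCVPhoto.cpp`:

```cpp
#include "net.h"                 // NCNN
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

static ncnn::Net bgnet;

bool modelInit(const char *paramPath, const char *binPath) {
    bgnet.opt.lightmode = true;  // lighter memory footprint for mobile
    bgnet.opt.num_threads = 4;   // multi-threaded inference on mobile CPUs
    // load_param/load_model return 0 on success.
    if (bgnet.load_param(paramPath) != 0) return false;
    if (bgnet.load_model(binPath) != 0) return false;
    return true;
}

cv::Mat photoProcess(uint8_t *pixel, int width, int height, const int *background) {
    // RGBA buffer -> RGB ncnn::Mat, resized to the network's 512x512 input.
    ncnn::Mat in_resize = ncnn::Mat::from_pixels_resize(
        pixel, ncnn::Mat::PIXEL_RGBA2RGB, width, height, 512, 512);

    // Normalize each channel from [0, 255] to roughly [-1, 1].
    const float meanVals[3] = {127.5f, 127.5f, 127.5f};
    const float normVals[3] = {0.0078431f, 0.0078431f, 0.0078431f}; // ~1/127.5
    in_resize.substract_mean_normalize(meanVals, normVals);

    // Run inference and pull out the segmentation mask.
    ncnn::Extractor ex = bgnet.create_extractor();
    ex.input("input", in_resize);
    ncnn::Mat out;
    ex.extract("output", out);

    // Upscale the mask back to the original resolution.
    ncnn::Mat alpha;
    ncnn::resize_bilinear(out, alpha, width, height);

    // Blend: original * alpha + (1 - alpha) * background, per channel.
    cv::Mat rgba(height, width, CV_8UC4, pixel);
    cv::Mat rgb;
    cv::cvtColor(rgba, rgb, cv::COLOR_RGBA2RGB);
    for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            float a = alpha[i * width + j];
            cv::Vec3b &p = rgb.at<cv::Vec3b>(i, j);
            for (int c = 0; c < 3; c++) {
                p[c] = cv::saturate_cast<uchar>(p[c] * a + (1.f - a) * background[c]);
            }
        }
    }
    return rgb;
}
```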
The `MMYPhotoKitManager` Objective-C class acts as the primary interface between the iOS application and the underlying C++/OpenCV/NCNN computational layer.
- Initialization (`init`): Upon instantiation, the manager locates and loads the `mobilenetv2` model files (`.bin` and `.param`) from the `ModelRes.bundle` within its own framework, then calls the `modelInit` method of the `MMYCVPhoto` C++ object (`self->photo`). The `self.isLoaded` property tracks whether model loading succeeded.
- Deallocation (`dealloc`): When the `MMYPhotoKitManager` instance is released, its `dealloc` method ensures the proper destruction of the C++ `MMYCVPhoto` object by calling its destructor (`self->photo->~MMYCVPhoto()`), preventing memory leaks.
- Face Frontality Assessment (`isFaceFrontal`): This private method uses Apple's Vision framework to evaluate the orientation of a detected face. It extracts the `roll`, `yaw`, and `pitch` values from a `VNFaceObservation`, converts them to degrees, and considers the face "frontal" only if `rollDegrees`, `yawDegrees`, and `pitchDegrees` are all within an `acceptableRange` of +/- 10.0 degrees (sketched below).
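A minimal sketch of that check. The +/- 10 degree `acceptableRange` comes from the implementation notes; the `@available` guard for `pitch`, which Vision only reports on iOS 15 and later, is an assumption of this sketch:

```objc
#import <Vision/Vision.h>

- (BOOL)isFaceFrontal:(VNFaceObservation *)face {
    const CGFloat acceptableRange = 10.0; // degrees
    // Vision reports roll/yaw (and pitch, on iOS 15+) in radians.
    CGFloat rollDegrees  = face.roll.doubleValue * 180.0 / M_PI;
    CGFloat yawDegrees   = face.yaw.doubleValue  * 180.0 / M_PI;
    CGFloat pitchDegrees = 0.0;
    if (@available(iOS 15.0, *)) {
        pitchDegrees = face.pitch.doubleValue * 180.0 / M_PI;
    }
    return fabs(rollDegrees)  <= acceptableRange
        && fabs(yawDegrees)   <= acceptableRange
        && fabs(pitchDegrees) <= acceptableRange;
}
```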
- Face Occupancy Check (`isFaceOccupyingMajorSpace`): This private method calculates the proportional area a detected face occupies within the image, returning `true` if the face covers less than 50% of the frame and `isFaceFrontal` confirms a valid angle. Its direct use within the public `detectFace` method is not explicitly shown in the source.
- Face Detection Logic (`detectFace`): This public method takes a `UIImage`, converts it to a `CIImage`, and uses the Vision framework's `VNDetectFaceRectanglesRequest` to identify faces. The completion handler analyzes `request.results` and assigns the appropriate `MMYFaceStatus` (see the sketch after this list):
  - `MMY_FS_NONE` if no faces are found.
  - `MMY_FS_ONE` if exactly one face is detected and `isFaceFrontal` confirms it is frontal.
  - `MMY_FS_NOFRONTAL` if one face is detected but `isFaceFrontal` indicates it is not frontal.
  - `MMY_FS_EXCESSIVE` if more than one face is detected.
  - An error is logged if the Vision request fails.
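A hedged sketch of that flow. The `MMYFaceStatus` mapping mirrors the rules above; the exact method signature and completion block are assumptions:

```objc
#import <Vision/Vision.h>

- (void)detectFace:(UIImage *)image completion:(void (^)(MMYFaceStatus))completion {
    CIImage *ciImage = [[CIImage alloc] initWithImage:image];
    VNDetectFaceRectanglesRequest *request = [[VNDetectFaceRectanglesRequest alloc]
        initWithCompletionHandler:^(VNRequest *req, NSError *error) {
            if (error != nil) {
                NSLog(@"Face detection failed: %@", error);
                return;
            }
            NSArray<VNFaceObservation *> *faces =
                (NSArray<VNFaceObservation *> *)req.results;
            if (faces.count == 0) {
                completion(MMY_FS_NONE);
            } else if (faces.count > 1) {
                completion(MMY_FS_EXCESSIVE);
            } else if ([self isFaceFrontal:faces.firstObject]) {
                completion(MMY_FS_ONE);
            } else {
                completion(MMY_FS_NOFRONTAL);
            }
        }];

    NSError *error = nil;
    VNImageRequestHandler *handler =
        [[VNImageRequestHandler alloc] initWithCIImage:ciImage options:@{}];
    if (![handler performRequests:@[request] error:&error]) {
        NSLog(@"Vision request failed: %@", error);
    }
}
```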
- Image Processing Bridge (`photoProcess`): This method orchestrates the background removal process (see the sketches after this list):
  - It first verifies that the NCNN model has been successfully loaded (guarding on `!self.isLoaded`).
  - The input `UIImage` is converted into a raw `unsigned char* rgba` pixel buffer using Core Graphics (`CGContextRef`).
  - The `backgroundColor` (a `UIColor`) is decomposed into its individual RGB integer components.
  - The raw pixel data, dimensions, and background color are passed to the C++ `self->photo->photoProcess` method.
  - Upon receiving the processed `cv::Mat` from the C++ layer, the manager converts it back into a `UIImage` suitable for iOS display, via a `CGDataProviderRef`, `CGColorSpaceRef`, and `CGImageRef`.
  - Thorough cleanup of all temporary Core Graphics and OpenCV resources is performed, including releasing the `CGContextRef`, `CGImageRef`, `CGDataProviderRef`, and `CGColorSpaceRef` (created with `kCGColorSpaceSRGB`), and deleting the `rgba` pixel buffer.
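Sketches of both directions of that bridge, written as Objective-C++. The helper names (`MMYCopyRGBAPixels`, `MMYImageFromMat`) are hypothetical; the sRGB color space, the `rgba` buffer, and the `CGDataProviderRef`/`CGImageRef` chain match the description above:

```objc
#import <UIKit/UIKit.h>
#include <opencv2/core.hpp>

// UIImage -> raw RGBA buffer, rendered through a bitmap context.
static unsigned char *MMYCopyRGBAPixels(UIImage *image, size_t *outWidth, size_t *outHeight) {
    CGImageRef cgImage = image.CGImage;
    size_t width  = CGImageGetWidth(cgImage);
    size_t height = CGImageGetHeight(cgImage);

    unsigned char *rgba = new unsigned char[width * height * 4];
    CGColorSpaceRef colorSpace = CGColorSpaceCreateWithName(kCGColorSpaceSRGB);
    CGContextRef context = CGBitmapContextCreate(
        rgba, width, height, 8 /* bits per component */, width * 4 /* bytes per row */,
        colorSpace, kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), cgImage);
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);

    *outWidth = width;
    *outHeight = height;
    return rgba; // caller releases with delete[]
}

// cv::Mat (CV_8UC3, RGB) -> UIImage, mirroring the cleanup described above.
static UIImage *MMYImageFromMat(const cv::Mat &mat) {
    NSData *data = [NSData dataWithBytes:mat.data length:mat.total() * mat.elemSize()];
    CGDataProviderRef provider = CGDataProviderCreateWithCFData((__bridge CFDataRef)data);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateWithName(kCGColorSpaceSRGB);
    CGImageRef cgImage = CGImageCreate(
        mat.cols, mat.rows, 8, 8 * mat.elemSize(), mat.step[0], colorSpace,
        kCGImageAlphaNone | kCGBitmapByteOrderDefault, provider,
        NULL, false, kCGRenderingIntentDefault);
    UIImage *image = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage);
    CGDataProviderRelease(provider);
    CGColorSpaceRelease(colorSpace);
    return image;
}
```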
We've included a simple demo project, MMYPhotoKitSimpleDemo, to show you how to integrate and use the library in a modern SwiftUI application. It's the quickest way to see MMYPhotoKit in action.
First, an `AppManager` class acts as the view model, holding the state and interfacing with `MMYPhotoKitManager`.
The `ContentView` provides the UI, displaying the original and processed images, along with a button to trigger the background removal.
This library is a perfect fit for a variety of applications, including:
- ID Photo Apps: Automatically replace the background of a user's photo with the solid white, blue, or red background required for official documents.
- E-commerce Product Photos: Allow users to quickly clean up product images by removing distracting backgrounds.
- Creative Photo Editors: Use it as a foundational tool for creating stickers, memes, or artistic compositions.
- Profile Picture Generators: Ensure all user profile pictures are clean, professional, and contain a valid face.
MMYPhotoKit provides a solid foundation. Here are some ideas for future development:
- Video Processing: Extend the functionality to support real-time background removal in video streams.
- Advanced Masking: Expose the alpha mask to allow for more advanced effects, like blurred or transparent backgrounds.
- Swift Support: Create a Swift-friendly wrapper to improve integration with modern iOS projects.
- Additional Models: Integrate other models for tasks like style transfer, super-resolution, or object detection.
License
Distributed under the MIT License. See LICENSE for more information.

