This project represents a comprehensive solution for salient object detection and background manipulation. It leverages both state-of-the-art Deep Learning models and classical Computer Vision techniques to provide a versatile tool for subject highlighting and background replacement.
The application provides a user-friendly interface powered by Streamlit, allowing users to upload images and apply various segmentation techniques.
- U-Net with VGG16 Backbone: A custom-trained U-Net model designed for high-precision salient object detection. The encoder uses a pre-trained VGG16 (up to layer `conv4_3`) to extract robust features, while the decoder reconstructs the segmentation mask. This model is particularly effective on datasets like DUTS.
- Mask R-CNN: Uses a pre-trained ResNet50-FPN model from `torchvision` for instance segmentation. It can detect multiple objects and provide high-quality masks.
For comparison and specific use-cases, the tool also includes:
- GrabCut: Interactive foreground extraction using iterated graph cuts.
- Otsu Thresholding: Automatic global thresholding based on histogram analysis.
- Watershed (scikit-image): Marker-based segmentation treating the image pixel gradient as a topographic surface.
- Canny Edge Detection: Edge-based segmentation combined with morphological operations.
- K-Means Clustering: Unsupervised segmentation by clustering pixel colors.
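To make the histogram-based approach concrete, here is a minimal NumPy sketch of Otsu's threshold selection — a self-contained re-implementation for illustration, not the project's `segment_otsu`:

```python
# Minimal Otsu thresholding sketch in NumPy: pick the threshold that maximizes
# between-class variance of the grayscale histogram. Self-contained
# illustration, not the project's segment_otsu implementation.
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for a uint8 grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                  # probability of each gray level
    omega = np.cumsum(p)                   # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

# Bimodal toy image: dark background around 40, bright subject around 200.
rng = np.random.default_rng(0)
img = np.clip(np.concatenate([
    rng.normal(40, 10, 500), rng.normal(200, 10, 500)
]), 0, 255).astype(np.uint8).reshape(20, 50)
t = otsu_threshold(img)
mask = img > t
```

On a cleanly bimodal histogram like this toy image, the chosen threshold falls between the two modes, splitting subject from background.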
- Subject Highlighting: Draws contours or overlays masks on the detected subject.
- Background Blurring (Bokeh Effect): Automatically blurs the background while keeping the subject sharp.
- Background Desaturation: Converts the background to grayscale/black-and-white while preserving the subject's original colors.
- Export: Results can be downloaded directly from the interface.
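The desaturation effect boils down to compositing: blend a grayscale copy of the image with the original according to the subject mask. A minimal NumPy sketch of the idea (an assumed helper for illustration, not the project's `blend_background`):

```python
# Sketch of background desaturation: keep the subject's colors where the mask
# is 1 and use a grayscale version elsewhere. Assumed helper illustrating the
# compositing idea, not the project's actual code.
import numpy as np

def desaturate_background(image, mask):
    """image: HxWx3 uint8; mask: HxW in {0, 1} marking the subject."""
    # Luminance-weighted grayscale (Rec. 601 weights), replicated to 3 channels.
    gray = (image @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)
    gray3 = np.repeat(gray[..., None], 3, axis=2)
    m = mask[..., None].astype(bool)
    return np.where(m, image, gray3)

img = np.zeros((4, 4, 3), np.uint8)
img[..., 0] = 200                      # pure red image
mask = np.zeros((4, 4), np.uint8)
mask[1:3, 1:3] = 1                     # subject in the center
out = desaturate_background(img, mask)
```

Background blurring follows the same pattern, with a blurred copy in place of the grayscale one.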
- `main.py`: The entry point for the Streamlit application. Handles UI, model loading, and interaction logic.
- `model.py`: Defines the `SaliencyModel`, `SimpleEncoder` (VGG16), and `SimpleDecoder` architectures.
- `utils.py`: Contains helper functions for image processing (`blend_background`, `refine_mask`) and implementations of classical segmentation algorithms (`apply_grabcut`, `segment_otsu`, etc.).
- `unet.pth`: The trained weights for the custom U-Net model, trained on the DUTS dataset.
- `requirements.txt`: List of Python dependencies.
- Clone the repository (if applicable) or navigate to the project directory.
- Install Dependencies: Ensure you have Python installed (version 3.8+ recommended), then run `pip install -r requirements.txt`.
- Run the Application: Start the Streamlit server with `streamlit run main.py`.
- Access the Tool: The application will automatically open in your default web browser at `http://localhost:8501`.
- Upload Image: Use the file uploader to select an image (`.jpg`, `.png`, `.jpeg`, `.bmp`).
- Select Mode: Choose between "Image segmentation" (Classical) and "Deep Learning" in the sidebar.
- Choose Method: Select the specific algorithm or model you want to use.
- View & Download: The processed images (Mask, Highlighted, Blurred Background, Grey Background) will be displayed. Click the "Download" button to save them.
The SaliencyModel is a U-Net architecture:
- Encoder: VGG16 with Batch Normalization (pretrained on ImageNet), using layers up to `features[22]`.
- Decoder: A series of `ConvTranspose2d` layers with ReLU activations, progressively upsampling the feature map to the original resolution. A final `Sigmoid` activation produces the probability map.
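The progressive upsampling follows from the standard `ConvTranspose2d` output-size formula. A quick sketch with assumed hyperparameters (kernel 4, stride 2, padding 1 — a common exact-2x choice, not necessarily the values in `model.py`):

```python
# Output-size arithmetic for ConvTranspose2d:
#   out = (in - 1) * stride - 2 * padding + kernel
# With kernel=4, stride=2, padding=1 each layer exactly doubles the spatial
# size -- assumed hyperparameters for illustration, not read from model.py.
def convtranspose2d_out(size, kernel, stride, padding):
    return (size - 1) * stride - 2 * padding + kernel

size = 28                              # hypothetical encoder feature-map size
trace = [size]
for _ in range(3):                     # three doubling stages: 28 -> 224
    size = convtranspose2d_out(size, kernel=4, stride=2, padding=1)
    trace.append(size)
print(trace)  # [28, 56, 112, 224]
```

Stacking enough such layers brings the encoder's downsampled feature map back to the input resolution before the final `Sigmoid`.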
- The application uses a GPU if available (`cuda`), otherwise falls back to the CPU.
- The `unet.pth` file must be present in the root directory for the U-Net model to work.
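The device fallback typically follows the standard PyTorch pattern below — a sketch of the described behavior, not a verbatim excerpt from `main.py`:

```python
# Standard PyTorch device-selection pattern: prefer CUDA, fall back to CPU.
# A sketch of the behavior described above, not the project's exact code.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Weights saved on a GPU machine can then be loaded onto whichever device is
# active, e.g. torch.load("unet.pth", map_location=device).
print(device.type)
```

Using `map_location` ensures GPU-trained checkpoints still load on CPU-only machines.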