We introduce NExNet Seg, the Neuron Expansion Network for Medical Image Segmentation. Inspired by Progressively Expanded Neu- ron (PEN) structures and Manhattan Self-Attention (MaSA) mechanisms, NExNet Seg achieves exceptional accuracy with high parameter efficiency.
NExNet Seg is a novel deep learning architecture designed for efficient and accurate medical image segmentation, as presented in the LXCV-CVPRw 2025 submission (Paper ID 3). It integrates Trainable Progressively Expanded Neurons (T-PEN), Manhattan Self-Attention (MaSA), and Self-Supervised Learning (SSL) to achieve superior performance with reduced computational cost. The model excels in tasks such as skin lesion segmentation (e.g., ISIC datasets) and gastrointestinal polyp segmentation (e.g., Kvasir-Seg, CVC-Clinic DB), balancing high accuracy with computational efficiency for clinical deployment.
This repository contains the implementation of NExNet Seg, including training scripts and configurations for reproducing the results reported in the paper.
- T-PEN: Enhances feature representation by expanding neurons with trainable coefficients, improving adaptability over the original PEN framework.
- MaSA: Utilizes Manhattan Self-Attention to capture spatially relevant features efficiently, reducing computational overhead.
- SSL: Leverages self-supervised pretraining with a U-Net-based denoising model to improve generalization.
- Efficiency: Achieves high Dice Coefficients (e.g., 0.943 on Kvasir-Seg) with only 30.4M parameters, compared to TransUNet's 105M.
- Applications: Demonstrates robust performance on skin lesion (ISIC 2016-2018, PH²) and polyp segmentation (Kvasir-Seg, CVC-Clinic DB).
The NExNet Seg architecture combines T-PEN and MaSA within a U-Net-like encoder-decoder framework. The encoder uses Neuron Expansion (NEx) blocks with T-PEN to enrich feature representations, while MaSA enhances spatial feature extraction in Conv + MaSA blocks. Skip connections integrate MaSA features into the decoder for accurate segmentation mask reconstruction.
Illustration of the proposed NExNet Seg architecture for medical image segmentation. (a) NExNet Seg combines T-PEN and MaSA concepts. (b) T-PEN is embedded into the NEx blocks, while MaSA is combined with traditional convolution within the Conv MaSA block.
- Python 3.8+
- PyTorch with CUDA support
- NVIDIA GPU (e.g., A-100 with 40GB for training)
- Anaconda for environment management
- Datasets: ISIC 2016-2018, PH², Kvasir-Seg, CVC-Clinic DB
-
Clone the Repository:
git clone https://github.com/MAIN-Lab/NExNet_Seg.git cd NExNet_Seg -
Create Conda Environment:
conda create -n upen_torch python=3.8 conda activate upen_torch
-
Install Dependencies:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu122 pip install -r requirements.txt
The repository includes scripts for training the autoencoder (SSL pretraining) and the NExNet Seg model. Example commands are provided below.
- For skin lesion datasets:
sbatch auto_ssl_skin.sh
- For polyp datasets:
sbatch auto_ssl_colon.sh
- For polyp segmentation (e.g., Kvasir-Seg):
sbatch nexnet_train_colon.sh
Modify the scripts to adjust hyperparameters such as --num_epochs, --batch_size, --dropout_rate, or --gamma as needed.
Evaluate the model using the provided datasets and metrics (Dice Coefficient, IoU). The trained models can be tested with:
python evaluate.py --model_path <path_to_trained_model> --dataset <dataset_name>NExNet Seg outperforms several state-of-the-art models in medical image segmentation, as shown in the table below (extracted from Table 1 in the paper).
| Method | ISIC 16 (DC) | ISIC 17 (DC) | ISIC 18 (DC) | PH² (DC) | Kvasir-Seg (DC) | CVC-Clinic (DC) |
|---|---|---|---|---|---|---|
| UNet | 0.887 | 0.794 | 0.813 | 0.873 | 0.775 | 0.856 |
| UNet++ | 0.889 | 0.814 | 0.833 | 0.890 | 0.786 | 0.874 |
| TransUNet | 0.913 | 0.873 | 0.904 | 0.910 | 0.889 | 0.920 |
| SegFormer | - | 0.851 | 0.869 | - | 0.905 | 0.931 |
| NExNet Seg | 0.913 | 0.921 | 0.918 | 0.936 | 0.943 | 0.934 |
Qualitative comparison of NExNet Seg with state-of-the-art methods. Blue contours represent ground truth, green contours represent predictions.
The ablation study (Table 2) demonstrates the contribution of each component (T-PEN, MaSA, SSL). The full model with all components achieves the highest performance across datasets.
Detailed examination of the proposed methodology. (a) Illustrates the performance capability of NExNet Seg with fewer
trainable parameters in comparison with the state-of-the-art. [Different markers represent compared models, and different colors represent
the dataset used in experiments] (b) Ablation analysis performance comparison.
NExNet Seg is designed for resource-constrained environments, with fewer parameters and lower inference times compared to models like TransUNet (Table 4).
| Method | Parameters (M) | Inference Time (s) | FLOPS (G) |
|---|---|---|---|
| UNet | 31.1 | 0.09 | 68 |
| TransUNet | 105 | 0.30 | 160 |
| NExNet Seg | 30.4 | 0.06 | 18 |
- Overfitting: T-PEN layers may overfit on certain datasets (e.g., ISIC17, CVC-Clinic).
- Generalization: Performance on other medical imaging tasks is untested.
- Hardware: Training requires high-end GPUs (e.g., NVIDIA A-100).
- Explore regularization techniques to mitigate T-PEN overfitting.
- Test NExNet Seg on additional medical imaging tasks.
- Optimize for lower-end hardware to improve accessibility.
If you use NExNet Seg in your research, please cite:
@InProceedings{Reyes-Angulo_2025_CVPR,
author = {Reyes-Angulo, Abel and Paheding, Sidike},
title = {NExNet Seg: Neuron Expansion Network for Medical Image Segmentation},
booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops},
month = {June},
year = {2025},
pages = {1-10}
}Happy quokka.
See the LICENSE file for license rights and limitations (MIT).

