SpectraViT is a novel deep learning model designed to enhance melanoma classification through a combination of Fourier transforms, Wavelet transforms, and Vision Transformers (ViT). The model combines the spectral analysis capabilities of Fourier and Wavelet transforms with the attention-driven feature extraction of Vision Transformers to capture both fine-grained and holistic details from dermatoscopic images of skin lesions.
The goal of SpectraViT is to improve the accuracy of binary classification of melanoma by:
- Utilizing spectral transformations to capture high-frequency details and textures.
- Leveraging the Vision Transformer’s capacity for capturing global dependencies within an image.
- Combining these techniques to create a hybrid model that performs both low-level feature extraction and high-level contextual analysis.
SpectraViT integrates Fourier and Wavelet transforms as preprocessing steps, with a Vision Transformer layer to aggregate both local and global information from the images. The main steps in SpectraViT are as follows:
- Fourier Transform Layer: Extracts high-frequency components of the image, enhancing the model’s ability to identify textural details.
- Wavelet Transform Layer: Performs a multi-level decomposition (Level 2 Haar wavelet) to capture spatial-frequency features at different resolutions.
- Hybrid Pooling: Combines outputs from the Fourier and Wavelet transforms to reduce dimensions while retaining essential features.
- Vision Transformer: Processes the spectral information as well as spatial context, enhancing the model’s ability to understand the structure and distribution of image patterns.
- Accuracy: SpectraViT achieves an accuracy of 92%* on the melanoma classification dataset, showing a 2% improvement over standard ViT.
- Robustness: The model is resilient to noise and effectively captures both fine and coarse details, making it suitable for medical imaging tasks where precision is crucial.
For questions or feedback, please contact at samridhiraj04@gmail.com.