This repository contains the code for the article "EyeCLIP: A Visual-Language Foundation Model for Multi-Modal Ophthalmic Image Analysis". The paper presents a novel approach that leverages vision-language pretraining to enhance ophthalmic image analysis across multiple modalities. The full article is available at https://arxiv.org/pdf/2409.06644.
EyeCLIP builds upon the CLIP (Contrastive Language-Image Pretraining) framework, adapting it specifically for ophthalmic image analysis. It is designed to:
- Integrate and analyze multiple ophthalmic imaging modalities (e.g., fundus photography, OCT, FA, ICGA).
- Perform zero-shot and fine-tuned classification for both ophthalmic and systemic diseases.
- Enable cross-modal retrieval between images and textual descriptions.
The code in this repository is largely based on the publicly available implementation from the original CLIP paper: Learning Transferable Visual Models From Natural Language Supervision.
```
@misc{radford2021learningtransferablevisualmodels,
  title={Learning Transferable Visual Models From Natural Language Supervision},
  author={Alec Radford and Jong Wook Kim and Chris Hallacy and Aditya Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  year={2021},
  eprint={2103.00020},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2103.00020},
}
```

To set up the environment, please follow the installation instructions provided in the official CLIP repository: https://github.com/openai/CLIP.
Ensure that you have the following dependencies installed:
- Python 3.8+
- PyTorch (with CUDA support for GPU training)
- OpenAI CLIP package
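As a quick sanity check of the environment, the snippet below loads a CLIP backbone and confirms that a GPU is visible to PyTorch. This is only a minimal sketch: the `"ViT-B/32"` model name is an example, not necessarily the backbone EyeCLIP uses.

```python
import torch
import clip  # OpenAI CLIP package: https://github.com/openai/CLIP

# Verify that a CUDA-capable GPU is visible to PyTorch.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load a CLIP backbone and its matching image preprocessing pipeline.
# "ViT-B/32" is only an example backbone.
model, preprocess = clip.load("ViT-B/32", device=device)
print("Available CLIP models:", clip.available_models())
```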
To prepare the dataset for pretraining and downstream tasks, follow these steps:
- Download the Dataset
  - Use the links provided in the article to download the publicly available ophthalmic imaging datasets.
- Organize the Data
  - Ensure the dataset is structured as follows:

    ```
    dataset_root/
    ├── images/
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   ├── ...
    ├── labels.csv
    ```

  - The `labels.csv` file should be formatted with at least the following columns (a minimal loading sketch follows this list):

    ```
    impath,class
    /path/to/image1.jpg,0
    /path/to/image2.jpg,1
    ```

    where:
    - `impath`: absolute path to the image file.
    - `class`: integer label representing the class of the image.
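For reference, a minimal PyTorch `Dataset` that reads this `labels.csv` layout could look like the sketch below. The class name and column handling are illustrative, not the exact loader used by the training scripts.

```python
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class OphthalImageDataset(Dataset):
    """Illustrative loader for the labels.csv layout described above."""

    def __init__(self, csv_path, transform=None):
        # Expect columns: impath (absolute image path) and class (integer label).
        self.df = pd.read_csv(csv_path)
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(row["impath"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, int(row["class"])
```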
To pretrain the EyeCLIP model on ophthalmic image datasets, run the following command:
```
python CLIP_ft_all_1enc_all.py
```
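Conceptually, the pretraining objective follows standard CLIP-style image-text contrastive learning. The sketch below illustrates one training step under that assumption; the actual `CLIP_ft_all_1enc_all.py` script handles additional details (multi-modal batching, checkpointing, scheduling) not shown here.

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
# Load a CLIP backbone; cast to fp32 so standard optimizers behave well.
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)


def contrastive_step(images, texts):
    """One CLIP-style step on a batch of preprocessed images and tokenized texts."""
    image_features = F.normalize(model.encode_image(images), dim=-1)
    text_features = F.normalize(model.encode_text(texts), dim=-1)

    # Symmetric InfoNCE loss: matching image/text pairs lie on the diagonal.
    logit_scale = model.logit_scale.exp()
    logits = logit_scale * image_features @ text_features.t()
    labels = torch.arange(len(images), device=device)
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```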
EyeCLIP can be used for various ophthalmic image analysis tasks. Below are the available downstream tasks with corresponding scripts to run them.

To evaluate the pretrained model without fine-tuning, run:
```
python zero_shot.py
```
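Zero-shot classification compares an image embedding against text embeddings of class descriptions, as in standard CLIP. Below is a minimal sketch; the class names, prompt template, and image path are placeholders rather than what `zero_shot.py` actually uses.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder class prompts; replace with the disease labels of your dataset.
class_names = ["normal fundus", "diabetic retinopathy", "glaucoma"]
text_tokens = clip.tokenize([f"a fundus photograph of {c}" for c in class_names]).to(device)

# Placeholder image path.
image = preprocess(Image.open("example_fundus.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text_tokens)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Softmax over scaled cosine similarities gives per-class probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for name, p in zip(class_names, probs[0].tolist()):
    print(f"{name}: {p:.3f}")
```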
To fine-tune the model for ophthalmic disease classification, run:

```
bash scripts/cls_opthal.sh
```
To fine-tune the model for systemic disease classification, run:

```
bash scripts/cls_chro.sh
```
To perform cross-modal retrieval, run:

```
python retrieval.py
```
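Cross-modal retrieval ranks candidates in one modality by cosine similarity to a query from the other. The sketch below shows text-to-image retrieval; the file paths and query string are illustrative, and `retrieval.py` may differ in its details.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Illustrative gallery of candidate images and a free-text query.
image_paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
query = "fluorescein angiography showing macular leakage"

with torch.no_grad():
    images = torch.stack([preprocess(Image.open(p)) for p in image_paths]).to(device)
    image_features = model.encode_image(images)
    text_features = model.encode_text(clip.tokenize([query]).to(device))

    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Rank gallery images by cosine similarity to the text query.
    similarity = (text_features @ image_features.T).squeeze(0)
    top = similarity.argsort(descending=True)

for rank, idx in enumerate(top.tolist(), start=1):
    print(f"{rank}. {image_paths[idx]} (score={similarity[idx]:.3f})")
```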
If you use this repository or find our work helpful, please consider citing our paper:

```
@article{your_paper_citation,
  title={EyeCLIP: A Visual-Language Foundation Model for Multi-Modal Ophthalmic Image Analysis},
  author={Your Name and Co-Authors},
  journal={arXiv},
  year={2024},
  url={https://arxiv.org/pdf/2409.06644}
}
```