This is the official repository of the paper "RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models", by Ronald Fecso, José Morano, Ursula Schmidt-Erfurth, and Hrvoje Bogunović, accepted for presentation at MICCAI 2025.
We propose RetFiner (Fig. 1), an SSL vision-language refinement scheme that improves the representations of existing FMs and enables their efficient and direct adaptation to specific populations for improved downstream performance. Our method uses a diverse set of training objectives that take advantage of the rich supervisory signal found in textual data. We tested RetFiner on the retinal FMs RETFound, UrFound, and VisionFM (Table 1), showing significant improvements in linear probing performance on seven highly diverse OCT classification tasks, with average increases of 5.7, 3.9, and 2.1 percentage points over their respective baselines.
The model weights are available in the Model weights release on GitHub.
| Model | Link |
|---|---|
| RetFiner-RETFound | Weights-RetFiner-R |
| RetFiner-UrFound | Weights-RetFiner-U |
| RetFiner-VisionFM | Weights-RetFiner-V |
Our models can also be accessed on Hugging Face:
| Model | Link |
|---|---|
| RetFiner-RETFound | Weights-RetFiner-R |
| RetFiner-UrFound | Weights-RetFiner-U |
| RetFiner-VisionFM | Weights-RetFiner-V |
If you want to run RetFiner on your vision model:
1. Navigate into `RetFiner/`.
2. Create a new virtual environment in `RetFiner/` and install `requirements.txt`.
3. Text encoder weights: download the BERT model and tokenizer and unzip them into `RetFiner/pretrained_weights/BERT/`.
4. Vision encoder weights: put your vision model in `RetFiner/`.
5. Our in-house image-text training data is private, so you will need to use your own. Edit the dataloader in `RetFiner/ImageCaptionDataset.py` accordingly. `__getitem__` should return a list of two elements: an image (torch tensor) and a report (string).
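As a reference for the dataloader edit, here is a minimal sketch of a dataset satisfying the `__getitem__` contract described above. The class name and constructor are hypothetical; only the returned `[image tensor, report string]` pair is prescribed by this repository.

```python
# Hypothetical dataset sketch; adapt ImageCaptionDataset.py to your own data.
import torch
from torch.utils.data import Dataset


class MyImageCaptionDataset(Dataset):
    def __init__(self, samples):
        # samples: list of (image_tensor, report_string) pairs,
        # e.g. loaded from your own image-text pairs on disk.
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, report = self.samples[idx]
        # RetFiner expects a list of two elements:
        # an image (torch tensor) and a report (string).
        return [image, report]
```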
Then run from the command line:

```shell
python train.py --model_weights path/to/yourvisionmodel
```

Once your model is trained, run the following script to extract the vision backbone. This will save it under `../linear_probing/_weights`. Note that this has only been tested on RETFound, VisionFM, UrFound, and our in-house MAE; you may need to alter it for another FM.
```shell
python get_vision_backbone_for_linprobing.py --path_to_model models/<model name>/best-model.ckpt
```

Once you have your RetFined model, navigate into `../linear_probing/`, set up and activate a new virtual environment there, and install `requirements.txt`.
Then you can run one of the .sh scripts based on which model you have.
For example, in `retfound.sh`, you would change the `ft_weights` arg to `_weights/<my_model_name>` and adjust the data sets arg accordingly.
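A hypothetical excerpt of such an edit (only the two arguments mentioned above are taken from this README; variable names and values are illustrative):

```shell
# Illustrative fragment of retfound.sh; the real script may name these differently.
ft_weights=_weights/my_retfined_model   # path to your extracted RetFined backbone
datasets="OCTDL"                        # set the data sets arg to your target task
```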
Results are found in __results/.
- Duke iAMD: https://people.duke.edu/~sf59/RPEDC_Ophth_2013_dataset.htm
- Harvard Glaucoma: https://github.com/Harvard-Ophthalmology-AI-Lab/Harvard-GDP
- Noor Eye Hospital: https://hrabbani.site123.me/available-datasets/dataset-for-oct-classification-50-normal-48-amd-50-dme
- OCTDL: https://data.mendeley.com/datasets/sncdhf53xc/4
- OCTID: https://borealisdata.ca/dataverse/OCTID
- NEHUT: https://data.mendeley.com/datasets/8kt969dhx6/1
The models and associated code are released under the CC-BY-NC 4.0 license and may only be used for non-commercial, academic research purposes with proper attribution. See LICENSE for more details.
Specifically, the following licenses apply to the base models and fine-tuned models:
| Model Name | Base Model | Original License | Fine-Tuned License |
|---|---|---|---|
| RetFiner-U | UrFound | MIT | CC-BY-NC-4.0 |
| RetFiner-R | RETFound | CC-BY-NC-4.0 | CC-BY-NC-4.0 |
| RetFiner-V | VisionFM | CC-BY-NC-4.0 | CC-BY-NC-4.0 |
If you use any of our models, please do the following:
- Cite the original base models:
- UrFound: Yu, Kai, et al. "UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling." International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.
- RETFound: Zhou, Yukun, et al. "A foundation model for generalizable disease detection from retinal images." Nature 622.7981 (2023): 156-163.
- VisionFM: Qiu, Jianing, et al. "Development and validation of a multimodal multitask vision foundation model for generalist ophthalmic artificial intelligence." NEJM AI 1.12 (2024): AIoa2300221.
- Cite this work:
```bibtex
@InProceedings{FecRon_RetFiner_MICCAI2025,
  author    = {Fecso, Ronald and Morano, José and Schmidt-Erfurth, Ursula and Bogunović, Hrvoje},
  title     = {{RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models}},
  booktitle = {Proceedings of Medical Image Computing and Computer Assisted Intervention -- MICCAI 2025},
  year      = {2025},
  publisher = {Springer Nature Switzerland},
  volume    = {LNCS 15964},
  month     = {September},
  pages     = {543--553}
}
```