There are three steps to train the classification network with selected high-quality pseudo labels.
Download the HPH dataset here
Download the LC25K dataset here
Download the CRC100K dataset here
Download the DigestPath dataset here
Use a 4:1 split for training and testing.
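A 4:1 split can be produced in several ways. The block below is a minimal sketch, not the repository's own splitting code. It assumes you have one label per image and want each class represented in roughly the same 4:1 proportion. The function name `stratified_split` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.8, seed=0):
    """Return (train_idx, test_idx), splitting each class by train_ratio.

    Hypothetical helper: `labels` is a list with one class label per sample.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = int(round(len(idxs) * train_ratio))
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    toy_labels = ["tumor"] * 10 + ["normal"] * 5
    train, test = stratified_split(toy_labels)
    print(len(train), len(test))
```

Fixing the seed keeps the training and testing sets identical across runs, which matters because the pseudo-label filtering is applied to the training set only.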
First, use the off-the-shelf VLM for zero-shot inference, together with our proposed method, to filter out noisy samples from the training set.
In the vlm_cpl_LC25K.py file, there are two main functions, MVC and Prompt_feature_consensus.
You can use the combination of MVC and Prompt_feature_consensus, or either one alone. You can also adjust the order of these two filters.
python vlm_cpl_LC25K.py --gpu 0
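The idea of using one filter, both filters, or both in a different order can be sketched as a simple chain. This is an illustration only: `apply_filters`, `keep_even`, and `keep_small` are hypothetical toy stand-ins and do not reflect how MVC or Prompt_feature_consensus are implemented in vlm_cpl_LC25K.py. The sketch assumes each filter takes a list of sample ids and returns the subset it keeps.

```python
from typing import Callable, Iterable, List, Sequence

# Assumed interface: a filter maps a list of sample ids to the kept subset.
Filter = Callable[[List[int]], List[int]]


def apply_filters(sample_ids: Iterable[int], filters: Sequence[Filter]) -> List[int]:
    """Apply each filter in order; later filters only see earlier survivors."""
    kept = list(sample_ids)
    for f in filters:
        kept = f(kept)
    return kept


# Toy stand-ins for illustration (NOT the repository's filters).
def keep_even(ids: List[int]) -> List[int]:
    return [i for i in ids if i % 2 == 0]


def keep_small(ids: List[int]) -> List[int]:
    return [i for i in ids if i < 6]


if __name__ == "__main__":
    ids = range(10)
    print(apply_filters(ids, [keep_even, keep_small]))  # both filters
    print(apply_filters(ids, [keep_small, keep_even]))  # reversed order
    print(apply_filters(ids, [keep_even]))              # one filter alone
```

These toy filters judge each sample independently, so the order does not change the result. A filter whose decision depends on which other samples survived, for example one that computes statistics over the remaining set, can give different results in a different order.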
Second, after obtaining high-quality pseudo-labels, you can train a classification network.
python train_pseudo.py --gpu 0 --pseudo_csv <your_csv>