Official repo of Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification, ICCV 2023. [arXiv] [Slide] [Oral]
Official repo of Multiple Instance Learning Framework with Masked Hard Instance Mining for Gigapixel Histopathology Image Analysis, IJCV 2025. [arXiv] [Springer Nature] [Huggingface Dataset]
- 2025-12: MHIM-v2 is accepted by IJCV 2025. The code of MHIM-v2 has been released.
- 2025-09: Released MHIM-v2, a more concise and effective method with stronger, broader generalizability and enhanced interpretability. [arXiv]
- 2023-07: MHIM-MIL is accepted by ICCV 2023 and selected for an oral presentation. [arXiv] [Slide] [Oral]
| Branch Name | Link |
|---|---|
| 2.0 Version Branch | master (latest) |
| 1.0 Version Branch | 1.0 Version |
We recommend using Docker for a reproducible environment. Alternatively, you can install dependencies via PyPI.
- Download the Docker Image from Google Drive or Baidu Netdisk (Password: 2025)
- Load the Docker image:

```shell
docker load -i XXX.tar
```

(Replace `XXX.tar` with the downloaded file name.)
- Run the Docker container:

```shell
docker run --gpus all -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /path/to/your_code:/workspace/code \
    -v /path/to/your_data:/workspace/dataset \
    -v /path/to/your_output:/workspace/output \
    --name mhim \
    --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    -d mhim:latest /bin/bash
```
- Create a new Python environment:

```shell
conda create -n mhim python=3.9
conda activate mhim
```
- Install the required packages. A complete list of requirements can be found in `requirements.txt`.

```shell
pip install -r requirements.txt
```
We provide preprocessed patch features for all datasets. You can download them from: Hugging Face, ModelScope, or Baidu Netdisk (Password: ujtq)
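After downloading, a quick way to sanity-check the feature directory and build a slide list is to enumerate the per-slide files. Below is a minimal sketch, assuming a CLAM-style layout of one `.pt` feature file per slide; `collect_slide_ids` is a hypothetical helper, not part of this repo:

```python
import tempfile
from pathlib import Path

def collect_slide_ids(feat_dir):
    """Return the sorted slide IDs found as .pt feature files under feat_dir."""
    return sorted(p.stem for p in Path(feat_dir).glob("*.pt"))

# Demo with a throwaway directory standing in for the downloaded features.
with tempfile.TemporaryDirectory() as d:
    for name in ("slide_003", "slide_001", "slide_002"):
        (Path(d) / f"{name}.pt").touch()
    print(collect_slide_ids(d))  # ['slide_001', 'slide_002', 'slide_003']
```

The resulting ID list can be cross-checked against the label CSV before training to catch missing or misnamed slides early.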
If you have raw Whole-Slide Image (WSI) data, you can preprocess it as follows:
- Patching (following CLAM):

```shell
python CLAM/create_patches_fp.py --source YOUR_DATA_DIRECTORY \
    --save_dir YOUR_RESULTS_DIRECTORY \
    --patch_size 256 \
    --step_size 256 \
    --patch_level 0 \
    --preset YOUR_PRESET_FILE \
    --seg \
    --patch
```

Replace placeholders like `YOUR_DATA_DIRECTORY` with your actual paths and parameters. Preset files are officially provided by CLAM.
- Feature Extraction (modified from the official CLAM repository to support the ResNet-50, CHIEF, UNI, and GIGAP encoders):

You can also extract all the required features following the TRIDENT pipeline.

```shell
CUDA_VISIBLE_DEVICES=$TARGET_GPUS python CLAM/extract_features_fp.py \
    --data_h5_dir DIR_TO_COORDS \
    --data_slide_dir DATA_DIRECTORY \
    --csv_path CSV_FILE_NAME \
    --feat_dir FEATURES_DIRECTORY \
    --slide_ext .svs \
    --model_name resnet50_trunc/uni_v1/chief/gigap
```
⚠️ Note: We've significantly refactored the codebase! If you spot any issues, please let us know. You can still use the old version in the v1 branch.
You can use wandb to track the training process; add `--wandb` to the command line.
Different patch encoders produce features of different dimensions, so use `--input_dim` to specify the input dimension (ResNet-50: 1024, PLIP: 512, UNI-v1: 1024, and so on).
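To avoid passing a mismatched `--input_dim` by hand, the encoder-to-dimension mapping can be kept in a small table. The ResNet-50, PLIP, and UNI-v1 values come from the note above; the dictionary and `input_dim_flag` helper are just a hypothetical convenience, not part of this repo:

```python
# Feature dimensions per patch encoder (values for ResNet-50, PLIP,
# and UNI-v1 as stated above; extend this dict for other encoders).
INPUT_DIMS = {
    "resnet50_trunc": 1024,
    "plip": 512,
    "uni_v1": 1024,
}

def input_dim_flag(encoder):
    """Build the --input_dim CLI flag for a known encoder name."""
    return f"--input_dim={INPUT_DIMS[encoder]}"

print(input_dim_flag("uni_v1"))  # --input_dim=1024
```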
`model=mhim_pure`, `baseline` ∈ {`attn`, `selfattn`, `dsmil`}
```shell
# baselines on the Call dataset (diagnosis)
python3 main.py --project=$PROJECT_NAME --datasets=call --dataset_root=$DATASET_PATH --csv_path=$LABEL_CSV_PATH --output_path=$OUTPUT_PATH -c=./config/feat_cls.yaml --title=call_$BASELINE --model=mhim_pure --baseline=$BASELINE

# baselines on the TCGA-NSCLC dataset (sub-typing)
python3 main.py --project=$PROJECT_NAME --datasets=nsclc --dataset_root=$DATASET_PATH --csv_path=$LABEL_CSV_PATH --output_path=$OUTPUT_PATH -c=./config/feat_cls.yaml --title=NSCLC_$BASELINE --model=mhim_pure --baseline=$BASELINE

# baselines on the TCGA-BRCA dataset (sub-typing)
python3 main.py --project=$PROJECT_NAME --datasets=brca --dataset_root=$DATASET_PATH --csv_path=$LABEL_CSV_PATH --output_path=$OUTPUT_PATH -c=./config/feat_cls.yaml --title=BRCA_$BASELINE --model=mhim_pure --baseline=$BASELINE

# baselines on the TCGA-BLCA dataset (survival)
python3 main.py --project=$PROJECT_NAME --datasets=surv_blca --dataset_root=$DATASET_PATH --csv_path=$LABEL_CSV_PATH --output_path=$OUTPUT_PATH -c=./config/feat_surv.yaml --title=BLCA_$BASELINE --model=mhim_pure --baseline=$BASELINE

# baselines on the TCGA-LUAD dataset (survival)
python3 main.py --project=$PROJECT_NAME --datasets=surv_luad --dataset_root=$DATASET_PATH --csv_path=$LABEL_CSV_PATH --output_path=$OUTPUT_PATH -c=./config/feat_surv.yaml --title=LUAD_$BASELINE --model=mhim_pure --baseline=$BASELINE

# baselines on the TCGA-LUSC dataset (survival)
python3 main.py --project=$PROJECT_NAME --datasets=surv_lusc --dataset_root=$DATASET_PATH --csv_path=$LABEL_CSV_PATH --output_path=$OUTPUT_PATH -c=./config/feat_surv.yaml --title=LUSC_$BASELINE --model=mhim_pure --baseline=$BASELINE
```

We recommend performing a grid search over the following hyperparameters to achieve optimal performance:
- `mask_ratio_h`: {0.01, 0.03, 0.05}
- `merge_ratio`: {0.8, 0.9}
- `merge_k`: {1, 5, 10}
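This grid is small (3 × 2 × 3 = 18 runs), so it can be enumerated directly. A minimal sketch that prints one flag set per combination, matching the `--mask_ratio_h`/`--merge_ratio`/`--merge_k` flags used by `main.py` (the loop itself is illustrative, not part of the repo):

```python
from itertools import product

# Hyperparameter grid recommended above: 3 x 2 x 3 = 18 runs.
grid = {
    "mask_ratio_h": [0.01, 0.03, 0.05],
    "merge_ratio": [0.8, 0.9],
    "merge_k": [1, 5, 10],
}

combos = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combos))  # 18

for combo in combos:
    # Flag string to append to the main.py command for this run.
    flags = " ".join(f"--{k}={v}" for k, v in combo.items())
    print(flags)
```

Each printed line (e.g. `--mask_ratio_h=0.01 --merge_ratio=0.8 --merge_k=1`) can be appended to the grid-search command.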
```shell
# Grid search on hyperparameters
# Replace $DATASET_NAME with one of [call, nsclc, brca, surv_blca, surv_luad, surv_lusc]
# Replace $CONFIG_FILE with ./config/feat_cls.yaml (for diagnosis and sub-typing) or ./config/feat_surv.yaml (for survival)
python3 main.py --project=$PROJECT_NAME \
    --datasets=$DATASET_NAME \
    --dataset_root=$DATASET_PATH \
    --csv_path=$LABEL_CSV_PATH \
    --output_path=$OUTPUT_PATH \
    --teacher_init=$TEACHER_WEIGHT_PATH \
    -c=$CONFIG_FILE \
    --title=${DATASET_NAME}_mhim_${BASELINE}_${OTHER_HYPERPARAMETERS} \
    --model=mhim \
    --baseline=$BASELINE \
    --attn2score \
    --merge_enable \
    --merge_mm=0.9999 \
    --aux_alpha=0.5 \
    --mask_ratio_h=$MASK_RATIO_H \
    --merge_ratio=$MERGE_RATIO \
    --merge_k=$MERGE_K
```

If you find this work useful in your research, please consider citing:
```bibtex
@InProceedings{Tang_2023_ICCV,
    author    = {Tang, Wenhao and Huang, Sheng and Zhang, Xiaoxian and Zhou, Fengtao and Zhang, Yi and Liu, Bo},
    title     = {Multiple Instance Learning Framework with Masked Hard Instance Mining for Whole Slide Image Classification},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {4078-4087}
}

@misc{tang2025multipleinstancelearningframework,
    title         = {Multiple Instance Learning Framework with Masked Hard Instance Mining for Gigapixel Histopathology Image Analysis},
    author        = {Wenhao Tang and Sheng Huang and Heng Fang and Fengtao Zhou and Bo Liu and Qingshan Liu},
    year          = {2025},
    eprint        = {2509.11526},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CV},
    url           = {https://arxiv.org/abs/2509.11526},
}
```
