
xinwangxinwang/Kaggle_SPR


1st-place solution of SPR Screening Mammography Recall

Differentiate between negative (BI-RADS 1,2) and positive (BI-RADS 0,3,4,5) recall mammograms.

Our pipeline

(pipeline figure)

Kaggle Leaderboard

(leaderboard screenshot)



Data Processing

Step 1: Data Download and Unzip
cd /Your/Data/Path
mkdir -p ./Kaggle_SPR_Screening_Mammography/dicoms/
for i in {1..9}; do
    curl -L -o ./Kaggle_SPR_Screening_Mammography/spr-mmg-$i.zip \
        https://www.kaggle.com/api/v1/datasets/download/felipekitamura/spr-mmg-$i

    unzip ./Kaggle_SPR_Screening_Mammography/spr-mmg-$i.zip -d ./Kaggle_SPR_Screening_Mammography/dicoms/
done

curl -L -o ./Kaggle_SPR_Screening_Mammography/spr-mmg-02.zip \
        https://www.kaggle.com/api/v1/datasets/download/felipekitamura/spr-mmg-02

unzip ./Kaggle_SPR_Screening_Mammography/spr-mmg-02.zip -d ./Kaggle_SPR_Screening_Mammography/dicoms/
Step 2: Data Preprocessing
  • The data is in DICOM format. I use the pydicom library to read the DICOM files and convert them to PNG format.
cd /Your/Codebase/Path/Kaggle_SPR
# Two kinds of processed data are produced: raw PNGs in the "raw_png" folder and cropped+resized PNGs in the "processed_png" folder.
# The raw PNGs are converted directly from the original DICOM files.
# The cropped+resized PNGs are converted from the original DICOM files after cropping to the breast area and resizing to 2048×1024.

python ./Image_preprocess/data_preprocess.py \
    --src_folder /Your/Data/Path/Kaggle_SPR_Screening_Mammography/dicoms/ \
    --dest_folder /Your/Data/Path/Kaggle_SPR_Screening_Mammography/pngs/
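The heart of the DICOM-to-PNG conversion can be sketched as follows. This is a minimal illustration, not the actual `data_preprocess.py`; it assumes the raw pixel array comes from pydicom's `dcmread(path).pixel_array`:

```python
import numpy as np

def to_uint8(pixels, monochrome1=False):
    """Normalize a raw DICOM pixel array to an 8-bit image array.

    pixels: 2-D numpy array, e.g. pydicom.dcmread(path).pixel_array
    monochrome1: True if PhotometricInterpretation is MONOCHROME1,
        where low raw values display as bright, so the image is inverted.
    """
    x = pixels.astype(np.float32)
    x -= x.min()                      # shift so the darkest pixel is 0
    if x.max() > 0:
        x /= x.max()                  # scale into [0, 1]
    if monochrome1:
        x = 1.0 - x                   # invert MONOCHROME1 images
    return (x * 255).round().astype(np.uint8)
```

The resulting array can then be written out with, for example, `PIL.Image.fromarray(arr).save(path)`.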
Step 3: Metadata collection from DICOM tags
  • The metadata is collected from the DICOM tags.
python ./Image_preprocess/dcmtags_collection.py \
    --src_folder /Your/Data/Path/Kaggle_SPR_Screening_Mammography/dicoms/ \
    --dest_folder /Your/Data/Path/Kaggle_SPR_Screening_Mammography/pngs/
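In spirit, the tag collection looks like the sketch below. The tag list is hypothetical (the repository's `dcmtags_collection.py` may collect a different set); a real pydicom Dataset exposes tags as attributes exactly the way the stand-in object does here:

```python
import csv
from types import SimpleNamespace

# Hypothetical subset of tags worth collecting for mammography.
TAGS = ["PatientID", "PatientAge", "ViewPosition", "ImageLaterality", "Manufacturer"]

def collect_tags(ds, tags=TAGS):
    """Read selected DICOM tags from a dataset-like object.

    Works with a pydicom Dataset (tags are attributes);
    missing tags fall back to None.
    """
    return {t: getattr(ds, t, None) for t in tags}

def write_metadata_csv(path, datasets, tags=TAGS):
    """Write one CSV row per DICOM file."""
    with open(path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=tags)
        w.writeheader()
        for ds in datasets:
            w.writerow(collect_tags(ds, tags))
```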
Step 4: Data Splitting
  • The data is split into training and validation sets. The split is done on the patient level.
python ./Image_preprocess/cv_split.py
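A patient-level split means every image of a given patient lands in the same fold, so no patient leaks between training and validation. A minimal sketch of such a split (not the repository's exact logic in `cv_split.py`):

```python
import random

def patient_level_folds(patient_ids, n_folds=4, seed=42):
    """Assign every image a fold such that all images of one patient
    share the same fold (patient-level split)."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)                      # randomize patient order, reproducibly
    fold_of = {p: i % n_folds for i, p in enumerate(unique)}
    return [fold_of[p] for p in patient_ids]
```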

Model Training

Models are trained on the training set and validated on the validation set via the train.py script.

EfficientNet-B2 & B5 Backbone with public weights (Mammo-CLIP)
cd /Your/Codebase/Path/Kaggle_SPR/mammo_cls

BS=12
ImgSize=1536 # 256 512 1024 1536 2048
for arch in efficientnet_b2 efficientnet_b5
do
  base_model_path="/Your/Codebase/Path/Mammo-CLIP/weights/$arch"
  csv_dir="/Your/Codebase/Path/Kaggle_SPR/data_csv/cv_split"  # change to your own directory
  results_dir="/Your/Codebase/Path/Kaggle_SPR/finetune_${ImgSize}/"  # change to your own directory
  image_dir="/Your/Image/Path"  # change to your own directory
  dataset_config="/Your/Codebase/Path/Kaggle_SPR/configs/datasets/datasets.yaml" # change to your own dataset config

  pretrained_model_path="${base_model_path}.tar"

  for fold in 0 1 2 3
  do

    echo "${arch} fold--$fold"
    python train.py \
    --seed 42 \
    --fold $fold \
    --num-output-neurons 1 \
    --pretrained_model_path $pretrained_model_path \
    --model_method Mammo_Clip \
    --dataset spr \
    --dataset-config $dataset_config \
    --csv-dir $csv_dir \
    --image-dir $image_dir \
    --accumulation_steps 32 \
    --batch-size $BS \
    --img-size $ImgSize \
    --results-dir $results_dir

  done
done
ResNet-18 Backbone with public weights (MIRAI)
cd /Your/Codebase/Path/Kaggle_SPR/mammo_cls

BS=12
ImgSize=2048 # 256 512 1024 1536 2048
for arch in resnet18
do
  base_model_path="/Your/Codebase/Path/Mammo-CLIP/weights/$arch"
  csv_dir="/Your/Codebase/Path/Kaggle_SPR/data_csv/cv_split"  # change to your own directory
  results_dir="/Your/Codebase/Path/Kaggle_SPR/finetune_${ImgSize}/"  # change to your own directory
  image_dir="/Your/Image/Path"  # change to your own directory
  dataset_config="/Your/Codebase/Path/Kaggle_SPR/configs/datasets/datasets.yaml" # change to your own dataset config

  pretrained_model_path="${base_model_path}.tar"

  for fold in 0 1 2 3
  do

    echo "${arch} fold--$fold"
    python train.py \
    --seed 42 \
    --fold $fold \
    --num-output-neurons 1 \
    --pretrained_model_path $pretrained_model_path \
    --model_method MIRAI \
    --dataset spr \
    --dataset-config $dataset_config \
    --csv-dir $csv_dir \
    --image-dir $image_dir \
    --accumulation_steps 32 \
    --batch-size $BS \
    --img-size $ImgSize \
    --results-dir $results_dir

  done
done
ConvNeXt-Small Backbone with public weights (RSNA-2023-Mammo [1st])
cd /Your/Codebase/Path/Kaggle_SPR/mammo_cls

BS=12
ImgSize=2048 # 256 512 1024 1536 2048
# NOTE: the original script reused "resnet18" and --model_method MIRAI here,
# which looks like a copy-paste slip; the arch name below is a guess, and the
# weight path and --model_method should point at the RSNA-2023-Mammo ConvNeXt setup.
for arch in convnext_small
do
  base_model_path="/Your/Codebase/Path/Mammo-CLIP/weights/$arch"  # change to the RSNA-2023-Mammo weights directory
  csv_dir="/Your/Codebase/Path/Kaggle_SPR/data_csv/cv_split"  # change to your own directory
  results_dir="/Your/Codebase/Path/Kaggle_SPR/finetune_${ImgSize}/"  # change to your own directory
  image_dir="/Your/Image/Path"  # change to your own directory
  dataset_config="/Your/Codebase/Path/Kaggle_SPR/configs/datasets/datasets.yaml" # change to your own dataset config

  pretrained_model_path="${base_model_path}.tar"

  for fold in 0 1 2 3
  do

    echo "${arch} fold--$fold"
    python train.py \
    --seed 42 \
    --fold $fold \
    --num-output-neurons 1 \
    --pretrained_model_path $pretrained_model_path \
    --model_method MIRAI \
    --dataset spr \
    --dataset-config $dataset_config \
    --csv-dir $csv_dir \
    --image-dir $image_dir \
    --accumulation_steps 32 \
    --batch-size $BS \
    --img-size $ImgSize \
    --results-dir $results_dir

  done
done

Prepare the test submission

Generate the submission CSV:
python ./result_analysis/result_analysis_test.py
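In spirit, this step turns patient-level probabilities into a two-column CSV. A minimal sketch (not the repository's `result_analysis_test.py`; the column names are hypothetical and must match the competition's sample submission):

```python
import csv

def write_submission(path, patient_scores):
    """patient_scores: dict mapping patient ID -> recall probability."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["PatientID", "target"])  # hypothetical header row
        for pid, score in sorted(patient_scores.items()):
            w.writerow([pid, f"{score:.6f}"])
```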

Evaluation Results

What works

  • Ensembling helps. Make sure each fold's model is included in the ensemble, even when no model from that fold is strong on its own (not fully verified).
  • Larger image sizes sometimes work better.
  • Stronger backbones help: EfficientNet-B2/B5 and ConvNeXt-Small outperform ResNet18, likely due to more parameters and better architectural design.
  • Switching from averaging to taking the maximum of the breast-level probabilities when computing patient-level scores raised the AUC from 0.783 to 0.793!
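The fold ensembling and breast-to-patient aggregation described above can be sketched as follows (hypothetical helpers, not the repository's code):

```python
from statistics import mean

def ensemble_breast_score(model_probs):
    """Average one breast's probabilities across several fold models."""
    return mean(model_probs)

def patient_score(left_prob, right_prob, how="max"):
    """Aggregate the two breast-level scores into one patient score.

    Switching how="mean" to how="max" is the change that lifted
    the AUC from 0.783 to 0.793.
    """
    if how == "max":
        return max(left_prob, right_prob)
    return mean([left_prob, right_prob])
```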

What doesn't work

  • Auxiliary-task learning does not help; the model may be overfitting the CV folds.
  • Adding more external training datasets does not help; possibly the labels were not mapped correctly.
Click to expand for details

| Backbone with public weights | Img-Size | Training Dataset | Fold 0 | Fold 1 | Fold 2 | Fold 3 | Public LB | Private LB |
|---|---|---|---|---|---|---|---|---|
| **Mammo-CLIP pretrained method** | | | | | | | | |
| [1] EfficientNet-B2 (Mammo-CLIP) | 1536×768 | SPR | 0.785 | 0.766 | 0.781 | 0.769 | 0.772 | - |
| [2] EfficientNet-B5 (Mammo-CLIP) | 1536×768 | SPR | 0.776 | 0.774 | 0.780 | 0.781 | 0.773 | - |
| Ensemble model [1, 2] | 1536×768 | SPR | - | - | - | - | 0.775 | - |
| **MIRAI pretrained method** | | | | | | | | |
| [3] ResNet18 (MIRAI) | 1536×768 | SPR | 0.773 | 0.768 | 0.775 | 0.762 | 0.762 | - |
| Ensemble model [1, 2, 3] | 1536×768 | SPR | - | - | - | - | 0.777 | - |
| **RSNA2023Mammo pretrained method** | | | | | | | | |
| [4] ConvNeXt-Small (RSNA2023Mammo) | 1536×768 | SPR | 0.784 | 0.771 | 0.770 | 0.770 | 0.771 | - |
| Ensemble model [1, 2, 3, 4] | 1536×768 | SPR | - | - | - | - | 0.780 | - |
| **Aux-task method** | | | | | | | | |
| [5] ResNet18 (MIRAI) Aux-task [age] | 1536×768 | SPR | 0.775 | 0.764 | 0.774 | 0.770 | TODO | - |
| Ensemble model [1, 2, 3, 4, 5] | 1536×768 | SPR | - | - | - | - | 0.776 | - |
| Ensemble model [1, 2, 4, 5] | 1536×768 | SPR | - | - | - | - | 0.778 | - |
| [6] EfficientNet-B2 (Mammo-CLIP) Aux-task [age] | 1536×768 | SPR | 0.785 | 0.771 | 0.773 | 0.778 | 0.766 | - |
| [7] EfficientNet-B5 (Mammo-CLIP) Aux-task [age] | 1536×768 | SPR | 0.785 | - | - | - | TODO | - |
| **Large image size** | | | | | | | | |
| [8] ConvNeXt-Small (RSNA2023Mammo) | 2048×1024 | SPR | 0.791 | 0.771 | 0.784 | 0.782 | 0.782(0) | - |
| Ensemble model [1, 2, 3, 4, 6(012), 7, 8] | Mixed | SPR | - | - | - | - | 0.779 | - |
| Ensemble model [1, 2, 3, 4, 6(01), 7, 8] | Mixed | SPR | - | - | - | - | 0.779 | - |
| Ensemble model [1, 2, 3, 4, 8] | Mixed | SPR | - | - | - | - | 0.782 | - |
| Ensemble model [1, 2, 3, 4, 8] [max breast score] | Mixed | SPR | - | - | - | - | 0.789 | - |
| Ensemble all (> 0.78) in model [1(02), 2(23), 4(0), 8(0)] | Mixed | SPR | - | - | - | - | 0.783 | - |
| Ensemble all (> 0.78) in model [1(02), 2(23), 4(0), 8(0)] [max breast score] | Mixed | SPR | - | - | - | - | 0.787 | - |
| Ensemble all (> 0.785) in model [1(0), 8(0)] | Mixed | SPR | - | - | - | - | 0.770 | - |
| Ensemble top-1 model in each fold [1(2), 2(13), 8(0)] | Mixed | SPR | - | - | - | - | 0.783 | - |
| Ensemble top-1 model in each fold [1(2), 2(13), 8(0)] [max breast score] | Mixed | SPR | - | - | - | - | 0.793 | - |
| Ensemble top-2 models in each fold [1(02), 2(123), 4(3), 8(01)] | Mixed | SPR | - | - | - | - | 0.783 | - |
| Ensemble top-2 models in each fold [1(02), 2(123), 4(3), 8(01)] [max breast score] | Mixed | SPR | - | - | - | - | 0.790 | - |
| Ensemble top-1 models in each fold [1(2), 2(1), 8(03)] [max breast score] | Mixed | SPR | - | - | - | - | 0.795 | 0.838 |
| **More external training data** | | | | | | | | |
| [3a] ResNet18 (MIRAI) | 1536×768 | SPR + VinDr | 0.776 | - | - | - | - | - |
| [3b] ResNet18 (MIRAI) | 1536×768 | SPR + CSAW | 0.757 | - | - | - | - | - |
| [3c] ResNet18 (MIRAI) | 1536×768 | SPR + EMBED | 0.715 | - | - | - | - | - |
| [3d] ResNet18 (MIRAI) | 1536×768 | SPR + VinDr + RSNA | 0.752 | - | - | - | - | - |
| [3e] EfficientNet-B2 (Mammo-CLIP) | 1536×768 | SPR + VinDr + RSNA + EMBED + CSAW | 0.769 | - | - | - | - | - |

TODO

Click to Expand
  • Check the data split, and especially make sure that the test set is independent.

  • Code ready to get test set predictions for Kaggle submission.

  • Add more public data for training.

    • VinDr
    • RSNA
    • EMBED
    • CSAW-CC
  • Support more backbone model training.

    • Mammo-CLIP pretrained EfficientNet-B2 and EfficientNet-B5.
    • MIRAI pretrained ResNet18.
    • RSNA challenge pretrained ConvNeXt.
  • Support auxiliary tasks learning. For example:

    • Age estimation.
    • BI-RADS classification.
    • Breast density classification.

License

The code is Apache-2.0 licensed, as found in the LICENSE file.


Acknowledgements