This documentation outlines how to reproduce the Detection part results of the 11th place solution by the "∫ℳΓϒℏ" team for the COVID19 Detection Competition on Kaggle hosted by SIIM-FISABIO-RSNA.
Code for reproducing Classification part result
Solution overview for Classification part
Below is an overview of the Detection part solution.
| Category | Public LB (1/6 mAP@.5) | Private LB (1/6 mAP@.5) |
|---|---|---|
| none | 0.134 | -- |
| opacity | 0.100 | -- |
Stratified K Fold by StudyID
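The split above can be sketched as follows. This is a minimal illustration, not the team's actual split code: it assumes each study carries a single class label, and the function name, the `study_labels` input, and the default of 5 folds are all hypothetical. Every image would then inherit the fold of its study, so no study is shared between training and validation. scikit-learn's `StratifiedGroupKFold` is a library alternative for the same idea.

```python
import random
from collections import defaultdict


def stratified_group_folds(study_labels, n_folds=5, seed=0):
    """Assign each study to a fold while balancing labels across folds.

    study_labels: dict mapping study ID -> class label (hypothetical input).
    Returns a dict mapping study ID -> fold index in [0, n_folds).
    """
    # Group study IDs by their label so each class is spread separately.
    by_label = defaultdict(list)
    for study_id, label in sorted(study_labels.items()):
        by_label[label].append(study_id)

    rng = random.Random(seed)
    folds = {}
    offset = 0  # continue the round-robin across classes to keep fold sizes even
    for label in sorted(by_label):
        studies = by_label[label]
        rng.shuffle(studies)
        for i, study_id in enumerate(studies):
            folds[study_id] = (offset + i) % n_folds
        offset += len(studies)
    return folds
```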
none_probability = np.prod(1 - box_conf_i)
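The formula treats every predicted box as independent evidence of an opacity, so an image counts as "none" only if all of its boxes are false positives. A minimal sketch, assuming `box_confs` (a hypothetical name) holds the confidence scores of all boxes predicted for one image; it uses `math.prod` from the standard library in place of `np.prod`:

```python
import math


def none_probability(box_confs):
    """Probability that an image contains no opacity.

    box_confs: iterable of per-box confidence scores in [0, 1].
    An image with no predicted boxes gets a none-probability of 1.
    """
    return math.prod(1.0 - c for c in box_confs)
```

A single high-confidence box is enough to push the value close to zero, while many low-confidence boxes lower it only gradually.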
Detectors trained with competition train data only
| backbone | image size | batch size | epochs | TTA | iou | conf | CV opacity (mAP@.5) | CV none (mAP@.5) |
|---|---|---|---|---|---|---|---|---|
| VFNetr50 | 640 | 8 | 35 | Y | 0.5 | 0.001 | 0.48358 | 0.23121 |
| Yolov5m* | 1024 | 8 | 35 | Y | 0.5 | 0.001 | 0.48148 | 0.76216 |
| Yolov5x* | 640 | 8 | 35 | Y | 0.5 | 0.001 | 0.50930 | -- |
| Yolov5x | 512 | 8 | 35 | Y | 0.5 | 0.001 | 0.51690 | 0.78192 |
| Yolov5l6 | 512 | 8 | 35 | Y | 0.5 | 0.001 | 0.51650 | 0.78190 |
| Yolov5x6 | 512 | 8 | 35 | Y | 0.5 | 0.001 | 0.51754 | 0.77820 |
| YoloTrs | 512 | 32 | 40 | Y | 0.5 | 0.001 | 0.51343 | 0.776458 |
*: trained with different hyperparameter config
Detectors trained with pseudo data
| backbone | image size | batch size | epochs | TTA | iou | conf | CV opacity (mAP@.5) | CV none (mAP@.5) |
|---|---|---|---|---|---|---|---|---|
| Yolov5x | 512 | 8 | 50 | Y | 0.5 | 0.001 | 0.53870 | 0.79028 |
Datasets: Public test set + BIMCV + RICORD
- For BIMCV, the dataset contains many images taken from the left or right side of the body. To reduce noise, we manually removed them. Because both the training and test data in this competition are drawn from BIMCV, we also removed all duplicate images, along with any images that share a StudyID with those duplicates, to avoid leakage in validation.
Making pseudo labels
- Label images with none_probability > 0.6 as none class images
- For those that have none_probability <= 0.6, keep boxes with confident_score > 0.095

These thresholds are chosen in order to maximize the F1 score.
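The two rules above can be sketched as a single function. This is an illustration under assumptions, not the repository's `make_pseudo.py`: the function name, the tuple layout of a box, and the returned label strings are hypothetical. The source does not say what happens to an image whose boxes all fall below the confidence threshold; the sketch simply returns an empty box list in that case.

```python
def hard_label(boxes, none_prob, none_thr=0.6, conf_thr=0.095):
    """Turn soft predictions for one image into a hard pseudo label.

    boxes: list of (confidence, x_min, y_min, x_max, y_max) tuples (assumed layout).
    none_prob: none-probability of the image.
    Returns (label, kept_boxes).
    """
    if none_prob > none_thr:
        # Confidently empty image: label it as none and drop every box.
        return "none", []
    # Otherwise keep only boxes above the confidence threshold.
    kept = [box for box in boxes if box[0] > conf_thr]
    return "opacity", kept
```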
Training
All datasets are merged and used to train with the same procedure as without pseudo data.
- Weighted boxes fusion with iou_thr=0.6 and conf_thr=0.0001 as boxes fusion method
- box_conf = box_conf**0.84 * (1 - none_probability)**0.16
- none_probability = none_probability*0.5 + negative_probability*0.5
- negative_probability = none_probability*0.3 + negative_probability*0.7
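The score adjustments above can be sketched in plain Python. The source does not state the order in which the formulas are applied, so the sketch keeps them as separate functions and assumes, for the blending step, that both updates read the original input values. The function names are hypothetical, and `negative_probability` is taken as a given input.

```python
def rescale_box_conf(box_conf, none_prob):
    """Weighted geometric mean of box confidence and (1 - none_probability)."""
    return box_conf ** 0.84 * (1.0 - none_prob) ** 0.16


def blend_probabilities(none_prob, negative_prob):
    """Blend none_probability and negative_probability.

    Assumption: both results are computed from the original inputs
    (a simultaneous update), since the source does not specify the order.
    """
    new_none = 0.5 * none_prob + 0.5 * negative_prob
    new_negative = 0.3 * none_prob + 0.7 * negative_prob
    return new_none, new_negative
```

The exponents 0.84 and 0.16 sum to 1, so a box keeps its confidence when the image is certainly not "none" and is suppressed to zero when the image is certainly "none".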
For final submission, we used Yolotrs-384 + Yolov5x-640 + Yolov5x-512-pseudo labels, all with TTA.
- Ubuntu 20.04.01 LTS
- Python 3.8
- Python packages are listed in requirements.txt
$ virtualenv --python=python3.8 envs
$ source envs/bin/activate
$ pip install -r requirements.txt
All required datasets are downloaded automatically with the command
$ ./download_datasets.sh
The downloaded datasets will be placed in directory ./dataset, including:
- fold-split-siim: fold split data for the train dataset using Stratified K Fold
- 1024x1024-png-siim: competition train dataset
- metadatasets-siim: metadata for the train dataset
- image-level-psuedo-label-metadata-siim: metadata for the pseudo label datasets
- ricord-covid19-xray-positive-tests: RICORD COVID-19 X-ray positive tests dataset
- covid19-posi-dump-siim: BIMCV COVID-19 dataset
- yolotr-pretrained: YoloTRs pretrained checkpoint
- mmdet-vfnet-pretrained: VFNetr50 & VFNetr101 pretrained checkpoints
Change your working directory to ./src
$ cd ./src
Train detectors including yolov5s, yolov5m, yolov5l, yolov5x, yolov5s6, yolov5m6, yolov5l6, yolov5x6, yolotrs
# Train a Yolo-Transformer-s for 3 epochs on folds 0 and 1
$ python ./detection/yolo/train.py --weight yolotrs --epochs 3 --folds 0,1 --img 640 --batch 16
To train on both train data and pseudo-labeled data, add flag --pseudo path/to/hard/label/csv to the end of the above command.
Checkpoints for best epochs will be saved at ./result/yolo/checkpoints
# -ck: paths to model checkpoints
# --iou: box fusion IoU threshold
# --conf: box fusion skip-box threshold
# --mode: 'local' evaluates on the validation dataset, 'remote' predicts on the test dataset, 'pseudo' predicts on external datasets
$ python ./detection/yolo/infer.py \
    -ck ../result/yolo/checkpoints/best0.pt \
        ../result/yolo/checkpoints/best1.pt \
    --iou 0.5 \
    --conf 0.0001 \
    --mode remote \
    --image 614 \
    --batch 32
Output .csv files will be saved at ./result/yolo/submit
Train detectors including vfnetr50, vfnetr101
# Train a VFNetr50 for 3 epochs on folds 0 and 1
$ python ./detection/mmdet/train.py --weight vfnetr50 --epochs 3 --folds 0,1
Checkpoints for best epochs will be saved at ./result/mmdet/checkpoints
# -ck: paths to model checkpoints
# --iou: box fusion IoU threshold
# --conf: box fusion skip-box threshold
# --mode: 'local' evaluates on the validation dataset, 'remote' predicts on the test dataset
$ python ./detection/yolo/infer.py \
    -ck ../result/mmdet/checkpoints/best0.pt \
        ../result/mmdet/checkpoints/best1.pt \
    --iou 0.5 \
    --conf 0.0001 \
    --mode remote
Output .csv files will be saved at ./result/mmdet/submit
# -ck: paths to model checkpoints
# --iou: box fusion IoU threshold
# --conf: box fusion skip-box threshold
# --mode: 'local' evaluates on the validation dataset, 'remote' predicts on the test dataset, 'pseudo' predicts on external datasets
$ python ./detection/yolo/infer.py \
    -ck ../result/mmdet/checkpoints/best0.pt \
        ../result/mmdet/checkpoints/best1.pt \
    --iou 0.5 \
    --conf 0.0001 \
    --mode pseudo \
    --image 614 \
    --batch 32
Output .csv files will be saved at ./result/pseudo/prediction/
# -paths: predicted csv files to ensemble
# -ws: ensemble weights, in the same order as -paths
# --iou: box fusion IoU threshold
# --conf: box fusion skip-box threshold
# --none: threshold for hard-labeling images as none-class
# --opacity: threshold for hard-labeling images as opacity-class
$ python ./detection/make_pseudo.py \
    -paths ../result/best0.csv \
           ../result/best1.csv \
    -ws 2 1 \
    --iou 0.6 \
    --conf 0.001 \
    --none 0.6 \
    --opacity 0.095
Output .csv files will be saved at ./result/pseudo/hard_label/
Final submission file will be named submission.csv and saved at ./result/submission.
# -study: paths to study-level csv files
# -image: paths to image-level csv files
# -sw: study-level ensemble weights, in the same order as -study
# -iw: image-level ensemble weights, in the same order as -image
# --iou: box fusion IoU threshold
# --conf: box fusion skip-box threshold
$ python ./post_processing/postprocess.py \
    -study ../result/submit/study/best0.csv \
           ../result/submit/study/best1.csv \
    -image ../result/submit/image/best0.csv \
           ../result/submit/image/best1.csv \
    -sw 1 2 \
    -iw 1 1 \
    --iou 0.6 \
    --conf 0.001
Pytorch
Albumentations
YoloV5
YoloTR
MMDetection
Weighted Boxes Fusion