kaggle-pku-autonomous-driving

Part of 5th place solution for Peking University/Baidu - Autonomous Driving on Kaggle (https://www.kaggle.com/c/pku-autonomous-driving).

Solution

My approach is based on CenterNet. I reimplemented CenterNet almost from scratch with reference to the author's implementation.

Heads

heatmap[1]
xy offset[2]
z (depth)[1]
pose[6]: cos(yaw), sin(yaw) cos(pitch), sin(pitch), cos(roll), sin(roll)
wh[2]: It's not used for prediction, but PublicLB was improved by learning this as an auxiliary task.

Heatmap's loss is Focal Loss, and the others are L1Loss. The weight of wh loss is 0.05. Mask regions of mask images are ignored when calculating loss.

Network Architecture

ResNet18 (pretrained ImageNet) + FPN (channels: 256->128->64)
DLA34 (pretrained KITTI 3DOP) + FPN (channels: 256->256->256)
Input size: 2560 x 2048 (2560 x 1024)
Output size: 640 x 512 (640 x 256)

Increasing the input size is very effective, mAP was improved dramatically. I tried deeper networks (ResNet34, 50) but not worked.

Augmentation

HFlip (p=0.5): Flip images horizontally and yaw *= -1, roll *= -1.
RandomShift (p=0.5, limit=0.1): Shift images and positions (x, y).
RandomScale (p=0.5, limit=0.1): Scale images and positions (x, y, z).
RandomHueSaturationValue (p=0.5, hue_limit=20)
RandomBrightness (p=0.5, limit=0.2)
RandomContrast (p=0.5, limit=0.2)

Training

Optimizer: RAdam
LR scheduler: CosineAnnealingLR (lr=1e-3 -> 1e-5)
50epochs
5-folds cv
Batch size: 4

Post Processing

Remove mask regions from predictions by multiplying heatmap by masks.
NMS (distance threshold: 0.1): I'm not sure how effective this is...
Find duplicate images with imagehash and ensemble them. PublicLB was slighly improved.
Score threshold: 0.3 (for val mAP: 0.1)

Ensemble

Ensemble each fold models and two models (ResNet18, DLA34) by averaging the raw output maps.

Score Summary

model	val mAP (tito's script)	PublicLB	PrivateLB
ResNet18 + FPN	0.257224305900695	0.118	0.109
DLA34 + FPN	0.2681900383192367	0.118	0.112
Ensemble	0.27363538075234406	0.121	0.115

What Didn't Work

Pseudo labeling
TTA (hflip)
Weight Standardization
Group Normalization
Deformable Convolution V2
Quaternion + L1Loss
Very large input size (3360 x 2688)
Eigen's depth prediction method used in CenterNet paper (z = 1 / sigmoid(output) − 1)

Run

python train.py --name resnet18_fpn --arch resnet18_fpn
python test.py --name resnet18_fpn
python train.py --name dla34_ddd_3dop --arch dla34_ddd_3dop --num_filters 256,256,256
python test.py --name dla34_ddd_3dop
python ensemble_test.py --models dla34_ddd_3dop,resnet18_fpn

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
inputs		inputs
lib		lib
models		models
outputs/submissions		outputs/submissions
pretrained_weights		pretrained_weights
processed		processed
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
create_image_hash.py		create_image_hash.py
create_pose_images.py		create_pose_images.py
ensemble_test.py		ensemble_test.py
ensemble_val.py		ensemble_val.py
eval.py		eval.py
pose_test.py		pose_test.py
pose_train.py		pose_train.py
pose_val.py		pose_val.py
test.py		test.py
train.py		train.py
val.py		val.py
visualize.py		visualize.py
wpf.py		wpf.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

kaggle-pku-autonomous-driving

Solution

Heads

Network Architecture

Augmentation

Training

Post Processing

Ensemble

Score Summary

What Didn't Work

Run

About

Releases 1

Packages

Languages

License

4uiiurz1/kaggle-pku-autonomous-driving

Folders and files

Latest commit

History

Repository files navigation

kaggle-pku-autonomous-driving

Solution

Heads

Network Architecture

Augmentation

Training

Post Processing

Ensemble

Score Summary

What Didn't Work

Run

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages