TensorFlow implementation of Text-Guided Visual Feature Refinement for Text-Based Person Search (TVFR), accepted by ICMR 2021. The code builds on the TensorFlow implementation of Deep Cross-Modal Projection Learning for Image-Text Matching (CMPM/CMPC).
We propose a text-guided visual feature refinement (TVFR) framework for text-based person search, which consists of two sub-networks: a Text-Based Filter Generation Module (TBFGM) and a Text-Guided Visual Feature Refinement Module (TVFRM).
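As a rough sketch of the core idea only (not the exact architecture from the paper), the snippet below assumes the sentence embedding is mapped to per-channel filters that re-weight the image feature map; all layer names, shapes, and the gating form are illustrative assumptions.

```python
import tensorflow as tf

def text_guided_refinement(visual_feat, text_feat):
    """Illustrative TF 1.x sketch: text features generate filters that refine visual features.

    visual_feat: [B, H, W, C] feature map from the image backbone (e.g. MobileNet V1)
    text_feat:   [B, D] sentence embedding from the Bi-LSTM
    """
    channels = visual_feat.get_shape().as_list()[-1]
    # Text-Based Filter Generation: predict one gate per visual channel (assumed form)
    filters = tf.layers.dense(text_feat, channels, activation=tf.nn.sigmoid,
                              name='tbfgm_filters')         # [B, C]
    filters = tf.reshape(filters, [-1, 1, 1, channels])     # broadcast over H and W
    # Text-Guided Visual Feature Refinement: modulate the visual feature map
    return visual_feat * filters
```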
- TensorFlow 1.5.0
- CUDA 9.0 and cuDNN 7.6
- Python 2.7
- Please download the CUHK-PEDES dataset.
- Convert the CUHK-PEDES image-text data into TFRecords:

  ```
  cd builddata && sh scripts/format_and_convert_pedes.sh
  ```
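For orientation, here is a minimal sketch of how an image-caption pair could be serialized into a TFRecord; the feature keys and fields below are hypothetical and may differ from those used by `scripts/format_and_convert_pedes.sh`.

```python
import tensorflow as tf

# Hypothetical feature keys; the conversion script may use different ones.
def make_example(image_bytes, caption, person_id):
    feature = {
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        'image/caption': tf.train.Feature(bytes_list=tf.train.BytesList(value=[caption])),
        'image/label':   tf.train.Feature(int64_list=tf.train.Int64List(value=[person_id])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

writer = tf.python_io.TFRecordWriter('cuhk_pedes_train.tfrecord')
with open('example.jpg', 'rb') as f:
    example = make_example(f.read(), b'a man wearing a black jacket', 0)
writer.write(example.SerializeToString())
writer.close()
```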
- Please download the Pretrained MobileNet_V1 Model for reproducing our results, or the Best Result Model reported in our paper for testing.
- Please modify `RESTORE_PATH` in the training script to the path where the Pretrained MobileNet_V1 Model is saved (see the restore sketch below).
- Please modify `Save_NAME` in the training script to the path where the Best Result Model is saved if you want to test using our best model.
- Please download the Pretrained MobileNetV1 checkpoint.
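As a hedged illustration (not the repo's actual code), restoring a pretrained MobileNet V1 checkpoint in TF 1.x typically looks like the following; the `MobilenetV1` variable scope matches the TF-slim release and may differ in this repo.

```python
import tensorflow as tf

RESTORE_PATH = '/path/to/mobilenet_v1.ckpt'  # set to where the checkpoint is saved

# Assumes the model graph (image branch + text branch) has already been built.
# Restore only the MobileNet V1 backbone variables from the pretrained checkpoint.
backbone_vars = [v for v in tf.global_variables() if v.name.startswith('MobilenetV1')]
saver = tf.train.Saver(var_list=backbone_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # initialize all variables first
    saver.restore(sess, RESTORE_PATH)             # then overwrite the backbone weights
```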
- Train TVFR with MobileNet V1 + Bi-LSTM on CUHK-PEDES:

  ```
  sh scripts/train_pedes_mobilenet_cmpm_cmpc.sh
  ```
- Compute R@K (K = 1, 5, 10) for text-to-image retrieval evaluation on CUHK-PEDES:

  ```
  sh scripts/test_pedes_mobilenet_cmpm_cmpc.sh
  ```
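For reference, here is a minimal NumPy sketch of how R@K can be computed for text-to-image retrieval; the variable names and the identity-matching convention are illustrative and may differ from the evaluation script.

```python
import numpy as np

def recall_at_k(text_feats, image_feats, text_ids, image_ids, ks=(1, 5, 10)):
    """For each query caption, rank all gallery images by cosine similarity and
    check whether an image of the correct identity appears in the top K."""
    # L2-normalize so that the dot product equals cosine similarity
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = t.dot(v.T)                              # [num_texts, num_images]
    ranks = np.argsort(-sims, axis=1)              # best-matching images first
    hits = image_ids[ranks] == text_ids[:, None]   # True where identity matches
    return {k: float(np.mean(hits[:, :k].any(axis=1))) for k in ks}
```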
Zhang et al. Deep Cross-Modal Projection Learning for Image-Text Matching. ECCV 2018.
If you find TVFR useful in your research, please kindly cite our paper:
```
@inproceedings{gao2021text,
  title={Text-Guided Visual Feature Refinement for Text-Based Person Search},
  author={Gao, Liying and Niu, Kai and Ma, Zehong and Jiao, Bingliang and Tan, Tonghao and Wang, Peng},
  booktitle={Proceedings of the 2021 International Conference on Multimedia Retrieval},
  pages={118--126},
  year={2021}
}
```
If you have any questions, please feel free to contact [email protected]