This directory provides examples and best practices for building image similarity systems. Our goal is to enable users to bring their own datasets and train high-accuracy models easily and quickly. To this end, we provide example notebooks with pre-set default parameters shown to work well on a variety of datasets, as well as extensive documentation of common pitfalls and best practices.
Image retrieval example showing the query image on the left and the six images deemed most similar on its right:
The majority of state-of-the-art systems for image similarity use DNNs to compute a representation of an image (e.g. a vector of 512 floating point values). The similarity between two images is then measured as the cosine similarity or the L2 distance between their respective DNN representations.
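To make this concrete, below is a minimal sketch of the idea (not code from this repository), assuming a pretrained torchvision ResNet-18 whose classification head is removed so that its penultimate layer acts as a 512-dimensional embedding:

```python
# Minimal sketch: compare two images via DNN embeddings.
import torch
import torch.nn.functional as F
import torchvision.models as models

# Use a pretrained ResNet-18 as the feature extractor; replacing the
# classification head with Identity exposes the 512-dim penultimate layer.
model = models.resnet18(pretrained=True)
model.fc = torch.nn.Identity()
model.eval()

def embed(image_batch: torch.Tensor) -> torch.Tensor:
    """Map a batch of (3, 224, 224) images to 512-dim embeddings."""
    with torch.no_grad():
        return model(image_batch)

# Two dummy inputs; in practice these would be preprocessed photos.
img_a = torch.rand(1, 3, 224, 224)
img_b = torch.rand(1, 3, 224, 224)

emb_a, emb_b = embed(img_a), embed(img_b)
cosine_sim = F.cosine_similarity(emb_a, emb_b).item()  # higher = more similar
l2_dist = torch.dist(emb_a, emb_b, p=2).item()         # lower = more similar
print(f"cosine similarity: {cosine_sim:.3f}, L2 distance: {l2_dist:.3f}")
```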
The main difference between recent image similarity publications is how the DNN is trained. A simple but surprisingly powerful approach is to use a standard image classification loss; this is the approach taken in the 01_training_and_evaluation_introduction.ipynb notebook, and explained in the classification folder. More accurate models are typically trained explicitly for image similarity using triplet learning, as in the well-known FaceNet paper. While triplet-based approaches achieve good accuracy, they are conceptually complex, slower, and more difficult to train and converge due to issues such as how to mine good triplets.
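For reference, the snippet below illustrates the triplet objective itself using PyTorch's built-in `nn.TripletMarginLoss`. This is an illustration only, not this repository's training code: the tensors stand in for embeddings produced by a network, and the mining of good triplets, the hard part mentioned above, is not shown.

```python
# Triplet objective: pull the anchor towards a "positive" image of the
# same class, push it away from a "negative" image of a different class.
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)

# Dummy 512-dim embeddings standing in for DNN outputs of three image batches.
anchor   = torch.randn(8, 512, requires_grad=True)
positive = torch.randn(8, 512)  # same class as anchor
negative = torch.randn(8, 512)  # different class

loss = triplet_loss(anchor, positive, negative)
loss.backward()  # gradients would flow back into the embedding network
```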
Instead, the notebook 02_state_of_the_art.ipynb implements the BMVC 2019 paper "Classification is a Strong Baseline for Deep Metric Learning", which shows that this extra overhead is not necessary. Indeed, by making small changes to standard classification models, the authors achieve results which are comparable to or better than the previous state-of-the-art on three common research datasets.
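Roughly, the paper's changes amount to layer-normalizing the embedding and replacing the usual softmax classifier with an L2-normalized, temperature-scaled one. The sketch below shows one way this could look in PyTorch; the module and parameter names are our own invention, and notebook 02_state_of_the_art.ipynb contains the actual implementation.

```python
# Rough sketch of the kind of "small changes" the paper describes:
# layer-normalize the embedding, then classify with an L2-normalized,
# bias-free, temperature-scaled softmax. Names here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormSoftmaxHead(nn.Module):
    def __init__(self, embedding_dim: int, num_classes: int, temperature: float = 0.05):
        super().__init__()
        self.layer_norm = nn.LayerNorm(embedding_dim)
        # Bias-free class weights; both weights and embeddings are L2-normalized.
        self.weight = nn.Parameter(torch.randn(num_classes, embedding_dim) * 0.01)
        self.temperature = temperature

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        emb = F.normalize(self.layer_norm(features), dim=1)
        w = F.normalize(self.weight, dim=1)
        # Cosine logits, sharpened by the temperature, fed to cross-entropy.
        return emb @ w.t() / self.temperature

head = NormSoftmaxHead(embedding_dim=512, num_classes=100)
logits = head(torch.randn(8, 512))
loss = F.cross_entropy(logits, torch.randint(0, 100, (8,)))
```

At retrieval time only the normalized embedding is used and the classification head is discarded, so the same backbone serves both training and retrieval.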
Below is a subset of popular papers in the field, with their reported accuracies on standard benchmark datasets:
Paper | Year | Uses triplet learning | Recall@1 CARS196 | Recall@1 CUB200-2011 | Recall@1 SOP |
---|---|---|---|---|---|
Deep Metric Learning via Lifted Structured Feature Embedding | CVPR 2016 | Yes | 49% | 47% | 62% |
Deep Metric Learning with Angular Loss | ICCV 2017 | Yes | 71% | 55% | 71% |
Sampling Matters in Deep Embedding Learning | ICCV 2017 | Yes | 80% | 64% | 73% |
No Fuss Distance Metric Learning using Proxies | ICCV 2017 | Yes | 73% | 49% | 74% |
Deep Metric Learning with Hierarchical Triplet Loss | ECCV 2018 | Yes | 81% | 57% | 75% |
Classification is a Strong Baseline for Deep Metric Learning (implemented in this repository) | BMVC 2019 | No | 84% (512-dim), 89% (2048-dim) | 61% (512-dim), 65% (2048-dim) | 78% (512-dim), 80% (2048-dim) |
Answers to frequently asked questions such as "How many images do I need to train a model?" or "How do I annotate images?" can be found in the FAQ.md file. For questions specific to image classification, see the FAQ.md in the classification folder.
We provide several notebooks to show how image similarity algorithms can be designed and evaluated.
Notebook name | Description |
---|---|
00_webcam.ipynb | Quick start notebook which demonstrates how to build an image retrieval system using a single image or webcam as input. |
01_training_and_evaluation_introduction.ipynb | Notebook which explains the basic concepts around model training and evaluation, based on using DNNs trained for image classification. |
02_state_of_the_art.ipynb | Implementation of the state-of-the-art BMVC 2019 paper mentioned in the table above. |
11_exploring_hyperparameters.ipynb | Finds optimal model parameters using grid search. |
12_fast_retrieval.ipynb | Fast image retrieval using nearest neighbor search (a minimal sketch follows this table). |
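As a taste of what 12_fast_retrieval.ipynb covers, here is a self-contained sketch of nearest neighbor retrieval over precomputed embeddings, using scikit-learn's NearestNeighbors. The notebook itself may use a different library, and the database of embeddings here is randomly generated purely for illustration.

```python
# Sketch: retrieve the 6 most similar database images for a query embedding.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Assume a database of 10,000 images already embedded into 512-dim vectors.
database = np.random.rand(10_000, 512).astype(np.float32)

# The "cosine" metric matches the similarity definition used above.
index = NearestNeighbors(n_neighbors=6, metric="cosine")
index.fit(database)

# Query with a single embedding; kneighbors returns distances and row indices.
query = np.random.rand(1, 512).astype(np.float32)
distances, indices = index.kneighbors(query)
print(indices[0])  # positions of the 6 most similar images in the database
```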
See the coding guidelines in the root folder.