diff --git a/docs/_static/smot_multi_demo.gif b/docs/_static/smot_multi_demo.gif
new file mode 100644
index 0000000000..121d313ac1
Binary files /dev/null and b/docs/_static/smot_multi_demo.gif differ
diff --git a/docs/tutorials/tracking/demo_smot.py b/docs/tutorials/tracking/demo_smot.py
index 71a1e665ff..0d3f6b6bef 100644
--- a/docs/tutorials/tracking/demo_smot.py
+++ b/docs/tutorials/tracking/demo_smot.py
@@ -26,7 +26,7 @@
 #
 # ::
 #
-#     python demo.py MOT17-02-FRCNN-raw.webm
+#     python demo.py MOT17-02-FRCNN-raw.webm --network-name ssd_512_mobilenet1.0_coco --use-pretrained --custom-classes person --use-motion
 #
 #
 ################################################################
@@ -43,4 +43,32 @@
 ################################################################
 # Our model is able to track multiple persons even when they are partially occluded.
-# Try it on your own video and see the results!
+# If you want to track multiple object categories at the same time,
+# you can simply pass in the extra class names.
+#
+# For example, let's download a video from the MOT challenge website,
+
+from gluoncv import utils
+video_path = 'https://motchallenge.net/sequenceVideos/MOT17-13-FRCNN-raw.webm'
+im_video = utils.download(video_path)
+
+################################################################
+# Then you can simply use our provided script under `/scripts/tracking/smot/demo.py` to obtain the multi-object tracking results.
+#
+# ::
+#
+#     python demo.py MOT17-13-FRCNN-raw.webm --network-name ssd_512_resnet50_v1_coco --use-pretrained --custom-classes person car --detect-thresh 0.7 --use-motion
+#
+#
+# Now we are tracking both persons and cars:
+#
+# .. raw:: html
+#
+#     <div align="center">
+#         <img src="../../_static/smot_multi_demo.gif">
+#     </div>
+ +################################################################ +# Try SMOT on your own video and see the results! diff --git a/scripts/tracking/smot/README.md b/scripts/tracking/smot/README.md index 0d110a0484..c749cdfe30 100644 --- a/scripts/tracking/smot/README.md +++ b/scripts/tracking/smot/README.md @@ -1,11 +1,13 @@ # [SMOT](https://arxiv.org/abs/2010.16031) -SMOT: Single-Shot Multi Object Tracking + +**SMOT: Single-Shot Multi Object Tracking** + by Wei Li, Yuanjun Xiong, Shuo Yang, Siqi Deng and Wei Xia ## Introduction -In this we release code and models from the paper [Single-Shot Multi Object Tracking (SMOT)](https://arxiv.org/abs/2010.16031), to perform multi-object tracking. SMOT is a new tracking framework that converts any single-shot detector (SSD) model into an online multiple object tracker, no training is required. You can track any object as long as the detector can recognize the category. We also want to point out that, SMOT is very efficient, its runtime is close to the runtime of the chosen detector. +In this, we release code and models from the paper [Single-Shot Multi Object Tracking (SMOT)](https://arxiv.org/abs/2010.16031), to perform multi-object tracking. SMOT is a new tracking framework that converts any single-shot detector (SSD) model into an online multiple object tracker. **You can track any object as long as the detector can recognize the category, no training is required.** We also want to point out that, SMOT is very efficient, its runtime is close to the runtime of the chosen detector. ## Installation @@ -15,13 +17,23 @@ For installation, follow the instruction of GluonCV. Optionally, if you would li ## Demo on a video -Want to get MOT results on a video? Try this +Want to get multi-person tracking results on a video? Try this ``` -python demo.py VIDEOFILE +python demo.py VIDEOFILE --input-type video --network-name ssd_512_mobilenet1.0_coco --use-pretrained --custom-classes person ``` -`VIDEOFILE` is the path to your demo video. The visualization results will be saved to `./smot_vis`, but you can change the directory by specifing `--save-path`. +- `VIDEOFILE` is the path to your demo video. It can also be the path to a folder of a sequence of images, but in this case, please remember to change the following `--input-type` to `images`. + +- `--input-type` indicates the input type. It can be either `video` or `images`. + +- `--network-name` is the detector you want to use. We use `ssd_512_mobilenet1.0_coco` here due to its good tradeoff between tracking accuracy and efficiency. You can find other detetors in our model zoo that suits your use case. + +- `--use-pretrained` indicates you want to use the pretrained weights from GluonCV model zoo. If you don't specify `--use-pretrained`, please use `--param-path` to pass in your detector pretrained weights. + +- `--custom-classes` indicates which object category you want to track. You can also track multiple categories at the same time, e.g., `--custom-classes person car dog`. + +- The tracking visualization results will be saved to `./smot_vis`, but you can change the directory by specifing `--save-path`. 
 ## Eval on exisiting datasets
 
@@ -42,10 +54,10 @@ The transformed data in json format will be saved to `./jsons` and its npy forma
 Third, generate predictions using SMOT,
 
 ```
-python demo.py MOT17_DATA/MOT17-02-FRCNN/img1/ --input-type images --use-motion --save-path ./pred
+python demo.py MOT17_DATA/MOT17-02-FRCNN/img1/ --input-type images --network-name ssd_512_mobilenet1.0_coco --use-pretrained --custom-classes person --use-motion --save-path ./pred
 ```
 
-Since MOT data is stored in image sequences, we need to change `input-type` to `images`. `use-motion` can be used optionally to improve the tracking performance. The predictions will be saved in npy format to path `./pred`.
+Since MOT data is stored in image sequences, we need to change `--input-type` to `images` (a sketch of the expected sequence layout is shown at the end of this section). `--use-motion` can optionally be used to improve the tracking performance. The predictions will be saved in npy format to the path `./pred`.
 
 Finally, run evaluation on the predictions and ground-truth, the results will be printed to console in a table.
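+
+As a reference for the `--input-type images` mode used in the prediction step above, `MOT17_DATA/MOT17-02-FRCNN/img1/` is the raw-frame folder of a standard MOT17 sequence. A sequence directory is laid out roughly as follows (a sketch of the usual MOT17 structure, shown for orientation only; `demo.py` reads just the frame folder you point it at, while the ground-truth file is what the evaluation step compares against):
+
+```
+MOT17_DATA/
+└── MOT17-02-FRCNN/
+    ├── img1/           # raw frames: 000001.jpg, 000002.jpg, ...
+    ├── gt/
+    │   └── gt.txt      # ground-truth tracks, used by the evaluation step
+    └── seqinfo.ini     # sequence metadata (resolution, frame rate, length)
+```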