Commit cb7fe72

evgeny-izutov authored and AlexanderDokuchaev committed

Person Detection and Action Recognition (openvinotoolkit#84)

* Added code for Action Detection model training
* Updated path to initial model weights
* Add feature to use relative paths
* Rename optimize.py to export.py and update readme files
* Change structure, add setup.py
* Remove useless line in commands
* Copyrights
1 parent e6082c8 commit cb7fe72

File tree: 93 files changed, +9722 -0 lines changed


README.md (+1)

@@ -40,6 +40,7 @@ inference.

* [TensorFlow](tensorflow_toolkit)
  * [Action Detection](tensorflow_toolkit/action_detection)
  * [License Plate Recognition](tensorflow_toolkit/lpr)
  * [Person Vehicle Bike Detector](tensorflow_toolkit/person_vehicle_bike_detector)
  * [SSD Object Detection](tensorflow_toolkit/ssd_detector)

tensorflow_toolkit/README.md (+1)

@@ -39,6 +39,7 @@ inference.

After installation, you are ready to train your own models, evaluate them, and use them for predictions.

* [Action Detection](action_detection)
* [License Plate Recognition](lpr)
* [Person Vehicle Bike Detector](person_vehicle_bike_detector)
* [SSD MobileNet FPN 602](ssd_mobilenet_fpn_602)
@@ -0,0 +1,147 @@

# Smart classroom scenario

This repository contains TensorFlow code for deployment of person detection (PD) and action recognition (AR) models for the smart classroom use case. You can define your own list of possible actions (see the annotation file [format](./README_DATA.md) and the model training steps for changing the list of actions), but this repository shows an example with six action classes: standing, sitting, raising hand, writing, turned around, and lying on the desk.

## Pre-requisites
- Ubuntu 16.04 / 18.04
- Python 2.7

## Installation
1. Create a virtual environment
```bash
virtualenv venv -p python2 --prompt="(action)"
```

2. Activate the virtual environment and set up the OpenVINO variables
```bash
. venv/bin/activate
. /opt/intel/openvino/bin/setupvars.sh
```
**NOTE** It is good practice to append `. /opt/intel/openvino/bin/setupvars.sh` to the end of `venv/bin/activate`:
```bash
echo ". /opt/intel/openvino/bin/setupvars.sh" >> venv/bin/activate
```

3. Install the modules
```bash
pip2 install -e .
```

## Model training
This repository supports the full model training cycle. There are two ways to get a highly accurate model:
- Fine-tune from the provided [initial weights](https://download.01.org/opencv/openvino_training_extensions/models/action_detection/person-detection-action-recognition-0006.tar.gz). This is the simplest and fastest way, because training is reduced to a single stage: training the PD&AR model directly (see the download sketch after this list).
- Run the full cycle: pre-train the model on classification and detection datasets and then train the final PD&AR model. To get the most accurate model, we recommend pre-training on the following tasks:
  1. Classification on the ImageNet dataset (see the classifier training [instruction](./README_CLASSIFIER.md))
  2. Detection on the Pascal VOC0712 dataset (see the detector training [instruction](./README_DETECTOR.md))
  3. Detection on the MS COCO dataset
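If you take the fine-tuning route, below is a minimal sketch for fetching and unpacking the published initial weights (assuming `wget` and `tar` are available; the exact archive layout may differ):
```bash
# Download the published PD&AR initial weights (URL from the list above)
wget https://download.01.org/opencv/openvino_training_extensions/models/action_detection/person-detection-action-recognition-0006.tar.gz
# Unpack into a local directory; point the -i key of train.py at the extracted checkpoint
mkdir -p init_weights
tar -xzf person-detection-action-recognition-0006.tar.gz -C init_weights
```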
## Data preparation
To prepare a dataset, follow the [instruction](./README_DATA.md).

## Action list definition
The repository is configured for a 6-class action detection task, but you can easily define your own set of actions. After the [data preparation](#data-preparation) step you should have a configured class mapping file; the class `IDs` below are taken from it. Then edit the `configs/action/pedestriandb_twinnet_actionnet.yml` file according to your set of actions:
1. The `ACTIONS_MAP` field maps class `IDs` of the input data onto the final set of actions. Note that an `undefined` class (if you have one) should be placed at the end of the action list (to exclude it during training).
2. The `VALID_ACTION_NAMES` field stores the names of the valid actions you want to recognize (excluding the `undefined` action).
3. If you have an `undefined` class, set the `UNDEFINED_ACTION_ID` field to the `ID` of this class from the `ACTIONS_MAP` map, and also add this `ID` to the `IGNORE_CLASSES` list.
4. If you plan to use the demo mode (see the [header](#action-detection-model-demonstration)), change the action colors by setting the `ACTION_COLORS_MAP` and `UNDEFINED_ACTION_COLOR` fields.
5. You can exclude some actions from the training procedure by adding them to the `IGNORE_CLASSES` list, but to achieve the best performance it is recommended to label all boxes with persons even if the target action is undefined for them (these boxes are still useful for training the person detector part of the model).

Below is an example of a valid field definition:
```yaml
"ACTIONS_MAP": {0: 0,  # sitting --> sitting
                1: 3,  # standing --> standing
                2: 2,  # raising_hand --> raising_hand
                3: 0,  # listening --> sitting
                4: 0,  # reading --> sitting
                5: 1,  # writing --> writing
                6: 5,  # lie_on_the_desk --> lie_on_the_desk
                7: 0,  # busy --> sitting
                8: 0,  # in_group_discussions --> sitting
                9: 4,  # turned_around --> turned_around
                10: 6} # __undefined__ --> __undefined__
"VALID_ACTION_NAMES": ["sitting", "writing", "raising_hand", "standing", "turned_around", "lie_on_the_desk"]
"UNDEFINED_ACTION_NAME": "undefined"
"UNDEFINED_ACTION_ID": 6
"IGNORE_CLASSES": [6]
"ACTION_COLORS_MAP": {0: [0, 255, 0],
                      1: [255, 0, 255],
                      2: [0, 0, 255],
                      3: [255, 0, 0],
                      4: [0, 153, 255],
                      5: [153, 153, 255]}
"UNDEFINED_ACTION_COLOR": [255, 255, 255]
```

## Person Detection and Action Recognition model training
Assume we have a pre-trained model and want to fine-tune the PD&AR model. In this case the training procedure consists of the following stages:
1. [Model training](#action-detection-model-training)
2. [Model evaluation](#action-detection-model-evaluation)
3. [Model demonstration](#action-detection-model-demonstration)
4. [Graph optimization](#action-detection-model-optimization)
5. [Export to IR format](#export-to-ir-format)

### Action Detection model training
If you want to fine-tune the model with a custom set of actions, you can use the provided initial weights. To do this, run the command:
```Shell
python2 tools/models/train.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                              # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                                # directory for logging
                              -b 4 \                                                # batch size
                              -n 1 \                                                # number of target GPU devices
                              -i <PATH_TO_INIT_WEIGHTS> \                           # initialize model weights
                              --src_scope "ActionNet/twinnet"                       # name of scope to load weights from
```

Note: to continue model training (e.g. after stopping) from your own snapshot, run the same command but with the `-s <PATH_TO_SNAPSHOT>` key and without the `--src_scope` key:
```Shell
python2 tools/models/train.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                              # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                                # directory for logging
                              -b 4 \                                                # batch size
                              -n 1 \                                                # number of target GPU devices
                              -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
```

If you want to initialize the model from weights other than the provided ones, set the appropriate `--src_scope` key value (see the example below):
- To initialize the model after pre-training on the ImageNet classification dataset, set `--src_scope "ImageNetModel/rmnet"`
- To initialize the model after pre-training on the Pascal VOC or COCO detection dataset, set `--src_scope "SSD/rmnet"`
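For example, a sketch of fine-tuning from an ImageNet-pre-trained checkpoint (the command mirrors the training command above; `<PATH_TO_IMAGENET_WEIGHTS>` is a placeholder for your own checkpoint path):
```Shell
python2 tools/models/train.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                              # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                                # directory for logging
                              -b 4 \                                                # batch size
                              -n 1 \                                                # number of target GPU devices
                              -i <PATH_TO_IMAGENET_WEIGHTS> \                       # weights from ImageNet pre-training
                              --src_scope "ImageNetModel/rmnet"                     # name of scope to load weights from
```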
### Action Detection model evaluation
To evaluate the quality of the trained Action Detection model, prepare the test data according to the [instruction](./README_DATA.md).

```Shell
python2 tools/models/eval.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                             -v <PATH_TO_DATA_FILE> \                              # file with test data paths
                             -b 4 \                                                # batch size
                             -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
```

### Action Detection model demonstration

```Shell
python2 tools/models/demo.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                             -i <PATH_TO_VIDEO_FILE> \                             # file with video
                             -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
```

Note: to scale the output window, specify the `--out_scale` key with the desired scale factor, e.g. `--out_scale 0.5`.

### Action Detection model optimization

```Shell
python2 tools/models/export.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                               -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
                               -o <PATH_TO_OUTPUT_DIR> \                             # directory for the output model
```

Note that the frozen graph will be stored in `<PATH_TO_OUTPUT_DIR>/frozen.pb`.

### Export to IR format

Run the Model Optimizer for the trained Action Detection model (OpenVINO should be installed beforehand):
```Shell
python mo_tf.py --input_model <PATH_TO_FROZEN_GRAPH> \
                --output_dir <OUTPUT_DIR> \
                --model_name SmartClassroomActionNet
```
@@ -0,0 +1,50 @@

# Image classification

This repository includes training and evaluation tools for the image classification task. You can use one of the prepared config files to train the model on:
- ImageNet dataset (`configs/classification/imagenet_rmnet.yml`)

## Data preparation
The following data structure is assumed:
<pre>
|-- data_dir
    |-- images
        image_000000.png
        image_000001.png
    train_data.txt
    test_data.txt
</pre>
Each data file (`train_data.txt` and `test_data.txt`) describes the data used to train/evaluate the model. Each row in a data file represents a single source in the following format: `path_to_image image_label`.
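For illustration, two hypothetical rows of a `train_data.txt` file (the image names follow the structure above; the numeric labels are placeholders):
```
images/image_000000.png 0
images/image_000001.png 12
```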
## Model training
To train the image classification model from scratch, run the command:
```Shell
python2 tools/models/train.py -c configs/classification/imagenet_rmnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                      # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                        # directory for logging
                              -b 4 \                                        # batch size
                              -n 1 \                                        # number of target GPU devices
```

**Note** If you want to initialize the model from pre-trained weights, specify the `-i` key with the path to the initial weights and set the appropriate `--src_scope` key value:
- To initialize the model after pre-training on any other classification dataset named `DATASET_NAME`, set `--src_scope "DATASET_NAME/rmnet"`

Below is the command to run the training procedure from a pre-trained model:
```Shell
python2 tools/models/train.py -c configs/classification/imagenet_rmnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                      # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                        # directory for logging
                              -b 4 \                                        # batch size
                              -n 1 \                                        # number of target GPU devices
                              -i <PATH_TO_INIT_WEIGHTS> \                   # initialize model weights
                              --src_scope "DATASET_NAME/rmnet"              # name of scope to load weights from
```

## Model evaluation
To evaluate the quality of the trained Image Classification model, prepare the test data according to the [instruction](#data-preparation).

```Shell
python2 tools/models/eval.py -c configs/classification/imagenet_rmnet.yml \ # path to config file
                             -v <PATH_TO_DATA_FILE> \                      # file with test data paths
                             -b 4 \                                        # batch size
                             -s <PATH_TO_SNAPSHOT> \                       # snapshot model weights
```
@@ -0,0 +1,113 @@

# Data preparation

The following data structure is assumed:
<pre>
|-- data_dir
    |-- images
        |-- video_1
            frame_000000.png
            frame_000001.png
        |-- video_2
            frame_000000.png
            frame_000001.png
        |-- video_3
            frame_000000.png
            frame_000001.png
    |-- annotation
        annotation_file_1.xml
        annotation_file_2.xml
        annotation_file_3.xml
    train_tasks.txt
    test_tasks.txt
</pre>
Each annotation file (see [this](#annotation-file-format) header) describes a single source of images (see [this](#image-file-format) header).

## Annotation file format
For annotation it is better to use the [CVAT](https://github.com/opencv/cvat) utility, so we assume that the annotation file is stored in the appropriate `.xml` [format](https://github.com/opencv/cvat/blob/develop/cvat/apps/documentation/xml_format.md). The annotation file contains a single independent track for each person in the video, which includes a bounding box description for each frame. General structure of an annotation file:
<pre>
|-- root
    |-- track_0
        bounding_box_0
        bounding_box_1
    |-- track_1
        bounding_box_0
        bounding_box_1
</pre>

Toy example of an annotation file:
```xml
<?xml version="1.0" encoding="utf-8"?>
<annotations count="1">
  <track id="0" label="person">
    <box frame="0" xtl="1.0" ytl="1.0" xbr="0.0" ybr="0.0" occluded="0">
      <attribute name="action">action_name</attribute>
    </box>
  </track>
</annotations>
```
where the fields have the following meaning:
- `count` - number of tracks
- `id` - unique ID of the track in the file
- `label` - label of the track (the data loader skips all labels other than `person`)
- `frame` - unique ID of the frame in the track
- `xtl`, `ytl`, `xbr`, `ybr` - bounding box coordinates of the top-left and bottom-right corners
- `occluded` - marker to highlight heavily occluded bounding boxes (they can be skipped during training)
- `name` - name of the bounding box attribute (the data loader is sensitive to the `action` class only)
- `action_name` - valid name of the action (you can define your own list of actions)

## Image file format
Our data loader implementation works with independent images stored on the drive. Each image should be named in the format `frame_xxxxxx.png` or `frame_xxxxxx.jpg` (where `xxxxxx` is a unique image number).

**NOTE** To extract images from a video you can use `tools/data/dump_frames.py`.
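Alternatively, a possible ffmpeg invocation that produces the expected `frame_xxxxxx.png` naming (this is a sketch, not the repository script; it assumes ffmpeg is installed and `video_1.mp4` is a hypothetical source video):
```bash
# Extract all frames, numbering from frame_000000.png as the data loader expects
mkdir -p data_dir/images/video_1
ffmpeg -i video_1.mp4 -start_number 0 data_dir/images/video_1/frame_%06d.png
```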
## Tasks file format
For more robust control of image sources we have created a separate file where each row represents a single source in the following format: `annotation_file_path.xml image_height,image_width images_directory_path`. We assume that all images from the same source are resized to the `image_height,image_width` size (this is needed to properly decode the annotations).

Example of a `train_tasks.txt` file:
```
annotations/annotation_file_1.xml 1920,1080 images/video1
annotations/annotation_file_2.xml 1920,1080 images/video2
```

Example of a `test_tasks.txt` file:
```
annotations/annotation_file_3.xml 1920,1080 images/video3
```

## Train/eval data file generation
To generate the final data file (train or test), run the command:
```Shell
python2 tools/data/prepare_pedestrian_db.py -t <PATH_TO_TASKS> \      # path to file with tasks
                                            -o <PATH_TO_OUTPUT_DIR> \ # output directory
```

The output directory structure (an example of the script output can be found in the `./dataset` folder):
<pre>
|-- root
    |-- annotation
        |-- video_1
            sample_000000.json
            sample_000001.json
        |-- video_2
            sample_000000.json
            sample_000001.json
    data.txt
    class_map.yml
</pre>

Generated files:
- `data.txt` - should be used as input for the train/eval scripts.
- `class_map.txt` - contains the generated mapping from class names onto class IDs.

**Note 1** To specify class IDs directly, you can set the `-i` key: `-i <PATH_TO_CLASS_MAP>` (see the example `tools/data/pedestriandb_class_map.yml`). If you specify your own class mapping, the `class_map.txt` file will not be generated.
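For orientation only, a purely illustrative class mapping in YAML (the class names echo the action list example in the action detection README; the exact syntax expected by the script may differ, so check `tools/data/pedestriandb_class_map.yml`):
```yaml
# Hypothetical class-name --> class-ID mapping
"sitting": 0
"writing": 1
"raising_hand": 2
"__undefined__": 3
```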
**Note 2** To generate a valid class mapping for testing purposes, set `-i <PATH_TO_CLASS_MAP>`, where `<PATH_TO_CLASS_MAP>` is the `class_map.txt` file generated by the script or your own class mapping file. Otherwise the order of class IDs will be different.

**Note 3** You can use the prepared toy dataset (`./dataset` folder) to start your model training. You only need to specify the full path to the images (the `./dataset/images` folder) in the `data.txt` file.

## Config specification
For the generated dataset you should set the correct field values in the appropriate config file (an illustrative snippet follows this list):
- `IMAGE_SIZE` - target image size in the format `[height, width, num_channels]`
- `TRAIN_DATA_SIZE` - number of training samples
- `VAL_DATA_SIZE` - number of testing samples
- `MAX_NUM_DETECTIONS_PER_IMAGE` - maximum number of objects in a single image (if there are more, a subset of the objects will be used)
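As an illustration, hypothetical values for these fields in the config's YAML style (the numbers are placeholders, not recommended settings):
```yaml
"IMAGE_SIZE": [1080, 1920, 3]        # [height, width, num_channels]
"TRAIN_DATA_SIZE": 50000             # number of training samples
"VAL_DATA_SIZE": 5000                # number of testing samples
"MAX_NUM_DETECTIONS_PER_IMAGE": 100  # max number of objects in a single image
```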
