Commit cb7fe72

evgeny-izutov authored and AlexanderDokuchaev committed

Person Detection and Action Recognition (openvinotoolkit#84)

* Added code for Action Detection model training
* Updated path to initial model weights
* Add feature to use relative paths
* Rename optimize.py to export.py and update readme files
* Change structure, add setup.py
* Remove useless line in commands
* Copyrights
1 parent e6082c8 commit cb7fe72

File tree: 93 files changed, +9722 -0 lines changed


README.md (+1)

@@ -40,6 +40,7 @@ inference.

* [TensorFlow](tensorflow_toolkit)
  * [Action Detection](tensorflow_toolkit/action_detection)
  * [License Plate Recognition](tensorflow_toolkit/lpr)
  * [Person Vehicle Bike Detector](tensorflow_toolkit/person_vehicle_bike_detector)
  * [SSD Object Detection](tensorflow_toolkit/ssd_detector)

tensorflow_toolkit/README.md (+1)

@@ -39,6 +39,7 @@ inference.

After installation, you are ready to train your own models, evaluate them, and use them for predictions.

* [Action Detection](action_detection)
* [License Plate Recognition](lpr)
* [Person Vehicle Bike Detector](person_vehicle_bike_detector)
* [SSD MobileNet FPN 602](ssd_mobilenet_fpn_602)
@@ -0,0 +1,147 @@

# Smart classroom scenario

This repository contains TensorFlow code for deployment of person detection (PD) and action recognition (AR) models for the smart classroom use case. You can define your own list of possible actions (see the annotation file [format](./README_DATA.md) and the model training steps for changing the list of actions), but this repository shows an example with six action classes: standing, sitting, raising hand, writing, turned around, and lying on the desk.

## Pre-requisites
- Ubuntu 16.04 / 18.04
- Python 2.7

## Installation
1. Create a virtual environment
```bash
virtualenv venv -p python2 --prompt="(action)"
```

2. Activate the virtual environment and set up the OpenVINO variables
```bash
. venv/bin/activate
. /opt/intel/openvino/bin/setupvars.sh
```
**NOTE** It is good practice to append `. /opt/intel/openvino/bin/setupvars.sh` to the end of `venv/bin/activate`:
```bash
echo ". /opt/intel/openvino/bin/setupvars.sh" >> venv/bin/activate
```

3. Install the modules
```bash
pip2 install -e .
```

## Model training
This repository supports the full model training cycle. There are two ways to get a highly accurate model:
- Fine-tune from the provided [initial weights](https://download.01.org/opencv/openvino_training_extensions/models/action_detection/person-detection-action-recognition-0006.tar.gz). This is the simplest and fastest way, because training is reduced to a single stage: training the PD&AR model directly (see the download sketch after this list).
- Run the full cycle: pre-train the model on classification and detection datasets and then train the final PD&AR model. To get the most accurate model, we recommend pre-training on the following tasks:
  1. Classification on the ImageNet dataset (see the classifier training [instruction](./README_CLASSIFIER.md))
  2. Detection on the Pascal VOC0712 dataset (see the detector training [instruction](./README_DETECTOR.md))
  3. Detection on the MS COCO dataset
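If you take the fine-tuning route, below is a minimal sketch for fetching and unpacking the published initial weights (assuming `wget` and `tar` are available; the exact archive layout may differ):
```bash
# Download the published PD&AR initial weights (URL from the list above)
wget https://download.01.org/opencv/openvino_training_extensions/models/action_detection/person-detection-action-recognition-0006.tar.gz
# Unpack into a local directory; point the -i key of train.py at the extracted checkpoint
mkdir -p init_weights
tar -xzf person-detection-action-recognition-0006.tar.gz -C init_weights
```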
## Data preparation
To prepare a dataset, follow the [instruction](./README_DATA.md).

## Action list definition
The repository is configured for a 6-class action detection task, but you can easily define your own set of actions. After the [data preparation](#data-preparation) step you should have a configured class mapping file; the class `IDs` below are taken from it. Then edit the `configs/action/pedestriandb_twinnet_actionnet.yml` file according to your set of actions:
1. The `ACTIONS_MAP` field maps class `IDs` of the input data onto the final set of actions. Note that an `undefined` class (if you have one) should be placed at the end of the action list (to exclude it during training).
2. The `VALID_ACTION_NAMES` field stores the names of the valid actions you want to recognize (excluding the `undefined` action).
3. If you have an `undefined` class, set the `UNDEFINED_ACTION_ID` field to the `ID` of this class from the `ACTIONS_MAP` map, and also add this `ID` to the `IGNORE_CLASSES` list.
4. If you plan to use the demo mode (see the [header](#action-detection-model-demonstration)), change the action colors by setting the `ACTION_COLORS_MAP` and `UNDEFINED_ACTION_COLOR` fields.
5. You can exclude some actions from the training procedure by adding them to the `IGNORE_CLASSES` list, but to achieve the best performance it is recommended to label all boxes with persons even if the target action is undefined for them (these boxes are still useful for training the person detector part of the model).

Below is an example of a valid field definition:
```yaml
"ACTIONS_MAP": {0: 0,  # sitting --> sitting
                1: 3,  # standing --> standing
                2: 2,  # raising_hand --> raising_hand
                3: 0,  # listening --> sitting
                4: 0,  # reading --> sitting
                5: 1,  # writing --> writing
                6: 5,  # lie_on_the_desk --> lie_on_the_desk
                7: 0,  # busy --> sitting
                8: 0,  # in_group_discussions --> sitting
                9: 4,  # turned_around --> turned_around
                10: 6} # __undefined__ --> __undefined__
"VALID_ACTION_NAMES": ["sitting", "writing", "raising_hand", "standing", "turned_around", "lie_on_the_desk"]
"UNDEFINED_ACTION_NAME": "undefined"
"UNDEFINED_ACTION_ID": 6
"IGNORE_CLASSES": [6]
"ACTION_COLORS_MAP": {0: [0, 255, 0],
                      1: [255, 0, 255],
                      2: [0, 0, 255],
                      3: [255, 0, 0],
                      4: [0, 153, 255],
                      5: [153, 153, 255]}
"UNDEFINED_ACTION_COLOR": [255, 255, 255]
```

## Person Detection and Action Recognition model training
Assume we have a pre-trained model and want to fine-tune the PD&AR model. In this case the training procedure consists of the following stages:
1. [Model training](#action-detection-model-training)
2. [Model evaluation](#action-detection-model-evaluation)
3. [Model demonstration](#action-detection-model-demonstration)
4. [Graph optimization](#action-detection-model-optimization)
5. [Export to IR format](#export-to-ir-format)

### Action Detection model training
If you want to fine-tune the model with a custom set of actions, you can use the provided initial weights. To do this, run the command:
```Shell
python2 tools/models/train.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                              # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                                # directory for logging
                              -b 4 \                                                # batch size
                              -n 1 \                                                # number of target GPU devices
                              -i <PATH_TO_INIT_WEIGHTS> \                           # initialize model weights
                              --src_scope "ActionNet/twinnet"                       # name of scope to load weights from
```

Note: to continue model training (e.g. after stopping) from your own snapshot, run the same command but with the `-s <PATH_TO_SNAPSHOT>` key and without the `--src_scope` key:
```Shell
python2 tools/models/train.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                              # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                                # directory for logging
                              -b 4 \                                                # batch size
                              -n 1 \                                                # number of target GPU devices
                              -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
```

If you want to initialize the model from weights other than the provided ones, set the appropriate `--src_scope` key value (see the example below):
- To initialize the model after pre-training on the ImageNet classification dataset, set `--src_scope "ImageNetModel/rmnet"`
- To initialize the model after pre-training on the Pascal VOC or COCO detection dataset, set `--src_scope "SSD/rmnet"`
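For example, a sketch of fine-tuning from an ImageNet-pre-trained checkpoint (the command mirrors the training command above; `<PATH_TO_IMAGENET_WEIGHTS>` is a placeholder for your own checkpoint path):
```Shell
python2 tools/models/train.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                              # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                                # directory for logging
                              -b 4 \                                                # batch size
                              -n 1 \                                                # number of target GPU devices
                              -i <PATH_TO_IMAGENET_WEIGHTS> \                       # weights from ImageNet pre-training
                              --src_scope "ImageNetModel/rmnet"                     # name of scope to load weights from
```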
### Action Detection model evaluation
To evaluate the quality of the trained Action Detection model, prepare the test data according to the [instruction](./README_DATA.md).

```Shell
python2 tools/models/eval.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                             -v <PATH_TO_DATA_FILE> \                              # file with test data paths
                             -b 4 \                                                # batch size
                             -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
```

### Action Detection model demonstration

```Shell
python2 tools/models/demo.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                             -i <PATH_TO_VIDEO_FILE> \                             # file with video
                             -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
```

Note: to scale the output window, specify the `--out_scale` key with the desired scale factor, e.g. `--out_scale 0.5`.

### Action Detection model optimization

```Shell
python2 tools/models/export.py -c configs/action/pedestriandb_twinnet_actionnet.yml \ # path to config file
                               -s <PATH_TO_SNAPSHOT> \                               # snapshot model weights
                               -o <PATH_TO_OUTPUT_DIR> \                             # directory for the output model
```

Note that the frozen graph will be stored in `<PATH_TO_OUTPUT_DIR>/frozen.pb`.

### Export to IR format

Run the Model Optimizer for the trained Action Detection model (OpenVINO should be installed beforehand):
```Shell
python mo_tf.py --input_model <PATH_TO_FROZEN_GRAPH> \
                --output_dir <OUTPUT_DIR> \
                --model_name SmartClassroomActionNet
```
@@ -0,0 +1,50 @@

# Image classification

This repository includes training and evaluation tools for the image classification task. You can use one of the prepared config files to train the model on:
- ImageNet dataset (`configs/classification/imagenet_rmnet.yml`)

## Data preparation
The following data structure is assumed:
<pre>
|-- data_dir
    |-- images
        image_000000.png
        image_000001.png
    train_data.txt
    test_data.txt
</pre>
Each data file (`train_data.txt` and `test_data.txt`) describes the data used to train/evaluate the model. Each row in a data file represents a single source in the following format: `path_to_image image_label`.
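For illustration, two hypothetical rows of a `train_data.txt` file (the image names follow the structure above; the numeric labels are placeholders):
```
images/image_000000.png 0
images/image_000001.png 12
```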
## Model training
To train the image classification model from scratch, run the command:
```Shell
python2 tools/models/train.py -c configs/classification/imagenet_rmnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                      # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                        # directory for logging
                              -b 4 \                                        # batch size
                              -n 1 \                                        # number of target GPU devices
```

**Note** If you want to initialize the model from pre-trained weights, specify the `-i` key with the path to the initial weights and set the appropriate `--src_scope` key value:
- To initialize the model after pre-training on any other classification dataset named `DATASET_NAME`, set `--src_scope "DATASET_NAME/rmnet"`

Below is the command to run the training procedure from a pre-trained model:
```Shell
python2 tools/models/train.py -c configs/classification/imagenet_rmnet.yml \ # path to config file
                              -t <PATH_TO_DATA_FILE> \                      # file with train data paths
                              -l <PATH_TO_LOG_DIR> \                        # directory for logging
                              -b 4 \                                        # batch size
                              -n 1 \                                        # number of target GPU devices
                              -i <PATH_TO_INIT_WEIGHTS> \                   # initialize model weights
                              --src_scope "DATASET_NAME/rmnet"              # name of scope to load weights from
```

## Model evaluation
To evaluate the quality of the trained Image Classification model, prepare the test data according to the [instruction](#data-preparation).

```Shell
python2 tools/models/eval.py -c configs/classification/imagenet_rmnet.yml \ # path to config file
                             -v <PATH_TO_DATA_FILE> \                      # file with test data paths
                             -b 4 \                                        # batch size
                             -s <PATH_TO_SNAPSHOT> \                       # snapshot model weights
```
@@ -0,0 +1,113 @@

# Data preparation

The following data structure is assumed:
<pre>
|-- data_dir
    |-- images
        |-- video_1
            frame_000000.png
            frame_000001.png
        |-- video_2
            frame_000000.png
            frame_000001.png
        |-- video_3
            frame_000000.png
            frame_000001.png
    |-- annotation
        annotation_file_1.xml
        annotation_file_2.xml
        annotation_file_3.xml
    train_tasks.txt
    test_tasks.txt
</pre>
Each annotation file (see [this](#annotation-file-format) header) describes a single source of images (see [this](#image-file-format) header).

## Annotation file format
For annotation it is better to use the [CVAT](https://github.com/opencv/cvat) utility, so we assume that the annotation file is stored in the appropriate `.xml` [format](https://github.com/opencv/cvat/blob/develop/cvat/apps/documentation/xml_format.md). The annotation file contains a single independent track for each person in the video, which includes a bounding box description for each frame. General structure of an annotation file:
<pre>
|-- root
    |-- track_0
        bounding_box_0
        bounding_box_1
    |-- track_1
        bounding_box_0
        bounding_box_1
</pre>

Toy example of an annotation file:
```xml
<?xml version="1.0" encoding="utf-8"?>
<annotations count="1">
  <track id="0" label="person">
    <box frame="0" xtl="1.0" ytl="1.0" xbr="0.0" ybr="0.0" occluded="0">
      <attribute name="action">action_name</attribute>
    </box>
  </track>
</annotations>
```
where the fields have the following meaning:
- `count` - number of tracks
- `id` - unique ID of the track in the file
- `label` - label of the track (the data loader skips all labels other than `person`)
- `frame` - unique ID of the frame in the track
- `xtl`, `ytl`, `xbr`, `ybr` - bounding box coordinates of the top-left and bottom-right corners
- `occluded` - marker to highlight heavily occluded bounding boxes (they can be skipped during training)
- `name` - name of the bounding box attribute (the data loader is sensitive to the `action` class only)
- `action_name` - valid name of the action (you can define your own list of actions)

## Image file format
Our data loader implementation works with independent images stored on the drive. Each image should be named in the format `frame_xxxxxx.png` or `frame_xxxxxx.jpg` (where `xxxxxx` is a unique image number).

**NOTE** To extract images from a video you can use `tools/data/dump_frames.py`.
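Alternatively, a possible ffmpeg invocation that produces the expected `frame_xxxxxx.png` naming (this is a sketch, not the repository script; it assumes ffmpeg is installed and `video_1.mp4` is a hypothetical source video):
```bash
# Extract all frames, numbering from frame_000000.png as the data loader expects
mkdir -p data_dir/images/video_1
ffmpeg -i video_1.mp4 -start_number 0 data_dir/images/video_1/frame_%06d.png
```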
## Tasks file format
For more robust control of image sources we have created a separate file where each row represents a single source in the following format: `annotation_file_path.xml image_height,image_width images_directory_path`. We assume that all images from the same source are resized to the `image_height,image_width` size (this is needed to properly decode the annotations).

Example of a `train_tasks.txt` file:
```
annotations/annotation_file_1.xml 1920,1080 images/video1
annotations/annotation_file_2.xml 1920,1080 images/video2
```

Example of a `test_tasks.txt` file:
```
annotations/annotation_file_3.xml 1920,1080 images/video3
```

## Train/eval data file generation
To generate the final data file (train or test), run the command:
```Shell
python2 tools/data/prepare_pedestrian_db.py -t <PATH_TO_TASKS> \      # path to file with tasks
                                            -o <PATH_TO_OUTPUT_DIR> \ # output directory
```

The output directory structure (an example of the script output can be found in the `./dataset` folder):
<pre>
|-- root
    |-- annotation
        |-- video_1
            sample_000000.json
            sample_000001.json
        |-- video_2
            sample_000000.json
            sample_000001.json
    data.txt
    class_map.yml
</pre>

Generated files:
- `data.txt` - should be used as input for the train/eval scripts.
- `class_map.txt` - contains the generated mapping from class names onto class IDs.

**Note 1** To specify class IDs directly, you can set the `-i` key: `-i <PATH_TO_CLASS_MAP>` (see the example `tools/data/pedestriandb_class_map.yml`). If you specify your own class mapping, the `class_map.txt` file will not be generated.
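For orientation only, a purely illustrative class mapping in YAML (the class names echo the action list example in the action detection README; the exact syntax expected by the script may differ, so check `tools/data/pedestriandb_class_map.yml`):
```yaml
# Hypothetical class-name --> class-ID mapping
"sitting": 0
"writing": 1
"raising_hand": 2
"__undefined__": 3
```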
**Note 2** To generate a valid class mapping for testing purposes, set `-i <PATH_TO_CLASS_MAP>`, where `<PATH_TO_CLASS_MAP>` is the `class_map.txt` file generated by the script or your own class mapping file. Otherwise the order of class IDs will be different.

**Note 3** You can use the prepared toy dataset (`./dataset` folder) to start your model training. You only need to specify the full path to the images (the `./dataset/images` folder) in the `data.txt` file.

## Config specification
For the generated dataset you should set the correct field values in the appropriate config file (an illustrative snippet follows this list):
- `IMAGE_SIZE` - target image size in the format `[height, width, num_channels]`
- `TRAIN_DATA_SIZE` - number of training samples
- `VAL_DATA_SIZE` - number of testing samples
- `MAX_NUM_DETECTIONS_PER_IMAGE` - maximum number of objects in a single image (if there are more, a subset of the objects will be used)
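As an illustration, hypothetical values for these fields in the config's YAML style (the numbers are placeholders, not recommended settings):
```yaml
"IMAGE_SIZE": [1080, 1920, 3]        # [height, width, num_channels]
"TRAIN_DATA_SIZE": 50000             # number of training samples
"VAL_DATA_SIZE": 5000                # number of testing samples
"MAX_NUM_DETECTIONS_PER_IMAGE": 100  # max number of objects in a single image
```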
