
Video by Yan Krukau: https://www.pexels.com/video/male-teacher-with-his-students-8617126/
This project performs gaze estimation with several deep learning backbones, including ResNet, MobileNet v2, and MobileOne. It supports both classification and regression heads for predicting gaze direction. Built on top of L2CS-Net, the project adds pre-trained models and refactored code for better performance and flexibility.
- ONNX Inference: Export PyTorch weights to ONNX and run inference with ONNX Runtime.
- ResNet: Deep Residual Networks - Enables deeper networks with better accuracy through residual learning.
- MobileNet v2: Inverted Residuals and Linear Bottlenecks - Efficient model for mobile applications, balancing performance and computational cost.
- MobileOne (s0-s4): An Improved One millisecond Mobile Backbone - Achieves near-instant inference times, ideal for real-time mobile applications.
- Face Detection: uniface, a face detection library based on the RetinaFace model.
Note
All models are trained only on the Gaze360 dataset.
- Clone the repository:

```sh
git clone https://github.com/yakyo/gaze-estimation.git
cd gaze-estimation
```

- Install the required dependencies:

```sh
pip install -r requirements.txt
```
- Download weight files:

  a) Download weights from the following links:
| Model | PyTorch Weights | ONNX Weights | Size | Epochs | MAE |
|---|---|---|---|---|---|
| ResNet-18 | resnet18.pt | resnet18_gaze.onnx | 43 MB | 200 | 12.84 |
| ResNet-34 | resnet34.pt | resnet34_gaze.onnx | 81.6 MB | 200 | 11.33 |
| ResNet-50 | resnet50.pt | resnet50_gaze.onnx | 91.3 MB | 200 | 11.34 |
| MobileNet V2 | mobilenetv2.pt | mobilenetv2_gaze.onnx | 9.59 MB | 200 | 13.07 |
| MobileOne S0 | mobileone_s0_fused.pt | mobileone_s0_gaze.onnx | 4.8 MB | 200 | 12.58 |
| MobileOne S1 | not available | not available | xx MB | 200 | * |
| MobileOne S2 | not available | not available | xx MB | 200 | * |
| MobileOne S3 | not available | not available | xx MB | 200 | * |
| MobileOne S4 | not available | not available | xx MB | 200 | * |

'*' - will be uploaded soon (due to limited computing resources, the remaining weights cannot be published yet, but you can still train them with the provided code).
  b) Or run the command below to download weights to the `weights` directory (Linux):

```sh
sh download.sh [model_name]
```

  Available model names: `resnet18`, `resnet34`, `resnet50`, `mobilenetv2`, `mobileone_s0`, `mobileone_s1`, `mobileone_s2`, `mobileone_s3`, `mobileone_s4`.
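For example, to fetch only the ResNet-18 weights:

```sh
sh download.sh resnet18
```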
Dataset folder structure:
```
data/
├── Gaze360/
│   ├── Image/
│   └── Label/
└── MPIIFaceGaze/
    ├── Image/
    └── Label/
```
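To create this layout before copying the datasets in, something like the following works (a minimal sketch; adjust paths to your setup):

```sh
# Create the expected directory skeleton for both datasets
mkdir -p data/Gaze360/Image data/Gaze360/Label
mkdir -p data/MPIIFaceGaze/Image data/MPIIFaceGaze/Label
```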
Gaze360
- Link to download dataset: https://gaze360.csail.mit.edu/download.php
- Data pre-processing code: https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#gaze360
MPIIGaze
- Link to download dataset: download page
- Data pre-processing code: https://phi-ai.buaa.edu.cn/Gazehub/3D-dataset/#mpiifacegaze
```sh
python main.py --data [dataset_path] --dataset [dataset_name] --arch [architecture_name]
```

`main.py` arguments:
```
usage: main.py [-h] [--data DATA] [--dataset DATASET] [--output OUTPUT] [--checkpoint CHECKPOINT] [--num-epochs NUM_EPOCHS] [--batch-size BATCH_SIZE] [--arch ARCH] [--alpha ALPHA] [--lr LR] [--num-workers NUM_WORKERS]

Gaze estimation training.

options:
  -h, --help            show this help message and exit
  --data DATA           Directory path for gaze images.
  --dataset DATASET     Dataset name, available `gaze360`, `mpiigaze`.
  --output OUTPUT       Path of output models.
  --checkpoint CHECKPOINT
                        Path to checkpoint for resuming training.
  --num-epochs NUM_EPOCHS
                        Maximum number of training epochs.
  --batch-size BATCH_SIZE
                        Batch size.
  --arch ARCH           Network architecture, currently available: resnet18/34/50, mobilenetv2, mobileone_s0-s4.
  --alpha ALPHA         Regression loss coefficient.
  --lr LR               Base learning rate.
  --num-workers NUM_WORKERS
                        Number of workers for data loading.
```
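For example, a run that trains ResNet-18 on Gaze360 might look like this (the epoch count matches the released weights; the other hyperparameter values are illustrative, not the exact settings used for training):

```sh
python main.py \
  --data data/Gaze360 \
  --dataset gaze360 \
  --arch resnet18 \
  --num-epochs 200 \
  --batch-size 64 \
  --lr 0.00001
```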
```sh
python evaluate.py --data [dataset_path] --dataset [dataset_name] --weights [weight_path] --arch [architecture_name]
```

`evaluate.py` arguments:
```
usage: evaluate.py [-h] [--data DATA] [--dataset DATASET] [--weights WEIGHTS] [--batch-size BATCH_SIZE] [--arch ARCH] [--num-workers NUM_WORKERS]

Gaze estimation evaluation.

options:
  -h, --help            show this help message and exit
  --data DATA           Directory path for gaze images.
  --dataset DATASET     Dataset name, available `gaze360`, `mpiigaze`.
  --weights WEIGHTS     Path to model weight for evaluation.
  --batch-size BATCH_SIZE
                        Batch size.
  --arch ARCH           Network architecture, currently available: resnet18/34/50, mobilenetv2, mobileone_s0-s4.
  --num-workers NUM_WORKERS
                        Number of workers for data loading.
```
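For example, to evaluate the downloaded ResNet-18 checkpoint on Gaze360 (paths are illustrative):

```sh
python evaluate.py \
  --data data/Gaze360 \
  --dataset gaze360 \
  --weights weights/resnet18.pt \
  --arch resnet18
```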
```sh
python inference.py --model [model_name] --weight [model_weight_path] --view --source [source_video / cam_index] --output [output_file] --dataset [dataset_name]
```

`inference.py` arguments:
```
usage: inference.py [-h] [--model MODEL] [--weight WEIGHT] [--view] [--source SOURCE] [--output OUTPUT] [--dataset DATASET]

Gaze estimation inference

options:
  -h, --help         show this help message and exit
  --model MODEL      Model name, default `resnet18`
  --weight WEIGHT    Path to gaze estimation model weights
  --view             Display the inference results
  --source SOURCE    Path to source video file or camera index
  --output OUTPUT    Path to save output file
  --dataset DATASET  Dataset name to get dataset related configs
```
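For example, to run live inference on a webcam (index 0), display the results, and save the annotated video (paths are illustrative):

```sh
python inference.py \
  --model resnet18 \
  --weight weights/resnet18.pt \
  --view \
  --source 0 \
  --output output.mp4 \
  --dataset gaze360
```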
Export to ONNX

```sh
python onnx_export.py --weight [model_path] --model [model_name] --dynamic
```

`onnx_export.py` arguments:
```
usage: onnx_export.py [-h] [-w WEIGHT] [-n {resnet18,resnet34,resnet50,mobilenetv2,mobileone_s0}] [-d {gaze360,mpiigaze}] [--dynamic]

Gaze Estimation Model ONNX Export

options:
  -h, --help            show this help message and exit
  -w WEIGHT, --weight WEIGHT
                        Trained state_dict file path to open
  -n {resnet18,resnet34,resnet50,mobilenetv2,mobileone_s0}, --model {resnet18,resnet34,resnet50,mobilenetv2,mobileone_s0}
                        Backbone network architecture to use
  -d {gaze360,mpiigaze}, --dataset {gaze360,mpiigaze}
                        Dataset name for bin configuration
  --dynamic             Enable dynamic batch size and input dimensions for ONNX export
```
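For example, to export the ResNet-18 checkpoint with dynamic input shapes (the weight path is illustrative):

```sh
python onnx_export.py --weight weights/resnet18.pt --model resnet18 --dataset gaze360 --dynamic
```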
ONNX Inference

```sh
python onnx_inference.py --source [source_video / webcam_index] --model [onnx_model_path] --output [path_to_save_video]
```

`onnx_inference.py` arguments:
```
usage: onnx_inference.py [-h] --source SOURCE --model MODEL [--output OUTPUT]

Gaze Estimation ONNX Inference

options:
  -h, --help       show this help message and exit
  --source SOURCE  Video path or camera index (e.g., 0 for webcam)
  --model MODEL    Path to ONNX model
  --output OUTPUT  Path to save output video (optional)
```
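For example, to run the exported model on a webcam and save the result (the model path is illustrative):

```sh
python onnx_inference.py --source 0 --model weights/resnet18_gaze.onnx --output output.mp4
```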
If you use this work in your research, please cite it as:
Valikhujaev, Y. (2024). MobileGaze: Pre-trained mobile nets for Gaze-Estimation. Zenodo. https://doi.org/10.5281/zenodo.14257640
Alternatively, in BibTeX format:
```bibtex
@misc{valikhujaev2024mobilegaze,
  author    = {Valikhujaev, Y.},
  title     = {MobileGaze: Pre-trained mobile nets for Gaze-Estimation},
  year      = {2024},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.14257640},
  url       = {https://doi.org/10.5281/zenodo.14257640}
}
```
- This project is built on top of L2CS-Net. Most of the code has been rewritten for reproducibility and adaptability, and several additional backbones are provided with pre-trained weights.
- https://github.com/apple/ml-mobileone
- uniface: face detection library used for inference in `inference.py`.