In this repo we build a CNN to detect where the user's eyes are looking on the screen, using the computer's webcam.
Desktop screen resolution: 1920x1080
- data_collection_phase: script to collect data
- clean_data: procedure to keep only good triples of face, left eye and right eye
- only_face_model: CNN trained to detect where you are looking on the screen based on the cropped face image
- triple_eyeface_model: CNN trained to detect where you are looking on the screen based on the triple (face, right eye and left eye cropped images)
To collect data we prepared a script (data_collection.py):
- The script opens a window as big as the desktop screen, then gives you 5 seconds to prepare.
- Random dots are spawned on the screen.
- The user has a couple of seconds to look at each dot.
- The script takes a screenshot of the face, right eye and left eye and saves them in the data/saved_images folder. The face and eyes are detected with pretrained models I downloaded (haarcascade_frontalface_default.xml and haarcascade_eye.xml); see the detection sketch after this list (find more info here).
- Face, left eye, right eye, x coordinate of the dot and y coordinate of the dot are saved in a CSV file (data/eye_data.csv).
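As a rough illustration of that detection step, here is a minimal sketch using OpenCV's bundled Haar cascades; the function name and parameters are illustrative assumptions, not the exact code in data_collection.py:

```python
import cv2

# Load the pretrained Haar cascades shipped with OpenCV
# (the repo uses the same XML files, downloaded separately).
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_face_and_eyes(frame):
    """Return the cropped face and up to two eye crops from a webcam frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None, []
    x, y, w, h = faces[0]                  # take the first detected face
    face_crop = frame[y:y+h, x:x+w]
    # Search for eyes only inside the face region to reduce false positives.
    eyes = eye_cascade.detectMultiScale(gray[y:y+h, x:x+w])
    eye_crops = [face_crop[ey:ey+eh, ex:ex+ew] for (ex, ey, ew, eh) in eyes[:2]]
    return face_crop, eye_crops
```

Left and right eyes can then be told apart by their horizontal position inside the face crop.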
- Normalize image sizes: the images differ in pixel dimensions; we need to crop/resize them so they all have the same size. (fixed during data collection)
- Clean data: sometimes the eyes are mistaken for hair, the nose or something in the background; delete those records. In clean_data/clean_data.py we show each triple (face, right and left eyes) and approve or reject it with keyboard keys (see the sketch after this list). All approved triples are saved in clean_data/cleaned_eye_data.csv, together with the respective x and y coordinates we were looking at. For debugging there is another Python script to scroll through the approved images (scroll_approved_images.py).
- data_exploration: e.g. checking the percentage of the screen covered by the x and y points we looked at.
- data_exploration: Normalize coordinates: scale x and y to the range [0, 1] by dividing by the screen width and height (on my Lenovo: 1920x1080), respectively.
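A minimal sketch of that keyboard-driven review loop, assuming OpenCV for display and pandas for the CSV; the key bindings and column names below are illustrative assumptions, not necessarily those used in clean_data/clean_data.py:

```python
import cv2
import pandas as pd

df = pd.read_csv("data/eye_data.csv")  # assumed columns: face, left_eye, right_eye, x, y
approved = []

for _, row in df.iterrows():
    # Show the triple in three windows (the path columns are assumptions).
    for name in ("face", "left_eye", "right_eye"):
        cv2.imshow(name, cv2.imread(row[name]))
    key = cv2.waitKey(0) & 0xFF
    if key == ord("a"):        # 'a' approves the triple
        approved.append(row)
    elif key == ord("q"):      # 'q' stops early
        break
    # any other key rejects the triple

cv2.destroyAllWindows()
pd.DataFrame(approved).to_csv("clean_data/cleaned_eye_data.csv", index=False)
```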
Input: A combination of three images: the face image, the left eye image, and the right eye image.
Output: A pair of continuous values (x, y) representing the gaze coordinates on the screen.
Branch and concatenate: process each image type (face, left eye, right eye) separately through a different CNN branch and then concatenate their feature embeddings.
- Separate CNN Branches: one CNN for the face image plus two separate CNNs for the left eye and right eye images.
- Feature Fusion: Concatenate the embeddings from all CNN branches.
- Fully Connected Layers: Pass the concatenated embeddings through fully connected layers to predict the normalized (x, y) coordinates.
Sample code
```python
import tensorflow as tf
from tensorflow.keras import layers, models
# Define CNN for processing images
def create_cnn(input_shape):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation='relu'),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
    ])
    return model
# Input shapes
face_input = layers.Input(shape=(64, 64, 3)) # Example shape for face
left_eye_input = layers.Input(shape=(32, 32, 3)) # Example shape for left eye
right_eye_input = layers.Input(shape=(32, 32, 3)) # Example shape for right eye
# Create CNN branches
face_branch = create_cnn((64, 64, 3))(face_input)
left_eye_branch = create_cnn((32, 32, 3))(left_eye_input)
right_eye_branch = create_cnn((32, 32, 3))(right_eye_input)
# Concatenate features
concatenated = layers.Concatenate()([face_branch, left_eye_branch, right_eye_branch])
# Fully connected layers
fc = layers.Dense(256, activation='relu')(concatenated)
fc = layers.Dense(128, activation='relu')(fc)
output = layers.Dense(2, activation='linear')(fc) # Predict (x, y)
# Build model
model = models.Model(inputs=[face_input, left_eye_input, right_eye_input], outputs=output)
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
model.summary()
```
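Training then feeds the three image arrays together with the normalized dot coordinates. A minimal sketch, assuming the cleaned triples have already been loaded into NumPy arrays (the variable names and placeholder data are hypothetical):

```python
import numpy as np

# Hypothetical arrays loaded from the cleaned dataset:
# faces: (N, 64, 64, 3), left_eyes/right_eyes: (N, 32, 32, 3), dots: (N, 2) in pixels
faces = np.zeros((8, 64, 64, 3), dtype="float32")      # placeholder data
left_eyes = np.zeros((8, 32, 32, 3), dtype="float32")
right_eyes = np.zeros((8, 32, 32, 3), dtype="float32")
dots = np.random.rand(8, 2).astype("float32") * [1920, 1080]

# Normalize targets to [0, 1] using the screen resolution.
targets = dots / np.array([1920.0, 1080.0], dtype="float32")

model.fit([faces, left_eyes, right_eyes], targets, epochs=10, batch_size=4)

# Predictions come back normalized; scale them up to screen pixels.
pred = model.predict([faces[:1], left_eyes[:1], right_eyes[:1]])
x_px, y_px = pred[0] * np.array([1920, 1080])
```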
Research Papers:
- "Gaze Estimation via Deep Learning" (e.g., papers from ECCV or CVPR).
- Papers from datasets like MPIIGaze or GazeCapture.
Datasets:
- MPIIGaze, GazeCapture for transfer learning or pretraining.
Tutorials:
- TensorFlow/Keras or PyTorch tutorials on multimodal inputs.
- YouTube: "Eye-tracking Mouse Using Convolutional Neural Networks and Webcam"