abdallah1097/TrOCR_Project

TrOCR Detector

Submitted to blnk Egypt

TrOCR is a Transformer-based OCR model. This repo implements TrOCR from scratch using TensorFlow and Django.

The application is divided into two main apps:

  1. TrOCR_Django_App: The main application. It contains the main webpage interface and communicates with Deep_Learning_App.
  2. Deep_Learning_App: Handles all deep learning implementation and scripts, for instance building the encoder, the decoder, and the TrOCR model, as well as predict.py, train.py, etc.

Getting Started

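Setting Up Environment: Create and activate a virtual environment. The activation path below applies to Git Bash on Windows; on Linux or macOS, use ocr_detector_venv/bin/activate instead.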

python -m venv ocr_detector_venv
source ocr_detector_venv/Scripts/activate

Install required dependencies

pip install -r requirements.txt

Edit the network and data configuration. If you prefer to use Vim:

vim Deep_Learning_App/src/config.py

Or, using the Nano editor:

nano Deep_Learning_App/src/config.py

Starting Django Server: Run the command below. The development server starts at http://127.0.0.1:8000/

python manage.py runserver

Deep Learning App

The implementation consists of the following components:

  1. Data Loader: This module:
     a. Loads the dataset images and their corresponding text.
     b. Tokenizes the words using the Bert Arabic Tokenizer.
  2. Preprocessing: The preprocessing includes:
     a. Extracting and cropping the text from the image. This is done with the OpenCV library by thresholding and finding the contours of the text to extract the text box.

     [Figure: text box extracted from an input image]

     b. Resizing the image after extracting the text to (88,200,3).
     c. Normalization: dividing pixel values by 255.0.
  3. TrOCR Model: The full encoder/decoder architecture is written in TensorFlow as well-documented, object-oriented classes that use inheritance. Encoder/decoder configuration parameters can be edited in Deep_Learning_App/src/config.py
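
The preprocessing steps listed above (crop to the text, resize, normalize) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it uses NumPy only, and the bounding box of all below-threshold pixels stands in for the OpenCV thresholding and contour search. The function names, the threshold of 128, and the reading of (88,200,3) as (height, width, channels) are all assumptions.

```python
import numpy as np

# Assumed interpretation of the (88, 200, 3) target shape: height, width, channels.
TARGET_HEIGHT, TARGET_WIDTH = 88, 200


def crop_text_region(image, threshold=128):
    """Crop an RGB image to the bounding box of its dark (text) pixels."""
    gray = image.mean(axis=2)          # simple grayscale conversion
    mask = gray < threshold            # True where a pixel is dark enough to be ink
    if not mask.any():                 # no dark pixels found: keep the full image
        return image
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(image, height, width):
    """Nearest-neighbour resize to (height, width), keeping the channels."""
    row_idx = np.arange(height) * image.shape[0] // height
    col_idx = np.arange(width) * image.shape[1] // width
    return image[row_idx][:, col_idx]


def preprocess(image):
    """Crop to the text box, resize to the target shape, scale to [0, 1]."""
    cropped = crop_text_region(image)
    resized = resize_nearest(cropped, TARGET_HEIGHT, TARGET_WIDTH)
    return resized.astype(np.float32) / 255.0


# Synthetic example: a white page with one dark rectangle standing in for text.
page = np.full((300, 400, 3), 255, dtype=np.uint8)
page[100:140, 50:250] = 0
batch_item = preprocess(page)
print(batch_item.shape)
```

A contour-based crop, as the README describes, would be more robust to specks and noise than a single global bounding box; the sketch only shows the order of the steps and the resulting tensor shape.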

Run Deep Learning Scripts

You can train, evaluate, or predict without using the Django apps. To train the model:

cd TrOCR_Project
nano Deep_Learning_App/src/config.py
python Deep_Learning_App/src/train.py

Or to evaluate:

cd TrOCR_Project
nano Deep_Learning_App/src/config.py
python Deep_Learning_App/src/evaluate.py

Or to predict:

cd TrOCR_Project
nano Deep_Learning_App/src/config.py
python Deep_Learning_App/src/predict.py --image_path absolute/path/to/image.jpg

To convert the TensorFlow model to an ONNX model:

cd TrOCR_Project
python Deep_Learning_App/src/to_onnx.py --model_path absolute/path/to/model.h5 --output_path absolute/path/to/output/directory

If you don't specify a model or output path, the script uses the model path given in Deep_Learning_App/src/config.py. In that case, just run:

cd TrOCR_Project
python Deep_Learning_App/src/to_onnx.py

To convert the TensorFlow model to a TensorRT (TRT) model:

cd TrOCR_Project
python Deep_Learning_App/src/to_TRT.py --model_path absolute/path/to/model.h5 --output_path absolute/path/to/output/directory

If you don't specify a model or output path, the script uses the model path given in Deep_Learning_App/src/config.py. In that case, just run:

cd TrOCR_Project
python Deep_Learning_App/src/to_TRT.py
  4. Masked Loss/Accuracy Functions: Since Transformers require sequences to be padded to a uniform length, predictions at padded positions must not count toward the loss or accuracy. Masked loss and accuracy functions were therefore implemented.
  5. Learning Rate Scheduler: A custom learning rate scheduler was implemented, following the formula in the original Transformer paper:

[Figure: learning rate schedule]
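
The masking idea and the learning rate schedule can be sketched as below. This is a framework-free NumPy illustration rather than the repository's TensorFlow code. The padding id of 0 and all function names are assumptions. The schedule is the one from the original Transformer paper, lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), and the defaults d_model=512 and warmup_steps=4000 are that paper's base-model values, not necessarily the values in config.py.

```python
import numpy as np

PAD_ID = 0  # assumed padding token id; the repository may use a different one


def masked_loss(labels, logits, pad_id=PAD_ID):
    """Mean cross-entropy over the non-padding positions only."""
    mask = labels != pad_id
    shifted = logits - logits.max(axis=-1, keepdims=True)   # stabilise the softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    token_loss = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    return float((token_loss * mask).sum() / max(mask.sum(), 1))


def masked_accuracy(labels, logits, pad_id=PAD_ID):
    """Fraction of non-padding positions whose argmax matches the label."""
    mask = labels != pad_id
    correct = (logits.argmax(axis=-1) == labels) & mask
    return float(correct.sum() / max(mask.sum(), 1))


def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Linear warmup followed by inverse square-root decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# One sequence of four tokens; the last two positions are padding.
labels = np.array([[1, 2, 0, 0]])
logits = np.zeros((1, 4, 3))
logits[0, 0, 1] = 5.0   # position 0 predicts class 1 (correct)
logits[0, 1, 1] = 5.0   # position 1 predicts class 1 (label is 2, so wrong)

print(masked_loss(labels, logits), masked_accuracy(labels, logits))
print([transformer_lr(s) for s in (1, 2000, 4000, 8000)])
```

Only the two real tokens contribute to either metric, so changing the logits at the padded positions leaves both values unchanged. The learning rate rises until warmup_steps and decays afterwards.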

Django App

The main interface looks like this:

[Figure: main interface]

Show PreProcessed App

This allows you to see how images are preprocessed (before normalization).

[Figure: preprocessed images]

Show Predict

First, upload the image:

[Figure: image upload form]

Once the image is uploaded, you can run the prediction:

[Figure: image uploaded successfully]

Configure Model

This allows you to change the parameters set in the config.py file:

[Figure: configuration form]

Once you change the configuration:

[Figure: configuration saved successfully]

See Training Logs

Starting TensorBoard:

[Figure: starting TensorBoard]

Logs sample:

[Figure: TensorBoard logs]

Note

To make the work easier to replicate, a small sample of the images is included in the repo. For full replication, please add the dataset to: TrOCR_Project/media/dataset
