Submitted to blnk Egypt
TrOCR is a Transformer-based OCR model. This repo implements TrOCR from scratch using TensorFlow and Django.
The application is divided into two main apps:
- TrOCR_Django_App: The main application. It contains the main web interface and communicates with Deep_Learning_App.
- Deep_Learning_App: This application handles all deep learning implementation and scripts, for instance building the encoder, the decoder, and the TrOCR model, as well as predict.py, train.py, etc.
Setting Up Environment: Create and Activate a Virtual Environment
python -m venv ocr_detector_venv
source ocr_detector_venv/Scripts/activate
(On Linux/macOS the activation script is ocr_detector_venv/bin/activate.)
Install required dependencies
pip install -r requirements.txt
Edit network/ data configuration. If you prefer to use Vim:
vim Deep_Learning_App/src/config.py
or using Nano editor:
nano Deep_Learning_App/src/config.py
Starting Django Server: the development server starts at http://127.0.0.1:8000/
python manage.py runserver
Implementation details of the project:
- Data Loader: This module:
a. Loads the dataset images and their corresponding text.
b. Tokenizes the words using the Bert Arabic Tokenizer.
- Preprocessing: The implemented preprocessing includes:
a. Extracting and cropping the text from the image: done with the OpenCV library by thresholding and finding the contours of the text to extract the text box.
b. Resizing the image after extracting the text to (88, 200, 3).
c. Normalization: dividing pixel values by 255.0.
- TrOCR Model: The whole encoder/ decoder architecture is written in TensorFlow as well-documented, object-oriented classes that use inheritance. Encoder/ decoder configuration parameters are edited in
Deep_Learning_App/src/config.py
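The preprocessing steps listed above (crop to the text, resize to (88, 200, 3), divide by 255.0) can be sketched as follows. This is a minimal illustration and not the repo's code: it uses plain NumPy as a stand-in for the OpenCV calls so that it runs without extra dependencies. The function names, the threshold value of 128, and the dark-text-on-light-background assumption are all hypothetical.

```python
import numpy as np

# Target shape (88, 200, 3) as stated in the preprocessing list.
TARGET_HEIGHT, TARGET_WIDTH = 88, 200


def crop_to_text(image: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Crop an RGB image to the bounding box of its text pixels.

    Assumption: dark text on a light background, so text pixels fall
    below `threshold`. With OpenCV, cv2.threshold plus cv2.findContours
    would typically play this role.
    """
    gray = image.mean(axis=2)
    mask = gray < threshold
    if not mask.any():
        return image  # no text detected; keep the full image
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(image: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize (a simple stand-in for cv2.resize)."""
    row_idx = np.arange(height) * image.shape[0] // height
    col_idx = np.arange(width) * image.shape[1] // width
    return image[row_idx][:, col_idx]


def preprocess(image: np.ndarray) -> np.ndarray:
    """Crop, resize to (88, 200, 3), and scale pixel values to [0, 1]."""
    cropped = crop_to_text(image)
    resized = resize_nearest(cropped, TARGET_HEIGHT, TARGET_WIDTH)
    return resized.astype(np.float32) / 255.0
```

Calling preprocess on any uint8 RGB array returns a float32 array of shape (88, 200, 3), whatever the size of the input.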
You can train, evaluate, or predict without using the Django apps. To train the model:
cd TrOCR_Project
nano Deep_Learning_App/src/config.py
python Deep_Learning_App/src/train.py
Or to evaluate:
cd TrOCR_Project
nano Deep_Learning_App/src/config.py
python Deep_Learning_App/src/evaluate.py
Or to predict:
cd TrOCR_Project
nano Deep_Learning_App/src/config.py
python Deep_Learning_App/src/predict.py --image_path absolute/path/to/image.jpg
To convert the TensorFlow model to an ONNX model:
cd TrOCR_Project
python Deep_Learning_App/src/to_onnx.py --model_path absolute/path/to/model.h5 --output_path absolute/path/to/output/directory
If you don't specify a model/ output path, the script uses the model path given in Deep_Learning_App/src/config.py, so you can just run:
cd TrOCR_Project
python Deep_Learning_App/src/to_onnx.py
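The fallback behaviour described above (optional flags that default to the configured paths) could be wired up with argparse roughly as below. This is a sketch under assumptions, not the repo's actual to_onnx.py: DEFAULT_MODEL_PATH and DEFAULT_OUTPUT_PATH are hypothetical placeholders for whatever config.py defines, and parse_args is an illustrative name.

```python
import argparse

# Hypothetical placeholders for values that would be read from
# Deep_Learning_App/src/config.py in the real project.
DEFAULT_MODEL_PATH = "path/from/config/model.h5"
DEFAULT_OUTPUT_PATH = "path/from/config/output"


def parse_args(argv=None):
    """Parse --model_path / --output_path, falling back to config values."""
    parser = argparse.ArgumentParser(
        description="Convert a TensorFlow model (illustrative sketch)."
    )
    parser.add_argument(
        "--model_path",
        default=DEFAULT_MODEL_PATH,
        help="Path to the .h5 model; defaults to the configured path.",
    )
    parser.add_argument(
        "--output_path",
        default=DEFAULT_OUTPUT_PATH,
        help="Directory for the converted model; defaults to the configured path.",
    )
    return parser.parse_args(argv)
```

With no flags, parse_args([]) returns the configured defaults. Passing --model_path or --output_path overrides only that value.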
To convert the TensorFlow model to a TensorRT (TRT) model:
cd TrOCR_Project
python Deep_Learning_App/src/to_TRT.py --model_path absolute/path/to/model.h5 --output_path absolute/path/to/output/directory
If you don't specify a model/ output path, the script uses the model path given in Deep_Learning_App/src/config.py, so you can just run:
cd TrOCR_Project
python Deep_Learning_App/src/to_TRT.py
- Loss/ Accuracy Masked Functions: Transformers require sequences to be padded to a uniform length, so predictions at padded positions must not count toward the loss or accuracy. Masked loss and accuracy functions were therefore created to exclude them.
- Learning Rate Scheduler: A custom learning rate scheduler was implemented following the formula in the original Transformer paper: lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)).
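The two items above can be illustrated with a short sketch. It is written in NumPy and plain Python rather than TensorFlow so that it runs standalone, and it is not the repo's implementation. Assumptions: padding uses token id 0 (PAD_ID), and the defaults d_model=512 and warmup_steps=4000 are the base values from the original Transformer paper, not necessarily those in config.py.

```python
import numpy as np

PAD_ID = 0  # assumption: token id 0 marks padding positions


def masked_loss(labels, logits, pad_id=PAD_ID):
    """Mean cross-entropy computed over non-padding positions only."""
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the true token at each position.
    token_loss = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    mask = (labels != pad_id).astype(log_probs.dtype)
    return float((token_loss * mask).sum() / max(mask.sum(), 1.0))


def masked_accuracy(labels, logits, pad_id=PAD_ID):
    """Fraction of non-padding positions whose argmax matches the label."""
    mask = labels != pad_id
    correct = (logits.argmax(axis=-1) == labels) & mask
    return float(correct.sum() / max(mask.sum(), 1))


def transformer_lr(step, d_model=512, warmup_steps=4000):
    """Warmup schedule: d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # avoid raising 0 to a negative power
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The learning rate rises linearly for the first warmup_steps steps and then decays with the inverse square root of the step number. Logits at padded positions have no effect on either metric.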
The main interface looks like this:
This page lets you see how images are preprocessed (before normalization).
First, upload the image:
Once the image is uploaded, you can run the prediction:
This page lets you change the parameters set in the config.py file:
Once you change the configuration:
Starting Tensorboard:
Logs Sample:
Note: to make the work reproducible, a small sample of the images was added to the repo. For full replication, add the dataset to: TrOCR_Project/media/dataset

