Official Code Implementation of Variational Augmentation for Enhancing Historical Document Image Binarization
Accepted at: ICVGIP 2022
Historical document image binarization is a well-known segmentation problem in image processing. Despite their ubiquity, traditional thresholding algorithms have achieved limited success on severely degraded document images. With the advent of deep learning, several segmentation models were proposed that made significant progress in the field, but they were limited by the unavailability of large training datasets. To mitigate this problem, we propose a novel two-stage framework: the first stage comprises a generator that produces degraded samples using variational inference, and the second is a CNN-based binarization network trained on the generated data. We evaluated our framework on a range of DIBCO datasets, where it achieved competitive results against previous state-of-the-art methods.
Deep learning-based methods need large training datasets, which are not readily available in the domain of historical documents. To tackle this problem, we propose a two-stage framework:
- Aug-Net: a VAE-GAN-based augmentation module, built on BicycleGAN, that generates synthetic training samples.
- Bin-Net: a U-Net-based segmentation module for the binarization task, trained on the synthetic samples generated by Aug-Net.
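The "variational" part of Aug-Net refers to the VAE component of BicycleGAN, which encodes an image into the mean and log-variance of a Gaussian latent code. The pure-Python sketch below illustrates that idea (the reparameterization trick and the KL regularizer) and how a multimodal generator could turn one ground-truth patch into several synthetic training pairs for Bin-Net. It is not code from this repository: `generator` is a hypothetical stand-in for a trained Aug-Net, the latent size is arbitrary, and it assumes Aug-Net is conditioned on a ground-truth patch.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps, sigma = exp(0.5 * logvar), eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions (the VAE regularizer)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def make_training_pairs(gt_patches, generator, n_per_patch, latent_dim, rng):
    """Pair each ground-truth patch with several synthetic degraded versions.

    `generator(gt, z)` is a hypothetical stand-in for a trained Aug-Net. Each
    call uses a fresh latent code drawn from the prior N(0, I), so one clean
    patch can yield many differently degraded samples.
    """
    pairs = []
    for gt in gt_patches:
        for _ in range(n_per_patch):
            z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            pairs.append((generator(gt, z), gt))
    return pairs


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "generator": shifts every pixel of the clean patch by the first latent value.
    toy_generator = lambda gt, z: [[px + z[0] for px in row] for row in gt]
    pairs = make_training_pairs([[[0, 1], [1, 0]]], toy_generator, 3, 8, rng)
    print(len(pairs))  # 3 (degraded, ground-truth) pairs from one patch
```

Sampling z from the prior at generation time is what makes the augmentation multimodal: the same clean patch maps to different degradations.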
Requirements:
- Python 3.7+
- PyTorch 1.9+
- Albumentations
- fastai
- You can download the training images of DIBCO from here. Extract patches using datamaker.py.
- You can download the testing data from here.
- Alternatively, you can download the pre-extracted training patches directly from here (recommended).
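The sketch below is not datamaker.py itself; it is a minimal illustration, assuming a sliding-window scheme, of how training images can be cut into overlapping square patches. `patch=256` and `stride=128` are hypothetical defaults, and images are plain lists of rows rather than whatever format datamaker.py actually reads.

```python
def patch_origins(length, patch, stride):
    """Start offsets along one axis so that windows of size `patch` cover [0, length)."""
    if length <= patch:
        return [0]  # image smaller than one patch: a single (undersized) window
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # extra window flush with the far edge
    return starts


def extract_patches(image, patch=256, stride=128):
    """Cut a 2D image (list of rows) into square tiles.

    Returns (top, left, tile) tuples; tiles from images smaller than `patch`
    are not padded.
    """
    h, w = len(image), len(image[0])
    out = []
    for top in patch_origins(h, patch, stride):
        for left in patch_origins(w, patch, stride):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            out.append((top, left, tile))
    return out


if __name__ == "__main__":
    image = [[0] * 500 for _ in range(300)]  # 300 x 500 dummy grayscale image
    print(len(extract_patches(image)))  # 6
```

Overlapping windows (stride smaller than patch size) yield more training patches per page; the extra edge-aligned window ensures the right and bottom borders are not dropped.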
Organize the data and code as follows:
- training_datasets
- - train
- - - bw_patches
- - - gt_patches
- - - cl_patches
- - val
- - - bw_patches
- - - gt_patches
- - - cl_patches
- testing_datasets
- - <DIBCO_YEAR>
- - - bw_patches
- - - gt_patches
- - - cl_patches
- - - results
- Restoration
- - code
- - - all relevant files here (this repo)
- - weights
- - - pretrained/saved weights here
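The directory layout above can be created in one go with pathlib. This is a convenience sketch, not a script shipped with the repository; `make_layout` and its arguments are hypothetical, and which DIBCO years to pass is up to you.

```python
import tempfile
from pathlib import Path

PATCH_SUBDIRS = ("bw_patches", "gt_patches", "cl_patches")


def make_layout(root, dibco_years):
    """Create the expected folder tree under `root` and return the created paths."""
    root = Path(root)
    dirs = []
    for split in ("train", "val"):
        for sub in PATCH_SUBDIRS:
            dirs.append(root / "training_datasets" / split / sub)
    for year in dibco_years:
        # Each test set also gets a results/ folder for the binarized outputs.
        for sub in PATCH_SUBDIRS + ("results",):
            dirs.append(root / "testing_datasets" / str(year) / sub)
    dirs.append(root / "Restoration" / "code")     # this repository goes here
    dirs.append(root / "Restoration" / "weights")  # pretrained/saved weights go here
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = make_layout(tmp, [2016, 2017])
        print(len(created))  # 16
```

Because `exist_ok=True` is used, rerunning it on a partially populated tree is harmless.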
- The Augmentation Network (Aug-Net) is based on BicycleGAN. Train the model according to the instructions in the official BicycleGAN repository, using the patches extracted from the training data. Copy the resulting checkpoints folder into synthetic/.
- Create a subdirectory evaluation/ to store intermediate results while the model is training.
- Run train.py to train the Binarization Network (Bin-Net).
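The two preparation steps before running train.py can be scripted. A minimal sketch, assuming it is pointed at the code directory and at the checkpoints folder produced by BicycleGAN training; `prepare_training` and the paths in the demo are hypothetical, not part of the repository.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_training(code_dir, bicyclegan_checkpoints):
    """Copy BicycleGAN checkpoints into synthetic/ and create evaluation/.

    Returns the destination of the copied checkpoints folder.
    """
    code_dir = Path(code_dir)
    dst = code_dir / "synthetic" / "checkpoints"
    if not dst.exists():
        # copytree also creates the missing synthetic/ parent directory.
        shutil.copytree(str(bicyclegan_checkpoints), str(dst))
    # Folder for intermediate results written while Bin-Net trains.
    (code_dir / "evaluation").mkdir(exist_ok=True)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "checkpoints" / "example_run"  # hypothetical BicycleGAN output
        ckpt.mkdir(parents=True)
        (ckpt / "weights.pth").write_text("placeholder")
        code = Path(tmp) / "code"
        code.mkdir()
        dst = prepare_training(code, Path(tmp) / "checkpoints")
        print((dst / "example_run" / "weights.pth").exists())  # True
```

After this step, train.py can be run from the code directory.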
- Set the path to the directory containing the test images.
- Specify the path to the weight files.
- Run infer.py.
- For evaluation, specify the paths to the outputs and the ground-truth images in eval.py and run it.
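This sketch is not eval.py; it is a minimal illustration of two measures commonly reported on DIBCO benchmarks, F-measure and PSNR, computed on flat 0/1 pixel sequences. It assumes 1 marks text (foreground) and 0 marks background, which may differ from the image conventions eval.py expects. Pseudo-F-measure and DRD, also standard in DIBCO, are omitted for brevity.

```python
import math


def confusion(pred, gt):
    """Count TP, FP, FN where 1 = text (foreground) and 0 = background."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same number of pixels")
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    return tp, fp, fn


def f_measure(pred, gt):
    """Harmonic mean of precision and recall on text pixels, as a percentage."""
    tp, fp, fn = confusion(pred, gt)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


def psnr(pred, gt, c=1.0):
    """PSNR = 10 * log10(C^2 / MSE), with C the foreground/background difference."""
    if len(pred) != len(gt):
        raise ValueError("prediction and ground truth must have the same number of pixels")
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(c * c / mse)


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    gt = [1, 0, 0, 0]
    print(round(f_measure(pred, gt), 2), round(psnr(pred, gt), 2))  # 66.67 6.02
```

In practice, each binarized output and its ground-truth image would be flattened to such sequences before scoring.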
If you find our paper or code useful, consider citing us:
@misc{https://doi.org/10.48550/arxiv.2211.06581,
doi = {10.48550/ARXIV.2211.06581},
url = {https://arxiv.org/abs/2211.06581},
author = {Dey, Avirup and Das, Nibaran and Nasipuri, Mita},
keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences, I.4.6},
title = {Variational Augmentation for Enhancing Historical Document Image Binarization}
}
Our work is partly based on BicycleGAN and we made extensive use of their code. We would like to thank the authors for their contribution.
To-do:
- Inference instructions
- Add environment.yml
- Add weight files
- Add sample images