Scientific illustration extraction

This code adapts the YOLOv5 object detector to the specific task of detecting historical scientific illustrations. It is a PyTorch implementation of the paper "Computer Vision and Historical Scientific Illustrations" (accepted at IAMAHA 2023 as an oral presentation).

Check out our paper and webpage for more details!

If you find this code useful, please consider starring the repository ⭐ and citing the paper:

@inproceedings{aouinti2023computer,
  title={{Computer Vision and Historical Scientific Illustrations}},
  author={Aouinti, Fouad and Baltaci, Zeynep Sonat and Aubry, Mathieu and Guilbaud, Alexandre and Lazaris, Stavros},
  booktitle={IAMAHA},
  year={2023}
}

Installation 🛠️

Prerequisites

Sudo privileges

Bash terminal

Python >= 3.8

Git:

sudo apt install git

SSH access to GitHub configured

Repository

git clone https://github.com/faouinti/illustrationExtraction
cd illustrationExtraction

Python dependencies

python -m venv .env
source .env/bin/activate
pip install -r requirements.txt

Download datasets and models

To download the datasets, run the following command:

./scripts/download_datasets.sh

This downloads the following datasets:

SynDoc dataset: 10k generated images with line-level page segmentation ground truth (19138 annotations)
VHS dataset: 4451 images verified by historians (8620 annotations)

To download the trained models, run the following command:

./scripts/download_models.sh

This will retrieve our trained models:

pre-trained on COCO and fine-tuned on SynDoc (models/coco_syndoc.pt)
trained from scratch (models/scratch_vhs.pt)
pre-trained on COCO and fine-tuned on VHS (models/coco_vhs.pt)
pre-trained on SynDoc and fine-tuned on VHS (models/syndoc_vhs.pt)
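
After the download script finishes, it can be useful to confirm that the four checkpoints listed above are in place. The following is a minimal sketch, assuming the files land under models/ relative to the repository root, as the paths above suggest. The helper name missing_checkpoints is hypothetical and not part of the repository.

```python
from pathlib import Path

# Checkpoint paths as listed in this README, with their training regimes.
CHECKPOINTS = {
    "models/coco_syndoc.pt": "pre-trained on COCO, fine-tuned on SynDoc",
    "models/scratch_vhs.pt": "trained from scratch",
    "models/coco_vhs.pt": "pre-trained on COCO, fine-tuned on VHS",
    "models/syndoc_vhs.pt": "pre-trained on SynDoc, fine-tuned on VHS",
}


def missing_checkpoints(repo_root="."):
    """Return the listed checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if all files are present.
    for rel in missing_checkpoints():
        print(f"missing: {rel} ({CHECKPOINTS[rel]})")
```

An empty result means every checkpoint named above was found; otherwise, re-running ./scripts/download_models.sh is the first thing to try.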

How to use

The demo folder contains a Jupyter notebook that detects scientific illustrations in a given image and saves the results.

Training

Pre-train on COCO and fine-tune on SynDoc:

python train.py --epochs 300 --data syndoc.yaml --weights yolov5s.pt

Train from scratch on VHS:

python train.py --epochs 300 --data vhs.yaml --weights '' --cfg yolov5s.yaml

Pre-train on COCO and fine-tune on VHS:

python train.py --epochs 300 --data vhs.yaml --weights yolov5s.pt

Pre-train on SynDoc and fine-tune on VHS:

python train.py --epochs 300 --data vhs.yaml --weights models/coco_syndoc.pt

--weights: model path or Triton URL
--data: dataset.yaml path
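
The four training regimes differ only in the dataset, the initial weights, and (for training from scratch) the model configuration. As a rough sketch, the commands above can be generated from a small table. The regime labels and the train_command helper are hypothetical names chosen for this example; the argument values are copied from the commands in this section.

```python
import shlex

# Arguments for each training regime, copied from the commands in this README.
# The regime keys are hypothetical labels chosen for this sketch.
REGIMES = {
    "coco_syndoc": {"data": "syndoc.yaml", "weights": "yolov5s.pt"},
    "scratch_vhs": {"data": "vhs.yaml", "weights": "", "cfg": "yolov5s.yaml"},
    "coco_vhs": {"data": "vhs.yaml", "weights": "yolov5s.pt"},
    "syndoc_vhs": {"data": "vhs.yaml", "weights": "models/coco_syndoc.pt"},
}


def train_command(regime, epochs=300):
    """Build the train.py argument list for one regime."""
    args = REGIMES[regime]
    cmd = ["python", "train.py", "--epochs", str(epochs),
           "--data", args["data"], "--weights", args["weights"]]
    if "cfg" in args:
        # Only the from-scratch regime passes an explicit model config.
        cmd += ["--cfg", args["cfg"]]
    return cmd


if __name__ == "__main__":
    for name in REGIMES:
        print(f"{name}: {shlex.join(train_command(name))}")
```

The returned list can be passed to subprocess.run as is, which avoids shell quoting problems with the empty --weights value used when training from scratch.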

Validation

Validate a trained VHS detection model on a test dataset:

python val.py --weights path/to/model.pt --data path/to/data.yaml --task test --name path/to/output

--weights: model path or Triton URL (e.g. models/syndoc_vhs.pt)
--task: train, val, test, speed or study
--name: save to project/name

NB: models/syndoc_vhs.pt is the network pre-trained on SynDoc and fine-tuned on VHS.

Inference

Run YOLOv5 detection inference on various sources (e.g., the VHS test images) and save the results in the runs/detect directory:

python detect.py --weights path/to/model.pt --source path/to/src/ --conf-thres 0.1 --save-crop

--source: file/dir/URL/glob/screen/0(webcam) (e.g. vhs/images/test/)
--conf-thres: confidence threshold; predictions scoring below it are discarded
--save-crop: save cropped prediction boxes
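
To illustrate what --conf-thres and --save-crop do conceptually, here is a simplified, dependency-free sketch. It is not the code used by detect.py: the Detection class and both helpers are hypothetical, the image is a plain list of pixel rows, the boundary rule (keeping scores equal to the threshold) is an assumption, and the real script may pad or rescale boxes before cropping.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    conf: float                     # confidence score in [0, 1]


def filter_by_confidence(dets: List[Detection], conf_thres: float = 0.1) -> List[Detection]:
    """Keep detections whose confidence is at least conf_thres (assumed boundary rule)."""
    return [d for d in dets if d.conf >= conf_thres]


def crop(image: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Cut the box region out of an image stored as a list of pixel rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


if __name__ == "__main__":
    image = [[10 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 image
    dets = [Detection((0, 0, 2, 2), 0.85), Detection((1, 1, 4, 3), 0.05)]
    # Only the first detection passes the default threshold of 0.1.
    for det in filter_by_confidence(dets):
        print(det.conf, crop(image, det.box))
```

A low threshold such as 0.1 keeps more candidate illustrations at the cost of more false positives; raising it trades recall for precision.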

Acknowledgements

This work was supported by the ANR (ANR project VHS ANR-21-CE38-0008). MA and SB were supported by ERC project DISCOVER funded by the European Union's Horizon Europe Research and Innovation programme under grant agreement No. 101076028. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
