Custom training for detection and recognition with A4 300dpi pages, possible? #1711

chmaz · 2024-08-29T14:09:01Z


Hello!

I am new to docTR and am reading the documentation. Before digging deeper into the repo, I have two questions to better understand whether I can use it for my case:

(1) I work with A4 pages scanned at 300 dpi, which corresponds to a resolution of 2480 x 3508 pixels (usually black-and-white scans). Would it be feasible to train docTR detection and recognition models from scratch at this resolution? (For example, the height of the recognition training samples would probably need to be 64 pixels internally instead of 32, and so on.)

(2) For a custom recognition training dataset, there is one image per word. With a training set of several hundred thousand words, that means a huge number of files in a single folder. Is it possible to arrange the images in a hierarchy of directories, with the JSON containing relative paths instead of file names?

Sorry if these are the wrong questions to ask,

Thanks in advance,

Chris

felixdittrich92 · 2024-08-29T14:23:26Z

Maintainer

Hi @chmaz 👋,

Yes. For both training and inference, we resize each page to 1024x1024 for detection, keeping the aspect ratio and using symmetric padding, and we resize each word crop to 32x128 for recognition (wider word detections are split and merged back together under the hood during inference).
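To make the resizing concrete for an A4 300 dpi scan, here is a minimal sketch of an aspect-preserving resize with symmetric padding. It only does the arithmetic; `fit_with_symmetric_padding` is a hypothetical helper written for this illustration, not a docTR function, and docTR's own transform may round slightly differently.

```python
def fit_with_symmetric_padding(width, height, target=(1024, 1024)):
    """Compute the resized size and padding for an aspect-preserving fit.

    Hypothetical helper for illustration only (not part of docTR).
    `target` is (height, width) of the padded output.
    """
    target_h, target_w = target
    # The limiting dimension decides the scale, so the page never overflows.
    scale = min(target_w / width, target_h / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Split the leftover space evenly on both sides (symmetric padding).
    pad_x = target_w - new_w
    pad_y = target_h - new_h
    left, top = pad_x // 2, pad_y // 2
    return {
        "scale": scale,
        "size": (new_w, new_h),                            # (width, height)
        "padding": (left, top, pad_x - left, pad_y - top),  # left, top, right, bottom
    }


# A4 at 300 dpi: 2480 x 3508 pixels
layout = fit_with_symmetric_padding(2480, 3508)
print(layout["size"], layout["padding"])  # (724, 1024) (150, 0, 150, 0)
```

Under these assumptions the scale factor is about 0.29, so a 300 dpi page reaches the detection model at roughly 88 dpi. That is worth keeping in mind when judging whether small print stays legible at the default input size.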
A directory hierarchy would be possible, but it would require small adjustments in the training script :) ref.: https://github.com/felixdittrich92/doctr/blob/9045dcfc9c5c837b06fcda8e802f7cf1d95bd18c/doctr/datasets/recognition.py#L49
(several subfolders, each containing an images folder and a corresponding labels.json file). Note: the image file names used as annotation keys, e.g. {"3.jpg": "abc"}, need to be unique across subfolders, otherwise merging the label files would overwrite entries.
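One way to avoid the key collision is to prefix every file name with its subfolder when merging. The sketch below assumes the layout described above (`<root>/<subfolder>/images/` plus `<root>/<subfolder>/labels.json`). `merge_label_files` is a hypothetical helper, and whether the dataset class accepts such relative paths as keys without changes is an assumption to check against the referenced code.

```python
import json
from pathlib import Path


def merge_label_files(root):
    """Merge every <subfolder>/labels.json under `root` into one dict.

    Hypothetical helper for illustration only (not part of docTR).
    Keys become POSIX paths relative to `root`, e.g. "part_a/images/3.jpg",
    so two subfolders that both contain "3.jpg" no longer overwrite each other.
    """
    root = Path(root)
    merged = {}
    for labels_path in sorted(root.glob("*/labels.json")):
        subfolder = labels_path.parent
        with open(labels_path, encoding="utf-8") as f:
            labels = json.load(f)
        for file_name, text in labels.items():
            # Prefix the file name with its subfolder to make the key unique.
            key = (subfolder / "images" / file_name).relative_to(root).as_posix()
            merged[key] = text
    return merged
```

The merged dictionary can then be written to a single labels file at the top of the hierarchy with `json.dump`, or passed to an adjusted training script directly.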

Best regards,
Felix


chmaz · 2024-08-30T06:35:51Z

Author

Thanks a lot for the quick answers,
Best regards,
Chris
