Detect if no text was extracted / if there are grave inconsistencies #99

@mikegerber

A user reported that he was not getting good results. It turned out he had used --text-equiv-level line when there was no line-level text. Ways to improve dinglehopper's behavior here:

  • Warn if no text is extracted (and maybe do so in a smart way. "no text" can be valid on empty pages.)
  • Warn if there are grave inconsistencies between levels (harder; line and region text can differ in small ways)
  • Warn if there are grave differences between GT and OCR (e.g. no GT text but lots of OCR text; need to think about this more)
  • Check if I could use OCR-D libs here (I'm somewhat skeptical about changing anything, because the text extraction code here works and OCR-D changes a lot by comparison)
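
The first three checks could look roughly like the sketch below. This is not dinglehopper code: the function names `text_warnings` and `levels_inconsistent` and the thresholds `min_chars`, `max_length_ratio` and `min_similarity` are hypothetical, chosen only for illustration. It assumes the GT and OCR text have already been extracted as plain strings.

```python
from difflib import SequenceMatcher


def text_warnings(gt_text, ocr_text, min_chars=20, max_length_ratio=5.0):
    """Return warnings about suspicious GT/OCR text pairs (hypothetical helper)."""
    gt, ocr = gt_text.strip(), ocr_text.strip()
    found = []
    if not gt and not ocr:
        # May be a legitimately empty page, so the wording stays soft.
        found.append("no text extracted from GT or OCR (empty page or wrong text level?)")
    elif not gt and len(ocr) >= min_chars:
        found.append("no GT text but OCR text present")
    elif not ocr and len(gt) >= min_chars:
        found.append("no OCR text but GT text present")
    elif gt and ocr:
        longer = max(len(gt), len(ocr))
        shorter = min(len(gt), len(ocr))
        # Only flag a length mismatch when there is enough text to judge.
        if longer >= min_chars and longer / shorter > max_length_ratio:
            found.append("GT and OCR text lengths differ greatly")
    return found


def levels_inconsistent(region_text, line_text, min_similarity=0.8):
    """Return True if text from two levels differs by more than small details."""
    # Normalize whitespace so that line breaks vs. spaces do not count as differences.
    a = " ".join(region_text.split())
    b = " ".join(line_text.split())
    if not a and not b:
        return False
    return SequenceMatcher(None, a, b).ratio() < min_similarity
```

The thresholds are guesses and would need tuning on real data. Returning the warnings as a list leaves it to the caller to decide whether to log them, print them, or fail.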

Labels

enhancement: New feature or request
