Description
Please refer to the following link:
This concerns changes made to lstm_choices_mode.
Unless I misunderstand what these options are supposed to do, there appears to be a bug or oversight. Please refer to this user group thread:
https://groups.google.com/forum/#!topic/tesseract-ocr/5tC6appoUgE
There seems to be no way to prevent the LSTM engine from including duplicates in the generated text and/or hOCR output. The thread above gives a clear example of this.
Surely there must be some way to force Tesseract to include only the highest-confidence character choice when there are multiple possibilities.
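As an illustration of the behaviour being requested, here is a minimal sketch in plain Python. It does not call Tesseract at all: the data layout (a list of candidate `(character, confidence)` pairs per character position) and the function name `best_choices` are hypothetical and only stand in for whatever alternatives the LSTM engine produces internally.

```python
from typing import List, Tuple

# Hypothetical representation: (character, confidence in percent).
Choice = Tuple[str, float]


def best_choices(positions: List[List[Choice]]) -> str:
    """Keep only the highest-confidence candidate at each character position."""
    return "".join(
        max(candidates, key=lambda choice: choice[1])[0]
        for candidates in positions
        if candidates  # skip positions that have no candidates
    )


# Made-up alternatives for a three-character word (not real Tesseract output).
positions = [
    [("c", 97.1), ("e", 2.3)],
    [("a", 88.0), ("o", 11.5)],
    [("t", 99.2)],
]
print(best_choices(positions))
```

With a filter like this, each character position contributes exactly one character to the output, which is the result expected here instead of the duplicated characters.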
Also, apologies if this is posted in the wrong place, and for any duplicate postings. I am a Tesseract newbie, so I am still learning the ropes.
Thanks.