Tesseract inserting additional alternative characters

### Environment

* **Tesseract Version**: 3.x stable and 4.0 alpha/beta, for English-language text (using the Fast and Best trained data), run from the command line

* **Platform**: Windows (64-bit) and Linux (Ubuntu/CentOS)

### Current Behavior: 
All of the Tesseract versions listed above tend to insert additional, alternative characters, probably whenever recognition confidence is low. For example, if there's a "#" in the image file, it often outputs "#H", "A#", or even "AH": two characters for one. Another example: if there's a "$" in the image, it gives "S$", "$s", and so on. This happens very often for other characters too, such as 0, O, !, %, and ^.
My application is very sensitive to the length of the string, so an extra character throws many things off.
I am currently a command-line user and may later use Tesseract from Java once a wrapper for 4.0 becomes available.
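
To make the pattern above easier to spot in existing output, here is a minimal Python sketch that flags adjacent characters belonging to the same look-alike group. The `LOOKALIKES` groups and the `find_suspect_pairs` helper are hypothetical: they are built only from the examples in this report, are not anything Tesseract provides, and will produce false positives on legitimate text (for example, a real "S$" currency prefix).

```python
# Hypothetical look-alike groups, taken from the examples in this report.
# They are illustrative only and are not exposed by Tesseract.
LOOKALIKES = [
    set("#HA"),
    set("$Ss"),
    set("0Oo"),
]


def find_suspect_pairs(text):
    """Return (index, pair) for adjacent, different characters that share a group."""
    suspects = []
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if a != b and any(a in group and b in group for group in LOOKALIKES):
            suspects.append((i, a + b))
    return suspects


if __name__ == "__main__":
    # Each hit marks a place where one glyph may have been emitted twice.
    print(find_suspect_pairs("ID #H42"))
```

This only reports candidates; whether to drop one of the two characters is left to the application, since the heuristic cannot tell a spurious pair from a genuine one.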

### Expected Behavior: 
I expect Tesseract to output exactly one character for each character in the image. I should be able to control this behaviour with command-line parameters (assuming no such parameter exists yet). I have looked through the parameters, but there are hundreds of them and most are not self-explanatory, hence this issue. Also, is it possible to get character-level hOCR output? The current output is at word-level granularity.
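
As a possible stopgap for per-character output (separate from hOCR, and not a fix for the duplicated characters), Tesseract's `makebox` config writes a box file with one line per recognized symbol in the form `symbol left bottom right top page`. The sketch below parses that format in Python so that character counts and positions can be checked. It assumes the `makebox` config shipped with Tesseract is installed, and the `CharBox`, `parse_box_line`, and `parse_box_text` names are hypothetical helpers introduced here for illustration.

```python
from typing import List, NamedTuple

# Assumed invocation (writes out.box next to the given output base name):
#   tesseract image.png out makebox


class CharBox(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> CharBox:
    """Parse one box-file line: 'symbol left bottom right top page'."""
    # Split from the right so the symbol itself is never broken up.
    char, *nums = line.rstrip("\n").rsplit(" ", 5)
    left, bottom, right, top, page = map(int, nums)
    return CharBox(char, left, bottom, right, top, page)


def parse_box_text(text: str) -> List[CharBox]:
    """Parse the full contents of a box file, skipping blank lines."""
    return [parse_box_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    sample = "# 10 20 30 40 0\n$ 35 20 50 40 0\n"
    for box in parse_box_text(sample):
        print(box.char, box.right - box.left)
```

With per-symbol boxes in hand, an application that depends on string length could compare the number of boxes against the expected character count, or look for boxes that overlap heavily, before trusting the text output.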

### Suggested Fix: 


Tesseract inserting additional alternative characters #1465
