Herzael

PDF reader that speaks text aloud while displaying it with a typewriter effect. OCR and TTS run concurrently so playback starts before the full document is processed.

Install

Linux

git clone https://github.com/Ogtoad/herzael.git
cd herzael
chmod +x install.sh
./install.sh

install.sh installs the system packages (tesseract-ocr, poppler-utils, libportaudio2), creates a virtualenv, installs the Python dependencies, and downloads the TTS models (~267 MB).

Windows

git clone https://github.com/Ogtoad/herzael.git
cd herzael
install.bat

install.bat installs Tesseract OCR and Poppler automatically (it uses winget first and falls back to downloading Poppler from GitHub if needed), creates a virtualenv, installs the Python dependencies, and downloads the TTS models.

Usage

# Linux
source venv/bin/activate
python herzael.py document.pdf

# Windows
venv\Scripts\activate
python herzael.py document.pdf

Controls

Key	Action
Space	Pause / resume
Left	Replay previous sentence
Right	Skip to next sentence
Q	Quit
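
As a rough illustration of how these keys could map onto playback state, here is a minimal sketch. The PlaybackState class, its fields, and the key names ("space", "left", "right", "q") are hypothetical and are not taken from player.py.

```python
from dataclasses import dataclass


@dataclass
class PlaybackState:
    """Hypothetical playback state; not the structure used by player.py."""

    total: int            # number of sentences currently available
    index: int = 0        # sentence being spoken
    paused: bool = False
    running: bool = True

    def handle(self, key: str) -> None:
        """Apply one key press. The key names are illustrative."""
        if key == "space":
            # Pause / resume
            self.paused = not self.paused
        elif key == "left":
            # Replay previous sentence, never going below the first one
            self.index = max(0, self.index - 1)
        elif key == "right":
            # Skip to next sentence, never going past the last one
            self.index = min(max(self.total - 1, 0), self.index + 1)
        elif key == "q":
            # Quit
            self.running = False
```

Clamping the index keeps Left and Right harmless at the start and end of the document.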

Options

python herzael.py file.pdf [options]

--voice      M1-M5, F1-F5   Voice style (default: M2)
--speed      0.9-1.5         Speech speed multiplier (default: 1.0)
--skip-pages N               Skip the first N pages (default: 2)
--tts-steps  5 or 10         Generation quality: 5=fast, 10=higher quality (default: 5)
--max-pages-ahead N          How far OCR can run ahead of TTS in pages (default: 3)
--debug                      Enable verbose logging
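
For readers who want to script around these options, the list above can be expressed as an argparse parser. This is a sketch that mirrors the documented flags, ranges, and defaults; it is not the parser in herzael.py, and the names build_parser and speed_value are made up for the example.

```python
import argparse

# M1-M5 and F1-F5, as listed for --voice
VOICES = [f"{gender}{n}" for gender in ("M", "F") for n in range(1, 6)]


def speed_value(text: str) -> float:
    """Parse --speed and enforce the documented 0.9-1.5 range."""
    value = float(text)
    if not 0.9 <= value <= 1.5:
        raise argparse.ArgumentTypeError("speed must be between 0.9 and 1.5")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="herzael.py")
    parser.add_argument("pdf", help="path to the PDF file")
    parser.add_argument("--voice", choices=VOICES, default="M2")
    parser.add_argument("--speed", type=speed_value, default=1.0)
    parser.add_argument("--skip-pages", type=int, default=2, metavar="N")
    parser.add_argument("--tts-steps", type=int, choices=(5, 10), default=5)
    parser.add_argument("--max-pages-ahead", type=int, default=3, metavar="N")
    parser.add_argument("--debug", action="store_true")
    return parser


# Example: override the voice and speed, keep every other default
args = build_parser().parse_args(["document.pdf", "--voice", "F3", "--speed", "1.2"])
print(args.voice, args.speed, args.skip_pages, args.tts_steps)
```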

Configuration

On first run, herzael_config.json is created in the working directory. Key settings:

{
  "tts": {
    "speed": 1.0,
    "total_steps": 5
  },
  "ocr": {
    "dpi": 200,
    "min_sentence_length": 10
  },
  "audio": {
    "default_voice": "M2"
  }
}
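
One plausible way to handle such a file is to create it with defaults on first run and merge user edits over the defaults afterwards. The sketch below assumes that behaviour; load_config and merge are hypothetical names, and the real config.py may work differently or hold more keys.

```python
import json
from copy import deepcopy
from pathlib import Path

# Defaults copied from the snippet above; the real file may hold more keys.
DEFAULTS = {
    "tts": {"speed": 1.0, "total_steps": 5},
    "ocr": {"dpi": 200, "min_sentence_length": 10},
    "audio": {"default_voice": "M2"},
}


def merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def load_config(path: Path) -> dict:
    """Write defaults on first run, otherwise merge the file over the defaults."""
    if not path.exists():
        path.write_text(json.dumps(DEFAULTS, indent=2), encoding="utf-8")
        return deepcopy(DEFAULTS)
    return merge(DEFAULTS, json.loads(path.read_text(encoding="utf-8")))
```

With a recursive merge, a file that only sets "ocr": {"dpi": 300} still picks up the remaining defaults.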

How it works

PDF --> Tesseract OCR --> JSON buffer --> Supertonic TTS --> sounddevice playback

Three threads run concurrently: OCR extracts sentences page by page into a shared buffer, TTS synthesises each sentence as it arrives, and the player outputs audio while rendering the typewriter display. OCR pauses automatically if it gets more than --max-pages-ahead pages ahead of playback.
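
The three-stage design can be sketched with two queues and a condition variable that throttles OCR. This is a simplified model under stated assumptions, not the code in ocr.py, tts.py, or player.py: OCR and synthesis are replaced by stand-ins, and run_pipeline, the end-of-page marker, and the pages_played counter are invented for the example.

```python
import queue
import threading

DONE = object()  # sentinel: no more items will arrive


def run_pipeline(pages, max_pages_ahead=3):
    """Push pages (lists of sentences) through OCR -> TTS -> player threads."""
    sentences = queue.Queue()   # OCR -> TTS
    clips = queue.Queue()       # TTS -> player
    progress = threading.Condition()
    state = {"pages_played": 0}
    played = []

    def ocr():
        for page_no, page in enumerate(pages):
            with progress:
                # Pause while OCR is more than max_pages_ahead pages ahead.
                progress.wait_for(
                    lambda: page_no - state["pages_played"] <= max_pages_ahead
                )
            for sentence in page:
                sentences.put((page_no, sentence))
            sentences.put((page_no, None))  # end-of-page marker
        sentences.put(DONE)

    def tts():
        while True:
            item = sentences.get()
            if item is DONE:
                clips.put(DONE)
                return
            page_no, sentence = item
            # Stand-in for synthesis; page markers pass through unchanged.
            clip = None if sentence is None else f"audio({sentence})"
            clips.put((page_no, sentence, clip))

    def player():
        while True:
            item = clips.get()
            if item is DONE:
                return
            page_no, sentence, clip = item
            if sentence is None:
                # A whole page has been played: let OCR move further ahead.
                with progress:
                    state["pages_played"] = page_no + 1
                    progress.notify_all()
            else:
                played.append(sentence)  # stand-in for playback and display

    threads = [threading.Thread(target=fn) for fn in (ocr, tts, player)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return played
```

The end-of-page marker lets playback progress advance even for pages that yield no sentences, so the OCR thread cannot stall behind an empty page.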

Progress is saved to <filename>_buffer.json. On the next run, you are asked whether to resume or start fresh.
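
A resume mechanism of this kind might look like the sketch below. The README does not describe the buffer file's contents, so the schema (a sentence list plus a position index) and the choice to place the file next to the PDF are assumptions, as are the function names.

```python
import json
from pathlib import Path


def buffer_path(pdf: Path) -> Path:
    """Assumed naming: document.pdf -> document_buffer.json next to the PDF."""
    return pdf.with_name(f"{pdf.stem}_buffer.json")


def save_progress(pdf: Path, sentences: list, position: int) -> None:
    """Store a hypothetical schema: the sentences and the next index to play."""
    data = {"sentences": sentences, "position": position}
    buffer_path(pdf).write_text(json.dumps(data), encoding="utf-8")


def load_progress(pdf: Path, resume: bool):
    """Return the saved state when resuming, or None to start fresh."""
    path = buffer_path(pdf)
    if not resume or not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```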

Requirements

Python 3.8+
Tesseract OCR
Poppler (for pdf2image)

Tesseract and Poppler are installed automatically by the install scripts. If you set things up without the scripts, add their executables to PATH or install them to the default locations (on Windows, C:\Program Files\Tesseract-OCR and C:\poppler).
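
To check a manual setup, a lookup along these lines can confirm that the executables are reachable. It is a sketch, not Herzael's own detection logic: find_tool and DEFAULT_DIRS are hypothetical, pdftoppm is the Poppler program that pdf2image calls, and because this README does not say which subdirectory of C:\poppler holds the binaries, only the directory itself is checked.

```python
import shutil
from pathlib import Path

# Default Windows install locations named above (assumed layout: the
# executable sits directly in the directory).
DEFAULT_DIRS = {
    "tesseract": [Path(r"C:\Program Files\Tesseract-OCR")],
    "pdftoppm": [Path(r"C:\poppler")],
}


def find_tool(name: str, extra_dirs=None):
    """Return the path of an executable from PATH or a fallback directory."""
    found = shutil.which(name)
    if found:
        return found
    if extra_dirs is None:
        extra_dirs = DEFAULT_DIRS.get(name, [])
    for directory in extra_dirs:
        # shutil.which accepts an explicit search path as its path argument
        found = shutil.which(name, path=str(directory))
        if found:
            return found
    return None


for tool in ("tesseract", "pdftoppm"):
    print(tool, "->", find_tool(tool) or "not found")
```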

Troubleshooting

No text extracted: By default the first 2 pages are skipped, so try --skip-pages 0 if the content starts on page 1. Use --debug to see per-page OCR output.

Models missing: Run python download_models.py again from inside the activated virtualenv.

Wrong audio device: sounddevice uses the system default output. Change it in your OS audio settings.

MIT License
