PDF reader that speaks text aloud while displaying it with a typewriter effect. OCR and TTS run concurrently so playback starts before the full document is processed.
Linux
git clone https://github.com/Ogtoad/herzael.git
cd herzael
chmod +x install.sh
./install.sh

Installs system packages (tesseract-ocr, poppler-utils, libportaudio2), creates a virtualenv, installs Python dependencies, and downloads TTS models (~267 MB).
Windows
git clone https://github.com/Ogtoad/herzael.git
cd herzael
install.bat

Installs Tesseract OCR and Poppler automatically (via winget, falling back to a Poppler download from GitHub if needed), creates a virtualenv, installs Python dependencies, and downloads TTS models.
# Linux
source venv/bin/activate
python herzael.py document.pdf
# Windows
venv\Scripts\activate
python herzael.py document.pdf

| Key | Action |
|---|---|
| Space | Pause / resume |
| Left | Replay previous sentence |
| Right | Skip to next sentence |
| Q | Quit |
python herzael.py file.pdf [options]
--voice M1-M5, F1-F5     Voice style (default: M2)
--speed 0.9-1.5          Speech speed multiplier (default: 1.0)
--skip-pages N           Skip the first N pages (default: 2)
--tts-steps 5 or 10      Generation quality: 5=fast, 10=higher quality (default: 5)
--max-pages-ahead N      How far OCR can run ahead of TTS, in pages (default: 3)
--debug                  Enable verbose logging
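
The options above map naturally onto `argparse`. The following is a minimal sketch of how such a command line could be declared, not a copy of what `herzael.py` actually does; the helper names `speed_in_range` and `build_parser` are hypothetical.

```python
import argparse

# M1-M5 and F1-F5, as listed in the options table.
VOICES = [f"{gender}{n}" for gender in ("M", "F") for n in range(1, 6)]


def speed_in_range(value: str) -> float:
    """Parse --speed and reject values outside the documented 0.9-1.5 range."""
    speed = float(value)
    if not 0.9 <= speed <= 1.5:
        raise argparse.ArgumentTypeError("speed must be between 0.9 and 1.5")
    return speed


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="herzael.py")
    parser.add_argument("pdf", help="PDF file to read aloud")
    parser.add_argument("--voice", choices=VOICES, default="M2")
    parser.add_argument("--speed", type=speed_in_range, default=1.0)
    parser.add_argument("--skip-pages", type=int, default=2, metavar="N")
    parser.add_argument("--tts-steps", type=int, choices=[5, 10], default=5)
    parser.add_argument("--max-pages-ahead", type=int, default=3, metavar="N")
    parser.add_argument("--debug", action="store_true")
    return parser


args = build_parser().parse_args(["document.pdf", "--voice", "F3", "--speed", "1.2"])
print(args.voice, args.speed, args.skip_pages)  # F3 1.2 2
```

Validating `--speed` in a `type` callable means an out-of-range value is rejected with a usage error before any OCR or TTS work starts.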
On first run, herzael_config.json is created in the working directory. Key settings:
{
  "tts": {
    "speed": 1.0,
    "total_steps": 5
  },
  "ocr": {
    "dpi": 200,
    "min_sentence_length": 10
  },
  "audio": {
    "default_voice": "M2"
  }
}

PDF --> Tesseract OCR --> JSON buffer --> Supertonic TTS --> sounddevice playback
Three threads run concurrently: OCR extracts sentences page by page into a shared buffer, TTS synthesises each sentence as it arrives, and the player outputs audio while rendering the typewriter display. OCR pauses automatically if it gets more than --max-pages-ahead pages ahead of playback.
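
The three-stage design described above can be sketched with two queues and a condition variable that throttles the OCR stage. This is an illustrative sketch only: the class and function names are hypothetical, OCR input is simulated as a list of per-page sentence lists, and synthesis is replaced by a placeholder string. The real program may coordinate its threads differently.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


class Progress:
    """Counts fully played pages so the OCR stage can throttle itself."""

    def __init__(self, max_pages_ahead: int) -> None:
        self._cond = threading.Condition()
        self._pages_played = 0
        self._max_ahead = max_pages_ahead

    def page_played(self, page_no: int) -> None:
        with self._cond:
            self._pages_played = page_no + 1
            self._cond.notify_all()

    def wait_until_close_enough(self, ocr_page: int) -> None:
        # Block while OCR is more than max_pages_ahead pages ahead of playback.
        with self._cond:
            self._cond.wait_for(
                lambda: ocr_page - self._pages_played <= self._max_ahead
            )


def ocr_worker(pages, sentences, progress):
    for page_no, page_sentences in enumerate(pages):
        progress.wait_until_close_enough(page_no)
        for text in page_sentences:
            sentences.put(("sentence", text))
        # The page marker travels down the pipeline even for empty pages.
        sentences.put(("page_end", page_no))
    sentences.put(DONE)


def tts_worker(sentences, clips):
    while (item := sentences.get()) is not DONE:
        kind, payload = item
        if kind == "sentence":
            clips.put(("clip", (payload, f"<audio:{payload}>")))  # stand-in for synthesis
        else:
            clips.put(item)
    clips.put(DONE)


def player_worker(clips, progress, played):
    while (item := clips.get()) is not DONE:
        kind, payload = item
        if kind == "clip":
            text, _audio = payload
            played.append(text)  # a real player would output audio and render text
        else:
            progress.page_played(payload)


def run_pipeline(pages, max_pages_ahead=3):
    sentences, clips = queue.Queue(), queue.Queue()
    progress = Progress(max_pages_ahead)
    played = []
    threads = [
        threading.Thread(target=ocr_worker, args=(pages, sentences, progress)),
        threading.Thread(target=tts_worker, args=(sentences, clips)),
        threading.Thread(target=player_worker, args=(clips, progress, played)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return played
```

Sending a page-end marker through the same queues as the sentences keeps the playback position up to date even when a page yields no text, so the OCR stage cannot stall waiting on an empty page.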
Progress is saved to <filename>_buffer.json. On the next run, you are asked whether to resume or start fresh.
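
As a rough illustration of the resume mechanism, the sketch below saves and reloads a position alongside the extracted sentences. Everything about the file layout is an assumption: the README does not document the schema of the buffer file, and the sketch assumes `<filename>` means the PDF name without its extension. The function names are hypothetical.

```python
import json
from pathlib import Path


def buffer_path(pdf_path: str) -> Path:
    # Assumption: <filename> is the PDF name without its extension.
    pdf = Path(pdf_path)
    return pdf.with_name(f"{pdf.stem}_buffer.json")


def save_progress(pdf_path: str, sentences: list, position: int) -> None:
    # Hypothetical schema; the real buffer file may store different fields.
    state = {"sentences": sentences, "position": position}
    path = buffer_path(pdf_path)
    tmp = path.with_name(path.name + ".tmp")
    # Write to a temporary file first, then swap it in, so an interruption
    # mid-write does not corrupt previously saved state.
    tmp.write_text(json.dumps(state), encoding="utf-8")
    tmp.replace(path)


def load_progress(pdf_path: str, resume: bool) -> dict:
    """Return saved state when resuming, or a fresh state otherwise."""
    path = buffer_path(pdf_path)
    if not resume or not path.exists():
        return {"sentences": [], "position": 0}
    return json.loads(path.read_text(encoding="utf-8"))
```

A caller would ask the user whether to resume and pass the answer as `resume`; choosing to start fresh simply ignores any existing file.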
- Python 3.8+
- Tesseract OCR
- Poppler (for pdf2image)
Tesseract and Poppler are installed automatically by the install scripts. If you set up the project without the scripts, add their executables to PATH or install them to the default locations (C:\Program Files\Tesseract-OCR and C:\poppler on Windows).
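
When installing manually, a quick check like the one below can confirm that the external tools are reachable. It is a sketch, not part of the project: it only looks at PATH (not the default Windows install locations), and the tool list is an assumption based on pdf2image calling Poppler's `pdftoppm` and `pdfinfo`.

```python
import shutil

# Assumed executables: Tesseract's binary is named "tesseract"; pdf2image
# relies on Poppler's "pdftoppm" and "pdfinfo".
REQUIRED_TOOLS = {
    "tesseract": "Tesseract OCR",
    "pdftoppm": "Poppler",
    "pdfinfo": "Poppler",
}


def missing_tools(which=shutil.which) -> list:
    """Return display names of required tools that are not found on PATH."""
    missing = []
    for exe, name in REQUIRED_TOOLS.items():
        if which(exe) is None and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_tools()
    print("All tools found" if not gaps else "Missing from PATH: " + ", ".join(gaps))
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use calls `missing_tools()` with no arguments.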
No text extracted: Try --skip-pages 0 if the document starts on page 1. Use --debug to see per-page OCR output.
Models missing: Re-run python download_models.py from inside the virtualenv.
Wrong audio device: sounddevice uses the system default output. Change it in your OS audio settings.
MIT License