Herzael

PDF reader that speaks text aloud while displaying it with a typewriter effect. OCR and TTS run concurrently so playback starts before the full document is processed.

Install

Linux

git clone https://github.com/Ogtoad/herzael.git
cd herzael
chmod +x install.sh
./install.sh

The script installs system packages (tesseract-ocr, poppler-utils, libportaudio2), creates a virtualenv, installs the Python dependencies, and downloads the TTS models (~267 MB).

Windows

git clone https://github.com/Ogtoad/herzael.git
cd herzael
install.bat

The script installs Tesseract OCR and Poppler automatically (via winget, falling back to a Poppler download from GitHub if needed), creates a virtualenv, installs the Python dependencies, and downloads the TTS models.

Usage

# Linux
source venv/bin/activate
python herzael.py document.pdf

# Windows
venv\Scripts\activate
python herzael.py document.pdf

Controls

Key      Action
Space    Pause / resume
Left     Replay previous sentence
Right    Skip to next sentence
Q        Quit

Options

python herzael.py file.pdf [options]

--voice      M1-M5, F1-F5   Voice style (default: M2)
--speed      0.9-1.5         Speech speed multiplier (default: 1.0)
--skip-pages N               Skip the first N pages (default: 2)
--tts-steps  5 or 10         Generation quality: 5=fast, 10=higher quality (default: 5)
--max-pages-ahead N          How far OCR can run ahead of TTS in pages (default: 3)
--debug                      Enable verbose logging
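
For readers who want to script around these flags, the option table can be expressed as an argparse parser. This is a minimal sketch built only from the table above, not the parser herzael.py actually uses; the names build_parser and speed_value are hypothetical.

```python
import argparse

VOICES = [f"{g}{n}" for g in "MF" for n in range(1, 6)]  # M1-M5, F1-F5


def speed_value(text):
    """Parse --speed and reject values outside the documented 0.9-1.5 range."""
    value = float(text)
    if not 0.9 <= value <= 1.5:
        raise argparse.ArgumentTypeError("speed must be between 0.9 and 1.5")
    return value


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="herzael.py")
    parser.add_argument("pdf", help="path to the PDF to read")
    parser.add_argument("--voice", choices=VOICES, default="M2")
    parser.add_argument("--speed", type=speed_value, default=1.0)
    parser.add_argument("--skip-pages", type=int, default=2, metavar="N")
    parser.add_argument("--tts-steps", type=int, choices=[5, 10], default=5)
    parser.add_argument("--max-pages-ahead", type=int, default=3, metavar="N")
    parser.add_argument("--debug", action="store_true")
    return parser


args = build_parser().parse_args(["file.pdf", "--voice", "F3", "--speed", "1.2"])
print(args.voice, args.speed, args.skip_pages, args.tts_steps)
```

Options that are not passed keep the defaults listed in the table, so only the voice and speed differ from a plain run here.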

Configuration

On first run, herzael_config.json is created in the working directory. Key settings:

{
  "tts": {
    "speed": 1.0,
    "total_steps": 5
  },
  "ocr": {
    "dpi": 200,
    "min_sentence_length": 10
  },
  "audio": {
    "default_voice": "M2"
  }
}
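
The create-on-first-run behaviour can be pictured with a short loader. This is a sketch under assumptions, not the project's code: load_config is a hypothetical name, the defaults are only the key settings shown above, and merging a partial file over the defaults is an assumed behaviour.

```python
import json
import tempfile
from pathlib import Path

# Defaults copied from the key settings shown above.
DEFAULTS = {
    "tts": {"speed": 1.0, "total_steps": 5},
    "ocr": {"dpi": 200, "min_sentence_length": 10},
    "audio": {"default_voice": "M2"},
}


def load_config(path):
    """Create the file with defaults on first run, else merge it over the defaults."""
    path = Path(path)
    if not path.exists():
        path.write_text(json.dumps(DEFAULTS, indent=2))
    user = json.loads(path.read_text())
    # Copy each section so the module-level defaults are never mutated.
    merged = {section: dict(values) for section, values in DEFAULTS.items()}
    for section, values in user.items():
        merged.setdefault(section, {}).update(values)
    return merged


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "herzael_config.json"
    first = load_config(cfg_path)                    # file is created here
    cfg_path.write_text(json.dumps({"ocr": {"dpi": 300}}))
    second = load_config(cfg_path)                   # partial file, defaults fill the rest
    print(first["ocr"]["dpi"], second["ocr"]["dpi"], second["audio"]["default_voice"])
```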

How it works

PDF --> Tesseract OCR --> JSON buffer --> Supertonic TTS --> sounddevice playback

Three threads run concurrently: OCR extracts sentences page by page into a shared buffer, TTS synthesises each sentence as it arrives, and the player outputs audio while rendering the typewriter display. OCR pauses automatically if it gets more than --max-pages-ahead pages ahead of playback.
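
The three-stage pipeline and its look-ahead limit can be sketched with the standard library alone. The code below is an illustration of the idea, not the project's implementation: the worker names, the two queues, and the condition variable are assumptions, and OCR, synthesis, and playback are replaced by stand-ins.

```python
import queue
import threading

MAX_PAGES_AHEAD = 3          # mirrors --max-pages-ahead
DONE = object()              # sentinel marking the end of a stream

pages = [[f"Page {p}, sentence {s}." for s in range(2)] for p in range(6)]
sentences = queue.Queue()    # OCR -> TTS
audio = queue.Queue()        # TTS -> player
played = []                  # what the player has output, in order
max_lead = 0                 # largest observed OCR lead, in pages
playback_page = 0
progress = threading.Condition()


def ocr_worker():
    global max_lead
    for page_no, page in enumerate(pages):
        with progress:
            # Block while OCR is too far ahead of playback.
            progress.wait_for(lambda: page_no - playback_page <= MAX_PAGES_AHEAD)
            max_lead = max(max_lead, page_no - playback_page)
        for text in page:                 # stand-in for Tesseract output
            sentences.put((page_no, text))
    sentences.put(DONE)


def tts_worker():
    while (item := sentences.get()) is not DONE:
        page_no, text = item
        audio.put((page_no, text, f"<audio for {text!r}>"))  # stand-in for synthesis
    audio.put(DONE)


def player_worker():
    global playback_page
    while (item := audio.get()) is not DONE:
        page_no, text, _clip = item
        played.append(text)               # stand-in for playback and typewriter output
        with progress:
            playback_page = page_no
            progress.notify_all()         # wake OCR if it was waiting


threads = [threading.Thread(target=f) for f in (ocr_worker, tts_worker, player_worker)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(played), "sentences played; max OCR lead:", max_lead, "pages")
```

Because the OCR thread checks its lead before starting each page, it never begins a page more than MAX_PAGES_AHEAD pages past the one currently being played, while the first sentences reach the player long before the last page is processed.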

Progress is saved to <filename>_buffer.json. On the next run, you are asked whether to resume or start fresh.
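
A resume check of this kind might look like the sketch below. Everything specific in it is an assumption made for illustration: the buffer is placed next to the PDF, <filename> is taken to be the name without the .pdf extension, and the sentences/position schema is hypothetical, since the real file layout is not described here.

```python
import json
import tempfile
from pathlib import Path


def buffer_path(pdf_path):
    """Map document.pdf to document_buffer.json next to it (location is an assumption)."""
    pdf_path = Path(pdf_path)
    return pdf_path.with_name(pdf_path.stem + "_buffer.json")


def load_progress(pdf_path, resume):
    """Return saved state if present and resume is requested, else a fresh state."""
    path = buffer_path(pdf_path)
    if resume and path.exists():
        return json.loads(path.read_text())
    return {"sentences": [], "position": 0}   # hypothetical schema


def save_progress(pdf_path, state):
    """Write the current state to the buffer file for the given PDF."""
    buffer_path(pdf_path).write_text(json.dumps(state))


with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "document.pdf"
    save_progress(pdf, {"sentences": ["First.", "Second."], "position": 1})
    print(buffer_path(pdf).name)
    print(load_progress(pdf, resume=True)["position"])    # continues from saved state
    print(load_progress(pdf, resume=False)["position"])   # starts fresh
```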

Requirements

  • Python 3.8+
  • Tesseract OCR
  • Poppler (for pdf2image)

Tesseract and Poppler are installed automatically by the install scripts. If you install them manually, add their executables to PATH or, on Windows, install them to the default locations (C:\Program Files\Tesseract-OCR and C:\poppler).
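
When installing by hand, a quick check can confirm that the executables are discoverable. The sketch below is an assumption about how such a lookup could work, not herzael's own logic: find_tool is a hypothetical helper, pdftoppm is the Poppler binary that pdf2image calls, and the recursive search covers the fact that the folder layout inside an install directory varies.

```python
import shutil
from pathlib import Path

# Default Windows install locations named in this section.
WINDOWS_DEFAULTS = {
    "tesseract": Path(r"C:\Program Files\Tesseract-OCR"),
    "pdftoppm": Path(r"C:\poppler"),
}


def find_tool(name):
    """Return the tool's path from PATH or a default Windows location, else None."""
    found = shutil.which(name)
    if found:
        return found
    root = WINDOWS_DEFAULTS.get(name)
    if root is not None and root.is_dir():
        # The layout inside the install folder varies, so search it recursively.
        for candidate in root.rglob(name + ".exe"):
            return str(candidate)
    return None


for tool in WINDOWS_DEFAULTS:
    print(tool, "->", find_tool(tool) or "not found")
```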

Troubleshooting

No text extracted: The first 2 pages are skipped by default, so try --skip-pages 0 if the content starts on page 1. Use --debug to see per-page OCR output.

Models missing: Re-run python download_models.py from inside the virtualenv.

Wrong audio device: sounddevice uses the system default output. Change it in your OS audio settings.


MIT License
