3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [1.2.16] - 2026-01-26

### Added
- **NVIDIA Parakeet Support**: Fast local transcription via sherpa-onnx runtime with INT8 quantized models
- `parakeet-tdt-0.6b-v3`: Multilingual (25 languages), ~680MB
- **Windows Push-to-Talk**: Native Windows key listener with low-level keyboard hook for true push-to-talk functionality
- Supports compound hotkeys like `Ctrl+Shift+F11` or `CommandOrControl+Space`
- Prebuilt binary automatically downloaded from GitHub releases
@@ -25,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Dev Server Port Alignment**: Development server port configuration improved for consistency

### Fixed
- **Whisper Model Path Resolution**: Fixed `large` and `turbo` model lookup failures by using registry-defined filenames (`ggml-large-v3.bin`, `ggml-large-v3-turbo.bin`) instead of a hardcoded pattern
- **Windows Production Build**: Fixed Windows production build issues with proper binary bundling
- **Code Quality**: Various code quality improvements in download scripts and dev server management

24 changes: 23 additions & 1 deletion CLAUDE.md
@@ -13,7 +13,7 @@ OpenWhispr is an Electron-based desktop dictation application that uses whisper.
- **Desktop Framework**: Electron 36 with context isolation
- **Database**: better-sqlite3 for local transcription history
- **UI Components**: shadcn/ui with Radix primitives
- **Speech Processing**: whisper.cpp (bundled native binary) + OpenAI API
- **Speech Processing**: whisper.cpp + NVIDIA Parakeet (via sherpa-onnx) + OpenAI API
- **Audio Processing**: FFmpeg (bundled via ffmpeg-static)

### Key Architectural Decisions
@@ -76,6 +76,8 @@ OpenWhispr is an Electron-based desktop dictation application that uses whisper.
- **menuManager.js**: Application menu management
- **tray.js**: System tray icon and menu
- **whisper.js**: Local whisper.cpp integration and model management
- **parakeet.js**: NVIDIA Parakeet model management via sherpa-onnx
- **parakeetServer.js**: sherpa-onnx CLI wrapper for transcription
- **windowConfig.js**: Centralized window configuration
- **windowManager.js**: Window creation and lifecycle management

@@ -118,12 +120,28 @@ OpenWhispr is an Electron-based desktop dictation application that uses whisper.
- GGML model downloads from HuggingFace
- Models stored in `~/.cache/openwhispr/whisper-models/`

### NVIDIA Parakeet Integration (via sherpa-onnx)

- **parakeet.js**: Model management for NVIDIA Parakeet ASR models
- Uses sherpa-onnx runtime for cross-platform ONNX inference
- Bundled binaries in `resources/bin/sherpa-onnx-{platform}-{arch}`
- INT8 quantized models for efficient CPU inference
- Models stored in `~/.cache/openwhispr/parakeet-models/`
- Server pre-warming on startup when `LOCAL_TRANSCRIPTION_PROVIDER=nvidia` is set
- Provider preference persisted to `.env` via `saveAllKeysToEnvFile()` on server start/stop

- **Available Models**:
- `parakeet-tdt-0.6b-v3`: Multilingual (25 languages), ~680MB

- **Download URLs**: Models from sherpa-onnx ASR models release on GitHub
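
The binary and model locations described above could be resolved along these lines. This is a minimal sketch, not the project's actual code: `sherpaBinaryName` and `parakeetModelDir` are hypothetical helpers, and the use of Node's `process.platform`/`process.arch` tokens, the `.exe` suffix on Windows, and the per-model subdirectory are all assumptions.

```javascript
const os = require("os");
const path = require("path");

// Hypothetical helper: builds a file name following the documented pattern
// sherpa-onnx-{platform}-{arch}. Assumes Node's platform/arch tokens and an
// ".exe" suffix on Windows; the real naming may differ.
function sherpaBinaryName(platform = process.platform, arch = process.arch) {
  const suffix = platform === "win32" ? ".exe" : "";
  return `sherpa-onnx-${platform}-${arch}${suffix}`;
}

// Hypothetical helper: expands ~/.cache/openwhispr/parakeet-models/ for a
// given home directory. Storing each model in its own subdirectory is an
// assumption made for illustration.
function parakeetModelDir(modelName, homeDir = os.homedir()) {
  return path.join(homeDir, ".cache", "openwhispr", "parakeet-models", modelName);
}

console.log(sherpaBinaryName("darwin", "arm64"));
console.log(parakeetModelDir("parakeet-tdt-0.6b-v3"));
```

Passing the platform, architecture, and home directory as parameters keeps both helpers easy to exercise without touching the real filesystem.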

### Build Scripts (scripts/)

- **download-whisper-cpp.js**: Downloads whisper.cpp binaries from GitHub releases
- **download-llama-server.js**: Downloads llama.cpp server for local LLM inference
- **download-nircmd.js**: Downloads nircmd.exe for Windows clipboard operations
- **download-windows-key-listener.js**: Downloads prebuilt Windows key listener binary
- **download-sherpa-onnx.js**: Downloads sherpa-onnx binaries for Parakeet support
- **build-globe-listener.js**: Compiles macOS Globe key listener from Swift source
- **build-windows-key-listener.js**: Compiles Windows key listener (for local development)
- **run-electron.js**: Development script to launch Electron with proper environment
@@ -195,6 +213,10 @@ Settings stored in localStorage with these keys:
- `hasCompletedOnboarding`: Onboarding completion flag
- `customDictionary`: JSON array of words/phrases for improved transcription accuracy

Environment variables persisted to `.env` (via `saveAllKeysToEnvFile()`):
- `LOCAL_TRANSCRIPTION_PROVIDER`: Transcription engine (`nvidia` for Parakeet)
- `PARAKEET_MODEL`: Selected Parakeet model name (e.g., `parakeet-tdt-0.6b-v3`)
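
As an illustration of how these two variables might be consumed, the sketch below reads them into a settings object using the same defaults that appear in the `main.js` change in this PR. `readParakeetSettings` and `shouldPrewarmParakeet` are hypothetical names introduced here; only the variable names, the `nvidia` value, and the default model come from the source.

```javascript
// Hypothetical helper: maps the persisted environment variables to a settings
// object, using the defaults shown in the main.js change.
function readParakeetSettings(env = process.env) {
  return {
    localTranscriptionProvider: env.LOCAL_TRANSCRIPTION_PROVIDER || "",
    parakeetModel: env.PARAKEET_MODEL || "parakeet-tdt-0.6b-v3",
  };
}

// Hypothetical helper: pre-warming applies only when the provider is "nvidia",
// as described in the Parakeet integration notes.
function shouldPrewarmParakeet(settings) {
  return settings.localTranscriptionProvider === "nvidia";
}

const settings = readParakeetSettings({ LOCAL_TRANSCRIPTION_PROVIDER: "nvidia" });
console.log(settings, shouldPrewarmParakeet(settings));
```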

### 6. Language Support

58 languages supported (see src/utils/languages.ts):
20 changes: 19 additions & 1 deletion README.md
@@ -27,6 +27,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
- 📱 **Control Panel**: Manage settings, view history, and configure API keys
- 🗄️ **Transcription History**: SQLite database stores all your transcriptions locally
- 🔧 **Model Management**: Download and manage local Whisper models (tiny, base, small, medium, large, turbo)
- ⚡ **NVIDIA Parakeet**: Fast local transcription via sherpa-onnx (multilingual, 25 languages)
- 🧹 **Model Cleanup**: One-click removal of cached Whisper models with uninstall hooks to keep disks tidy
- 🌐 **Cross-Platform**: Works on macOS, Windows, and Linux
- ⚡ **Automatic Pasting**: Transcribed text automatically pastes at your cursor location
@@ -380,7 +381,7 @@ open-whispr/
- **Desktop**: Electron 36 with context isolation
- **UI Components**: shadcn/ui with Radix primitives
- **Database**: better-sqlite3 for local transcription storage
- **Speech-to-Text**: OpenAI Whisper (powered by whisper.cpp for local, OpenAI API for cloud)
- **Speech-to-Text**: OpenAI Whisper (whisper.cpp) + NVIDIA Parakeet (sherpa-onnx) for local, OpenAI API for cloud
- **Icons**: Lucide React for consistent iconography

## Development
@@ -490,6 +491,21 @@ For local processing, OpenWhispr uses OpenAI's Whisper model via whisper.cpp - a

**Upgrading from Python-based version**: If you previously used the Python-based Whisper, you'll need to re-download models in GGML format. You can safely delete the old Python environment (`~/.openwhispr/python/`) and PyTorch models (`~/.cache/whisper/`) to reclaim disk space.

### Local Parakeet Setup (Alternative)

OpenWhispr also supports NVIDIA Parakeet models via sherpa-onnx - a fast alternative to Whisper:

1. **Bundled Binary**: sherpa-onnx is bundled with the app for all platforms
2. **INT8 Quantized Models**: Efficient CPU inference
3. **Models stored in**: `~/.cache/openwhispr/parakeet-models/`

**Available Models**:
- `parakeet-tdt-0.6b-v3`: Multilingual (25 languages), ~680MB

**When to use Parakeet vs Whisper**:
- **Parakeet**: Best for speed-critical use cases or lower-end hardware
- **Whisper**: Best for quality-critical use cases or when you need specific model sizes

### Customization

- **Hotkey**: Change in the Control Panel (default: backtick `) - fully customizable
@@ -602,6 +618,8 @@ OpenWhispr is actively maintained and ready for production use. Current version:

- **[OpenAI Whisper](https://github.com/openai/whisper)** - The speech recognition model that powers both local and cloud transcription
- **[whisper.cpp](https://github.com/ggerganov/whisper.cpp)** - High-performance C++ implementation of Whisper for local processing
- **[NVIDIA Parakeet](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3)** - Fast ASR model for efficient local transcription
- **[sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)** - Cross-platform ONNX runtime for Parakeet model inference
- **[Electron](https://www.electronjs.org/)** - Cross-platform desktop application framework
- **[React](https://react.dev/)** - UI component library
- **[shadcn/ui](https://ui.shadcn.com/)** - Beautiful UI components built on Radix primitives
2 changes: 1 addition & 1 deletion electron-builder.json
@@ -88,7 +88,7 @@
{
"from": "resources/bin/",
"to": "bin/",
"filter": ["whisper-cpp-*", "whisper-server-*", "llama-server-*", "windows-key-listener*", "*.dylib", "*.dll", "*.so*"]
"filter": ["whisper-cpp-*", "whisper-server-*", "llama-server-*", "sherpa-onnx-*", "windows-key-listener*", "*.dylib", "*.dll", "*.so*"]
}
],
"mac": {
21 changes: 21 additions & 0 deletions main.js
@@ -44,6 +44,7 @@ const WindowManager = require("./src/helpers/windowManager");
const DatabaseManager = require("./src/helpers/database");
const ClipboardManager = require("./src/helpers/clipboard");
const WhisperManager = require("./src/helpers/whisper");
const ParakeetManager = require("./src/helpers/parakeet");
const TrayManager = require("./src/helpers/tray");
const IPCHandlers = require("./src/helpers/ipcHandlers");
const UpdateManager = require("./src/updater");
@@ -58,6 +59,7 @@ let hotkeyManager = null;
let databaseManager = null;
let clipboardManager = null;
let whisperManager = null;
let parakeetManager = null;
let trayManager = null;
let updateManager = null;
let globeKeyManager = null;
@@ -103,6 +105,7 @@ function initializeManagers() {
databaseManager = new DatabaseManager();
clipboardManager = new ClipboardManager();
whisperManager = new WhisperManager();
parakeetManager = new ParakeetManager();
trayManager = new TrayManager();
updateManager = new UpdateManager();
globeKeyManager = new GlobeKeyManager();
@@ -144,6 +147,7 @@ function initializeManagers() {
databaseManager,
clipboardManager,
whisperManager,
parakeetManager,
windowManager,
updateManager,
windowsKeyManager,
@@ -180,6 +184,19 @@ async function startApp() {
debugLogger.debug("Whisper startup init error (non-fatal)", { error: err.message });
});

// Initialize Parakeet manager at startup (don't await to avoid blocking)
// Settings can be provided via environment variables for server pre-warming:
// - LOCAL_TRANSCRIPTION_PROVIDER=nvidia to enable parakeet
// - PARAKEET_MODEL=parakeet-tdt-0.6b-v3 (model name)
const parakeetSettings = {
localTranscriptionProvider: process.env.LOCAL_TRANSCRIPTION_PROVIDER || "",
parakeetModel: process.env.PARAKEET_MODEL || "parakeet-tdt-0.6b-v3",
};
parakeetManager.initializeAtStartup(parakeetSettings).catch((err) => {
// Parakeet not being available at startup is not critical
debugLogger.debug("Parakeet startup init error (non-fatal)", { error: err.message });
});

// Pre-warm llama-server if local reasoning is configured
// Settings can be provided via environment variables:
// - REASONING_PROVIDER=local to enable local reasoning
@@ -526,6 +543,10 @@ if (gotSingleInstanceLock) {
if (whisperManager) {
whisperManager.stopServer().catch(() => {});
}
// Stop parakeet WS server if running
if (parakeetManager) {
parakeetManager.stopServer().catch(() => {});
}
// Stop llama-server if running
const modelManager = require("./src/helpers/modelManagerBridge").default;
modelManager.stopServer().catch(() => {});
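
The `main.js` hunks above start Parakeet initialization without awaiting it and swallow any failure as non-fatal. That pattern can be sketched in isolation as follows. `FakeManager` and `initNonBlocking` are hypothetical stand-ins written for this example; the real `ParakeetManager` is not shown in this diff, so its behavior here is assumed.

```javascript
// Hypothetical stand-in for a manager whose startup may fail, e.g. because
// a binary or model is missing.
class FakeManager {
  constructor(shouldFail) {
    this.shouldFail = shouldFail;
  }

  async initializeAtStartup(settings) {
    if (this.shouldFail) {
      throw new Error("binary not found");
    }
    return settings;
  }
}

// Kick off initialization and attach a catch handler so a failure is logged
// instead of surfacing as an unhandled rejection. The caller is free to
// ignore the returned promise, which is what keeps startup non-blocking.
function initNonBlocking(manager, settings, log) {
  return manager.initializeAtStartup(settings).catch((err) => {
    log("startup init error (non-fatal)", { error: err.message });
  });
}

// Usage: the call returns immediately; the log entry appears later.
initNonBlocking(new FakeManager(true), {}, (msg, data) => console.log(msg, data));
console.log("startup continues without waiting");
```

Attaching `.catch()` at the call site matters because an un-awaited promise that rejects with no handler would otherwise be reported as an unhandled rejection.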