8 changes: 8 additions & 0 deletions .github/workflows/build-and-notarize.yml
@@ -24,6 +24,8 @@ jobs:

- name: Download whisper.cpp binaries
  run: node scripts/download-whisper-cpp.js --current
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Cache Electron
  uses: actions/cache@v4
@@ -63,9 +65,13 @@ jobs:

- name: Download whisper.cpp binaries
  run: node scripts/download-whisper-cpp.js --current
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Download nircmd.exe
  run: node scripts/download-nircmd.js
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Cache Electron
  uses: actions/cache@v4
@@ -106,6 +112,8 @@ jobs:

- name: Download whisper.cpp binaries
  run: node scripts/download-whisper-cpp.js --current --platform darwin --arch ${{ matrix.arch }}
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Cache Electron
  uses: actions/cache@v4
72 changes: 72 additions & 0 deletions .github/workflows/build-windows-key-listener.yml
@@ -0,0 +1,72 @@
name: Build Windows Key Listener

on:
  push:
    paths:
      - 'resources/windows-key-listener.c'
      - '.github/workflows/build-windows-key-listener.yml'
    branches:
      - main
  workflow_dispatch:
    inputs:
      version:
        description: 'Version tag (e.g., 1.0.0)'
        required: false
        default: '1.0.0'
        type: string

permissions:
  contents: write

jobs:
  build:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup MSVC
        uses: microsoft/setup-msbuild@v2

      - name: Setup MSVC environment
        uses: ilammy/msvc-dev-cmd@v1

      - name: Compile Windows Key Listener
        run: |
          cl /O2 /nologo resources/windows-key-listener.c /Fe:windows-key-listener.exe user32.lib
        shell: cmd

      - name: Verify binary
        run: |
          if (Test-Path "windows-key-listener.exe") {
            Write-Host "Binary built successfully"
            Get-Item "windows-key-listener.exe" | Select-Object Name, Length
          } else {
            Write-Error "Binary not found"
            exit 1
          }
        shell: pwsh

      - name: Create zip archive
        run: |
          Compress-Archive -Path windows-key-listener.exe -DestinationPath windows-key-listener-win32-x64.zip
        shell: pwsh

      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: windows-key-listener
          path: windows-key-listener-win32-x64.zip

      - name: Create or update release
        uses: softprops/action-gh-release@v2
        with:
          tag_name: windows-key-listener-v${{ inputs.version || '1.0.0' }}
          name: Windows Key Listener v${{ inputs.version || '1.0.0' }}
          body: |
            Prebuilt Windows key listener binary for Push-to-Talk functionality.

            This binary uses Windows Low-Level Keyboard Hook to detect key press/release events.
          files: windows-key-listener-win32-x64.zip
          make_latest: false
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
8 changes: 8 additions & 0 deletions .github/workflows/release.yml
@@ -31,6 +31,8 @@ jobs:

- name: Download whisper.cpp binaries
  run: node scripts/download-whisper-cpp.js --current
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Cache Electron
  uses: actions/cache@v4
@@ -65,9 +67,13 @@ jobs:

- name: Download whisper.cpp binaries
  run: node scripts/download-whisper-cpp.js --current
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Download nircmd.exe
  run: node scripts/download-nircmd.js
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Cache Electron
  uses: actions/cache@v4
@@ -101,6 +107,8 @@ jobs:

- name: Download whisper.cpp binaries
  run: node scripts/download-whisper-cpp.js --current --platform darwin --arch ${{ matrix.arch }}
  env:
    GITHUB_TOKEN: ${{ secrets.GH_TOKEN }}

- name: Cache Electron
  uses: actions/cache@v4
23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.2.16] - 2026-01-26

### Added
- **Windows Push-to-Talk**: Native Windows key listener with low-level keyboard hook for true push-to-talk functionality
  - Supports compound hotkeys like `Ctrl+Shift+F11` or `CommandOrControl+Space`
  - Prebuilt binary automatically downloaded from GitHub releases
  - Fallback to tap mode if binary unavailable
- **Custom Dictionary**: Improve transcription accuracy for specific words, names, and technical terms
  - Add custom words through Settings → Custom Dictionary
  - Words are passed as hints to Whisper for better recognition
  - Works with both local and cloud transcription
- **GitHub Actions Workflow**: Automated CI workflow to build and release Windows key listener binary
- **Shared Download Utilities**: New `scripts/lib/download-utils.js` module with reusable download, extraction, and GitHub release fetching functions

### Changed
- **Download Scripts Refactored**: All download scripts now use shared utilities for consistency
- **GitHub API Authentication**: Download scripts support `GITHUB_TOKEN` to avoid API rate limits in CI
- **Dev Server Port Alignment**: Development server port configuration improved for consistency

### Fixed
- **Windows Production Build**: Fixed Windows production build issues with proper binary bundling
- **Code Quality**: Various code quality improvements in download scripts and dev server management

## [1.2.15] - 2026-01-22

### Added
92 changes: 89 additions & 3 deletions CLAUDE.md
@@ -39,6 +39,12 @@ OpenWhispr is an Electron-based desktop dictation application that uses whisper.
- **main.js**: Application entry point, initializes all managers
- **preload.js**: Exposes safe IPC methods to renderer via window.api

### Native Resources (resources/)

- **windows-key-listener.c**: C source for Windows low-level keyboard hook (Push-to-Talk)
- **globe-listener.swift**: Swift source for macOS Globe/Fn key detection
- **bin/**: Directory for compiled native binaries (whisper-cpp, nircmd, key listeners)

### Helper Modules (src/helpers/)

- **audioManager.js**: Handles audio device management
@@ -62,6 +68,11 @@ OpenWhispr is an Electron-based desktop dictation application that uses whisper.
  - Converts Electron hotkey format to GNOME keysym format
  - Only active on Linux + Wayland + GNOME desktop
- **ipcHandlers.js**: Centralized IPC handler registration
- **windowsKeyManager.js**: Windows Push-to-Talk support with native key listener
  - Spawns native `windows-key-listener.exe` binary for low-level keyboard hooks
  - Supports compound hotkeys (e.g., `Ctrl+Shift+F11`, `CommandOrControl+Space`)
  - Emits `key-down` and `key-up` events for push-to-talk functionality
  - Graceful fallback if binary unavailable
- **menuManager.js**: Application menu management
- **tray.js**: System tray icon and menu
- **whisper.js**: Local whisper.cpp integration and model management
@@ -107,6 +118,22 @@ OpenWhispr is an Electron-based desktop dictation application that uses whisper.
- GGML model downloads from HuggingFace
- Models stored in `~/.cache/openwhispr/whisper-models/`

### Build Scripts (scripts/)

- **download-whisper-cpp.js**: Downloads whisper.cpp binaries from GitHub releases
- **download-llama-server.js**: Downloads llama.cpp server for local LLM inference
- **download-nircmd.js**: Downloads nircmd.exe for Windows clipboard operations
- **download-windows-key-listener.js**: Downloads prebuilt Windows key listener binary
- **build-globe-listener.js**: Compiles macOS Globe key listener from Swift source
- **build-windows-key-listener.js**: Compiles Windows key listener (for local development)
- **run-electron.js**: Development script to launch Electron with proper environment
- **lib/download-utils.js**: Shared utilities for downloading and extracting files
  - `fetchLatestRelease(repo, options)`: Fetches latest release from GitHub API
  - `downloadFile(url, dest)`: Downloads file with progress and retry logic
  - `extractZip(zipPath, destDir)`: Cross-platform zip extraction
  - `parseArgs()`: Parses CLI arguments for platform/arch targeting
  - Supports `GITHUB_TOKEN` for authenticated requests (higher rate limits)
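
As a rough illustration of the `GITHUB_TOKEN` support described above, the sketch below builds request headers and the "latest release" URL for the GitHub REST API. The helper names `buildGitHubHeaders` and `latestReleaseUrl` and the `User-Agent` string are hypothetical and are not taken from `scripts/lib/download-utils.js`; only the endpoint shape and the `Authorization: Bearer` header come from the public GitHub API.

```javascript
// Hypothetical helper (not from the repo): headers for a GitHub REST API call.
// The token is optional, so unauthenticated local runs still work.
function buildGitHubHeaders(env = process.env) {
  const headers = {
    // GitHub rejects API requests that carry no User-Agent header.
    "User-Agent": "openwhispr-download-script", // assumed value
    Accept: "application/vnd.github+json",
  };
  if (env.GITHUB_TOKEN) {
    // Authenticated requests get a higher rate limit than anonymous ones.
    headers.Authorization = `Bearer ${env.GITHUB_TOKEN}`;
  }
  return headers;
}

// Hypothetical helper: "latest release" endpoint for an "owner/name" repo string.
function latestReleaseUrl(repo) {
  return `https://api.github.com/repos/${repo}/releases/latest`;
}
```

In CI the token would come from the `env:` block added to the download steps; locally the variable can simply be left unset.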

## Key Implementation Details

### 1. FFmpeg Integration
@@ -166,6 +193,7 @@ Settings stored in localStorage with these keys:
- `reasoningProvider`: AI provider (openai/anthropic/gemini/local)
- `hotkey`: Custom hotkey configuration
- `hasCompletedOnboarding`: Onboarding completion flag
- `customDictionary`: JSON array of words/phrases for improved transcription accuracy

### 6. Language Support

@@ -276,7 +304,52 @@ Enable with `--log-level=debug` or `OPENWHISPR_LOG_LEVEL=debug` (can be set in `
- Audio level analysis
- Complete reasoning pipeline debugging with stage-by-stage logging

-### 12. GNOME Wayland Global Hotkeys
+### 12. Windows Push-to-Talk

Native Windows support for true push-to-talk functionality using low-level keyboard hooks:

**Architecture**:
- `resources/windows-key-listener.c`: Native C program using Windows `SetWindowsHookEx` for keyboard hooks
- `src/helpers/windowsKeyManager.js`: Node.js wrapper that spawns and manages the native binary
- Binary outputs `KEY_DOWN` and `KEY_UP` to stdout when target key is pressed/released

**Compound Hotkey Support**:
- Parses hotkey strings like `CommandOrControl+Shift+F11`
- Maps modifiers: `CommandOrControl`/`Ctrl` → VK_CONTROL, `Alt`/`Option` → VK_MENU, `Shift` → VK_SHIFT
- Verifies all required modifiers are held before emitting key events
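
The modifier mapping above can be sketched as follows. This is a minimal illustration, not the parsing code in `windowsKeyManager.js` or `windows-key-listener.c`: the function names `parseHotkey` and `modifiersHeld` are hypothetical, and the hex values are the standard Win32 virtual-key codes for `VK_SHIFT`, `VK_CONTROL`, and `VK_MENU`.

```javascript
// Win32 virtual-key codes for the modifiers named in the list above.
const MODIFIER_VK = {
  commandorcontrol: 0x11, // VK_CONTROL
  ctrl: 0x11, // VK_CONTROL
  alt: 0x12, // VK_MENU
  option: 0x12, // VK_MENU
  shift: 0x10, // VK_SHIFT
};

// Hypothetical parser: splits "CommandOrControl+Shift+F11" into the
// modifier codes that must be held and the remaining (non-modifier) key name.
function parseHotkey(hotkey) {
  const parts = hotkey.split("+").map((p) => p.trim()).filter(Boolean);
  const modifiers = [];
  let key = null;
  for (const part of parts) {
    const vk = MODIFIER_VK[part.toLowerCase()];
    if (vk !== undefined) {
      if (!modifiers.includes(vk)) modifiers.push(vk); // skip duplicate modifiers
    } else {
      key = part; // the last non-modifier token is treated as the target key
    }
  }
  return { modifiers, key };
}

// Hypothetical check: true only when every required modifier is down.
// `isDown` stands in for whatever key-state query the listener uses.
function modifiersHeld(required, isDown) {
  return required.every((vk) => isDown(vk));
}
```

A key event for the target key would be reported only when `modifiersHeld` returns `true` for the parsed modifier list.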

**Binary Distribution**:
- Prebuilt binary downloaded from GitHub releases (`windows-key-listener-v*` tags)
- Download script: `scripts/download-windows-key-listener.js`
- CI workflow: `.github/workflows/build-windows-key-listener.yml`
- Fallback to tap mode if binary unavailable

**IPC Events**:
- `windows-key-listener:key-down`: Fired when hotkey pressed (start recording)
- `windows-key-listener:key-up`: Fired when hotkey released (stop recording)
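
The path from the binary's stdout to the `key-down`/`key-up` events can be sketched like this. It is an assumption-laden illustration rather than the contents of `windowsKeyManager.js`: `createKeyLineParser` is a hypothetical name, and the only facts taken from the notes above are that the binary prints `KEY_DOWN` and `KEY_UP` and that the manager emits `key-down` and `key-up`.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical helper: returns a function to attach to the child process's
// stdout "data" event. It buffers partial chunks, splits on newlines, and
// re-emits each complete KEY_DOWN / KEY_UP line as an event.
function createKeyLineParser(emitter) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk.toString();
    const lines = buffer.split(/\r?\n/);
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const text = line.trim();
      if (text === "KEY_DOWN") emitter.emit("key-down");
      else if (text === "KEY_UP") emitter.emit("key-up");
      // Any other output (for example diagnostics) is ignored in this sketch.
    }
  };
}
```

With a process started via `child_process.spawn`, the wiring would be `child.stdout.on("data", createKeyLineParser(emitter))`, after which the main process can forward the events over IPC.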

### 13. Custom Dictionary

Improve transcription accuracy for specific words, names, or technical terms:

**How it works**:
- User adds words/phrases through Settings → Custom Dictionary
- Words stored as JSON array in localStorage (`customDictionary` key)
- On transcription, words are joined and passed as `prompt` parameter to Whisper
- Works with both local whisper.cpp and cloud OpenAI Whisper API

**Implementation**:
- `src/hooks/useSettings.ts`: Manages `customDictionary` state
- `src/components/SettingsPage.tsx`: UI for adding/removing dictionary words
- `src/helpers/audioManager.js`: Reads dictionary and adds to transcription options
- `src/helpers/whisperServer.js`: Includes dictionary as `prompt` in API request

**Whisper Prompt Parameter**:
- Whisper uses the prompt as context/hints for transcription
- Words in the prompt are more likely to be recognized correctly
- Useful for: uncommon names, technical jargon, brand names, domain-specific terms
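
A minimal sketch of the "joined and passed as `prompt`" step, assuming the raw `customDictionary` string has already been read from localStorage. The function name `buildWhisperPrompt` and the `", "` separator are assumptions; the notes above do not say how the words are joined.

```javascript
// Hypothetical helper: turns the stored JSON array into a Whisper prompt.
// Returns undefined when there is nothing usable, so callers can omit `prompt`.
function buildWhisperPrompt(rawDictionary) {
  let words;
  try {
    words = JSON.parse(rawDictionary ?? "[]");
  } catch {
    return undefined; // corrupted setting: behave as if no dictionary exists
  }
  if (!Array.isArray(words)) return undefined;
  const cleaned = words
    .filter((w) => typeof w === "string")
    .map((w) => w.trim())
    .filter(Boolean); // drop empty entries
  // The ", " separator is an assumption made for this sketch.
  return cleaned.length > 0 ? cleaned.join(", ") : undefined;
}
```

Returning `undefined` for an empty dictionary keeps the transcription request unchanged for users who have not added any words.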

### 14. GNOME Wayland Global Hotkeys

On GNOME Wayland, Electron's `globalShortcut` API doesn't work due to Wayland's security model. OpenWhispr uses native GNOME shortcuts:

@@ -320,6 +393,8 @@ On GNOME Wayland, Electron's `globalShortcut` API doesn't work due to Wayland's
- [ ] Verify whisper.cpp binary detection
- [ ] Test all Whisper models
- [ ] Check agent naming functionality
- [ ] Test custom dictionary with uncommon words
- [ ] Verify Windows Push-to-Talk with compound hotkeys
- [ ] Test GNOME Wayland hotkeys (if on GNOME + Wayland)
- [ ] Verify activation mode selector is hidden on GNOME Wayland

@@ -348,10 +423,16 @@ On GNOME Wayland, Electron's `globalShortcut` API doesn't work due to Wayland's
   - Use `npm run pack` for unsigned builds (CSC_IDENTITY_AUTO_DISCOVERY=false)
   - Signing requires Apple Developer account
   - ASAR unpacking needed for FFmpeg
   - Run `npm run download:whisper-cpp` before packaging (current platform)
   - Use `npm run download:whisper-cpp:all` for multi-platform packaging
   - afterSign.js automatically skips signing when CSC_IDENTITY_AUTO_DISCOVERY=false

5. **Windows Push-to-Talk Binary**:
   - Prebuilt binary downloaded automatically on Windows during build
   - If download fails, push-to-talk falls back to tap mode
   - To compile locally: install Visual Studio Build Tools or MinGW-w64
   - CI workflow (`.github/workflows/build-windows-key-listener.yml`) auto-builds on push to main

### Platform-Specific Notes

**macOS**:
@@ -369,6 +450,11 @@ On GNOME Wayland, Electron's `globalShortcut` API doesn't work due to Wayland's
- Sound settings at `ms-settings:sound`
- NSIS installer for distribution
- whisper.cpp bundled for x64
- **Push-to-Talk**: Native key listener binary (`windows-key-listener.exe`) enables true push-to-talk
  - Uses Windows Low-Level Keyboard Hook (`WH_KEYBOARD_LL`)
  - Supports compound hotkeys (e.g., `Ctrl+Shift+F11`)
  - Prebuilt binary auto-downloaded from GitHub releases
  - Falls back to tap mode if unavailable

**Linux**:
- Multiple package manager support
22 changes: 20 additions & 2 deletions README.md
@@ -34,6 +34,8 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
- 🔄 **OpenAI Responses API**: Using the latest Responses API for improved performance
- 🌐 **Globe Key Toggle (macOS)**: Optional Fn/Globe key listener for a hardware-level dictation trigger
- ⌨️ **Compound Hotkeys**: Support for multi-key combinations like `Cmd+Shift+K`
- 🎙️ **Push-to-Talk (Windows)**: Native low-level keyboard hook for true push-to-talk with compound hotkey support
- 📖 **Custom Dictionary**: Add words, names, and technical terms to improve transcription accuracy
- 🐧 **GNOME Wayland Support**: Native global shortcuts via D-Bus for GNOME Wayland users

## Prerequisites
@@ -310,8 +312,21 @@ Once you've named your agent during setup, you can interact with it using multip

The AI automatically detects when you're giving it commands versus dictating regular text, and removes agent name references from the final output.

### Custom Dictionary
Improve transcription accuracy for specific words, names, or technical terms:

1. **Access Settings**: Open Control Panel → Settings → Custom Dictionary
2. **Add Words**: Enter words, names, or phrases that are frequently misrecognized
3. **How It Works**: Words are provided as context hints to the speech recognition model

**Examples of words to add**:
- Uncommon names (e.g., "Sergey", "Xanthe")
- Technical jargon (e.g., "Kubernetes", "OAuth")
- Brand names (e.g., "OpenWhispr", "whisper.cpp")
- Domain-specific terms (e.g., "amortization", "polymerase")

### Processing Options
- **Local Processing**:
  - Install Whisper automatically through the Control Panel
  - Download models: tiny (fastest), base (recommended), small, medium, large (best quality)
  - Complete privacy - audio never leaves your device
@@ -378,6 +393,7 @@ open-whispr/
- `npm run build:renderer` - Build the React app only
- `npm run download:whisper-cpp` - Download whisper.cpp for the current platform
- `npm run download:whisper-cpp:all` - Download whisper.cpp for all platforms
- `npm run compile:native` - Compile native helpers (Globe key listener for macOS, key listener for Windows)
- `npm run build` - Full build with signing (requires certificates)
- `npm run build:mac` - macOS build with signing
- `npm run build:win` - Windows build with signing
@@ -572,13 +588,15 @@ A: OpenWhispr supports 58 languages including English, Spanish, French, German,

## Project Status

-OpenWhispr is actively maintained and ready for production use. Current version: 1.2.12
+OpenWhispr is actively maintained and ready for production use. Current version: 1.2.16

- ✅ Core functionality complete
- ✅ Cross-platform support (macOS, Windows, Linux)
- ✅ Local and cloud processing
- ✅ Multi-provider AI (OpenAI, Anthropic, Gemini, Groq, Local)
- ✅ Compound hotkey support
- ✅ Windows Push-to-Talk with native key listener
- ✅ Custom dictionary for improved transcription accuracy

## Acknowledgments

2 changes: 1 addition & 1 deletion electron-builder.json
@@ -88,7 +88,7 @@
{
"from": "resources/bin/",
"to": "bin/",
-"filter": ["whisper-cpp-*", "whisper-server-*", "llama-server-*", "*.dylib", "*.dll", "*.so*"]
+"filter": ["whisper-cpp-*", "whisper-server-*", "llama-server-*", "windows-key-listener*", "*.dylib", "*.dll", "*.so*"]
}
],
"mac": {