Feature: single daemon for hotkey dictation + OpenAI-compatible local STT API #244

@krystophny

Problem

Running desktop hotkey dictation and API-based speech-to-text (STT) as separate daemons increases operational complexity and adds failure modes.

Proposal

Add a service mode to voxtype daemon that starts a local OpenAI-compatible transcription API in parallel with the existing daemon loop.
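A minimal sketch of the single-process layout, written in Python purely for illustration (the issue does not describe voxtype's internals). It assumes one loaded model is shared by both flows and that calls into it are serialized with a lock. SharedTranscriber, hotkey_loop, api_worker and run_daemon are hypothetical names, and the sleep stands in for real inference. With service_mode=False only the existing loop runs, which mirrors the "unchanged when disabled" criterion.

```python
import threading
import time


class SharedTranscriber:
    """Hypothetical wrapper around one loaded STT model, shared by both flows.

    Assumption: the engine is not safe to call concurrently, so a lock
    serializes calls coming from the hotkey loop and from the API thread.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def transcribe(self, audio: bytes, source: str) -> str:
        with self._lock:
            time.sleep(0.01)  # stand-in for real inference work
            return f"{source}:{len(audio)} bytes"


def hotkey_loop(model: SharedTranscriber, recordings: list[bytes]) -> list[str]:
    """Stand-in for the existing daemon loop: one item per push-to-talk recording."""
    return [model.transcribe(audio, "hotkey") for audio in recordings]


def api_worker(model: SharedTranscriber, requests: list[bytes], out: list[str]) -> None:
    """Stand-in for the API thread answering POST /v1/audio/transcriptions."""
    for audio in requests:
        out.append(model.transcribe(audio, "api"))


def run_daemon(recordings: list[bytes], requests: list[bytes], service_mode: bool) -> list[str]:
    model = SharedTranscriber()  # loaded once for the whole process
    api_results: list[str] = []  # written only by the API thread, read after join()
    api_thread = None
    if service_mode:
        # The API flow runs on a background thread, in parallel with the hotkey loop.
        api_thread = threading.Thread(target=api_worker, args=(model, requests, api_results))
        api_thread.start()
    results = hotkey_loop(model, recordings)  # existing behavior, unchanged
    if api_thread is not None:
        api_thread.join()
    return results + api_results
```

Sharing one model instance avoids loading it twice; the trade-off in this sketch is that a long API request can delay a hotkey transcription while it holds the lock.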

Goals

  • Single daemon process for:
    • existing hotkey-driven dictation
    • local API transcription
  • OpenAI-compatible endpoint:
    • POST /v1/audio/transcriptions
  • Health endpoint:
    • GET /healthz
  • Loopback-only by default (127.0.0.1)
  • No built-in auth in v1 (assume trusted localhost)
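The endpoint goals above could look roughly like this, sketched with Python's standard-library http.server rather than whatever voxtype would actually use. Assumptions: fake_transcribe is a hypothetical placeholder for the real STT call; the multipart parser handles only well-formed requests; the model form field that OpenAI clients send is accepted but ignored; and the response uses OpenAI's default JSON shape, {"text": "..."}. Binding to 127.0.0.1 keeps the server loopback-only.

```python
import json
import re
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


def parse_multipart(body: bytes, boundary: bytes) -> dict[str, bytes]:
    """Minimal multipart/form-data parser for well-formed requests (sketch only)."""
    fields: dict[str, bytes] = {}
    for part in body.split(b"--" + boundary)[1:]:
        if part.startswith(b"--"):  # closing delimiter "--boundary--"
            break
        headers, _, content = part.partition(b"\r\n\r\n")
        match = re.search(rb'[; ]name="([^"]*)"', headers)  # skips filename="..."
        if match:
            fields[match.group(1).decode()] = content[:-2]  # drop CRLF before next delimiter
    return fields


def fake_transcribe(audio: bytes, language: Optional[str]) -> str:
    """Hypothetical placeholder for the daemon's real STT call."""
    return f"<{len(audio)} bytes, language={language or 'auto'}>"


class Handler(BaseHTTPRequestHandler):
    # Hypothetical sketch; not voxtype's real API server.
    def _send_json(self, status: int, payload: dict) -> None:
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        if self.path == "/healthz":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/v1/audio/transcriptions":
            self._send_json(404, {"error": "not found"})
            return
        ctype = self.headers.get("Content-Type", "")
        match = re.search(r'boundary="?([^";]+)"?', ctype)
        if not ctype.startswith("multipart/form-data") or not match:
            self._send_json(400, {"error": "expected multipart/form-data"})
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        fields = parse_multipart(body, match.group(1).encode())
        if "file" not in fields:
            self._send_json(400, {"error": "missing 'file' field"})
            return
        language = fields.get("language", b"").decode() or None
        # OpenAI's default response_format (json) returns {"text": ...}.
        self._send_json(200, {"text": fake_transcribe(fields["file"], language)})

    def log_message(self, format, *args):  # keep output quiet
        pass


def start_api(port: int = 0) -> ThreadingHTTPServer:
    # Loopback-only by default: bind to 127.0.0.1, never 0.0.0.0.
    server = ThreadingHTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

start_api() returns the server so the caller can read the bound port from server.server_address and call server.shutdown() when the daemon exits.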

Language behavior

  • General language support remains configurable.
  • By default, automatic language detection for local deployment and tests is constrained to de and en.
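One way the constrained auto-detection could work, as a hedged sketch: an explicit language in the request wins, and otherwise the most likely detected language is chosen only from the allowed set. resolve_language, DEFAULT_AUTO_LANGUAGES, the detected_scores input (per-language probabilities from the model's language detection) and the fallback to the first allowed language are all assumptions, not part of the proposal.

```python
from typing import Optional, Sequence

# Hypothetical default: auto-detection is constrained to these languages
# for local deployment and tests.
DEFAULT_AUTO_LANGUAGES = ("de", "en")


def resolve_language(requested: Optional[str],
                     detected_scores: dict[str, float],
                     allowed: Sequence[str] = DEFAULT_AUTO_LANGUAGES) -> str:
    """Pick the transcription language for one request (hypothetical helper).

    - An explicit language (e.g. the OpenAI "language" form field) is used
      as-is, so general language support stays configurable.
    - Otherwise, take the highest-scoring detected language among `allowed`.
    """
    if requested:
        return requested
    candidates = {lang: p for lang, p in detected_scores.items() if lang in allowed}
    if not candidates:
        return allowed[0]  # assumption: fall back to the first allowed language
    return max(candidates, key=candidates.get)
```

Restricting the candidate set this way keeps a short German or English clip from being misdetected as a related language outside the allowed set.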

Non-goals (v1)

  • Public network exposure
  • Built-in auth/mTLS
  • TTS or chat endpoints

Why

This enables a clean architecture in which upstream voxtype is the single local speech daemon, while a separate service (for example, Tabura) handles external authentication and proxying and forwards requests to localhost.

Acceptance criteria

  • voxtype daemon can run hotkey flow and API flow concurrently.
  • API transcriptions work while the daemon is active.
  • Existing daemon behavior remains unchanged when service mode is disabled.
  • Integration tests cover multipart transcription endpoint and concurrent operation.
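For the integration tests, a client helper along these lines could exercise the multipart endpoint and the concurrent case. It is a sketch: build_transcription_request and transcribe_concurrently are hypothetical helpers, "whisper-1" is only a placeholder value for the model field of the OpenAI request format, and base_url is whatever loopback address the daemon binds.

```python
import json
import urllib.request
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def build_transcription_request(base_url: str, audio: bytes, filename: str = "clip.wav",
                                model: str = "whisper-1",
                                language: Optional[str] = None) -> urllib.request.Request:
    """Hypothetical helper: OpenAI-style multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex  # random, so a clash with the audio bytes is very unlikely
    parts = [("model", None, model.encode())]  # placeholder model name
    if language:
        parts.append(("language", None, language.encode()))
    parts.append(("file", filename, audio))

    body = b""
    for name, fname, value in parts:
        disposition = f'form-data; name="{name}"'
        if fname is not None:
            disposition += f'; filename="{fname}"'
        body += f"--{boundary}\r\nContent-Disposition: {disposition}\r\n".encode()
        if fname is not None:
            body += b"Content-Type: application/octet-stream\r\n"
        body += b"\r\n" + value + b"\r\n"
    body += f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe_concurrently(base_url: str, clips: list[bytes], workers: int = 4) -> list[str]:
    """Hypothetical helper: send several requests at once to check concurrent operation."""
    def send(audio: bytes) -> str:
        with urllib.request.urlopen(build_transcription_request(base_url, audio)) as resp:
            return json.loads(resp.read())["text"]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, clips))
```

A test could run transcribe_concurrently against the daemon while a hotkey dictation is simulated, then assert that every request returned a transcript.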
