Xinkai/dictype

Dictype

Real-time voice-to-text input on Linux.

Features

Setup

  1. Install packages for your distro.

    Arch Linux

    Install packages from AUR:

  2. Configure Dictype.

    # This is the configuration file for Dictype.
    # Put it at `~/.config/dictype.toml`.
    
    [PulseAudio]
    # Use the following command to get a list of available `source_name`.
    # $ pactl --format json list sources \
    #   | jq '[
    #     .[]
    #     | select((.monitor_of_sink == null) and (.name | endswith(".monitor") | not))
    #     | {
    #         source_name: .["name"],
    #         properties: {
    #            device_name: .properties["device.name"],
    #            device_alias: .properties["device.alias"],
    #            device_description: .properties["device.description"]
    #         }
    #     }
    #   ]'
    preferred_source_name = "..." # optional
    
    # You can have up to 5 profiles at the same time, starting with Profile1.
    # Each profile may have different formats depending on the model (Backend).
    [Profiles.Profile1]
    Backend = "ParaformerV2"
    Config = {
        dashscope_api_key = "...",                   # required
        dashscope_websocket_url = "wss://dashscope.aliyuncs.com/api-ws/v1/inference", # optional
        disfluency_removal_enabled = true,           # optional
        language_hints = ["zh"],                     # optional
        semantic_punctuation_enabled = false,        # optional
        max_sentence_silence = 800,                  # optional
        multi_threshold_mode_enabled = false,        # optional
        punctuation_prediction_enabled = true,       # optional
        inverse_text_normalization_enabled = true,   # optional
    }
    
    [Profiles.Profile2]
    Backend = "QwenV3"
    Config = {
        dashscope_api_key = "...",                                       # required
        dashscope_websocket_url = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-asr-flash-realtime", # optional
        language = "en",                                                 # optional
        turn_detection = { threshold = 0.2, silence_duration_ms = 900 }, # optional
    }
  3. Run the daemon.

    systemctl --user enable dictyped --now
  4. Restart Fcitx.

    Restarting Fcitx can be complex depending on your setup. The easiest way is to restart your computer.

  5. Configure dictype-fcitx trigger keys using the official Fcitx configuration, under Configuration addons...

  6. Focus a text input, then press the profile trigger key to start transcribing. Press it again to stop. The input may lose focus while transcribing.
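
If the trigger key does nothing after these steps, a quick sanity check of steps 2 and 3 can help. The sketch below is not part of Dictype: the function names `config_path`, `check_config`, and `check_daemon` are hypothetical helpers, and it assumes only the config location and the `dictyped` user unit named above.

```shell
# Hypothetical helpers for checking a Dictype setup (not shipped with Dictype).

config_path() {
    # The configuration comments above place the file at ~/.config/dictype.toml.
    printf '%s\n' "${HOME}/.config/dictype.toml"
}

check_config() {
    # Succeeds when the config file exists and is readable.
    [ -r "$(config_path)" ]
}

check_daemon() {
    # Succeeds when the `dictyped` user unit is currently active.
    systemctl --user is-active --quiet dictyped
}
```

Running `check_config && check_daemon && echo ok` prints `ok` only when both checks pass. These checks do not validate the contents of the TOML file or the API key.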

Requirements

  1. PulseAudio, or PipeWire with PulseAudio compatibility support.
  2. fcitx5.
  3. Cloud accounts for the respective models (two models on Alibaba Cloud are currently supported).
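
As a rough way to check the first two requirements, the sketch below reports which command-line tools are missing from `PATH`. The helper name `missing_tools` is hypothetical, and it assumes that the presence of the `pactl` and `fcitx5` executables is a reasonable proxy for the requirements above; it does not confirm that a sound server or Fcitx is actually running.

```shell
# Hypothetical helper: print every named command that is not found on PATH,
# one per line. Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools pactl fcitx5` prints nothing when both tools are installed.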

TODOs

  • GUI configuration tool
  • local inference

Disclaimer

  • This is a personal project and is not affiliated with any cloud providers or model providers.
  • Use discretion regarding model fees and privacy when using cloud models.

License

MIT License.