feat(goose2): voice dictation via direct-ACP pattern #8609
tulsi-builder wants to merge 29 commits into main
Conversation
Add voice dictation support to the goose2 Tauri app by exposing transcription and config as ACP custom methods, then wiring the frontend to use them.

Backend (crates/):
- Add DictationTranscribeRequest/Response and DictationConfigRequest/Response types to goose-sdk custom_requests.rs with model metadata fields
- Add #[custom_method] handlers in goose-acp server.rs for transcribe (OpenAI, Groq, ElevenLabs, Local) and config
- Register methods in acp-meta.json
- Forward local-inference feature from goose-cli to goose-acp

Tauri (ui/goose2/src-tauri/):
- Rewrite dictation.rs to use call_ext_method via ACP instead of importing goose crate directly
- Add generic CallExt command to ACP manager with method name normalization (strips leading _ to avoid double-prefix)
- Register get_dictation_config and transcribe_dictation commands

Frontend (ui/goose2/src/):
- Wire useDictationRecorder + useVoiceInputPreferences into ChatInput
- Replace placeholder mic button with working toggle (recording/transcribing states, auto-submit on keyword)
- Stop recording on manual send and on auto-submit keyword
- Show "Listening..."/"Transcribing..." placeholder in textarea
- Add Voice section to SettingsModal with VoiceInputSettings
- Add all voice i18n strings (en + es)
- Fix pre-existing type errors in dictationVad.ts and VoiceInputSettings

Known issue: Local Whisper reports configured: false despite model being downloaded and config set. The is_downloaded() path check needs investigation in a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Picks up DictationTranscribeRequest/Response, DictationConfigRequest/Response, and DictationProviderStatusEntry entries. Required for the @aaif/goose-sdk TypeScript generator in ui/sdk to see the new methods.
Replaces Tauri invoke() with client.goose.GooseDictationConfig() and GooseDictationTranscribe() for the two ACP methods registered on the goose server. Matches the post-8549/8582 pattern: frontend talks directly to goose serve over WebSocket, no Tauri middleware. The remaining seven functions in dictation.ts still call invoke() for Tauri commands that no longer exist; those migrate to ACP methods added in a later commit.
Adds six custom methods so the goose2 frontend can list, download, track, cancel, delete, and select local Whisper models through the same WebSocket channel it already uses for transcription:
- _goose/dictation/models/list
- _goose/dictation/models/download
- _goose/dictation/models/download/progress
- _goose/dictation/models/cancel
- _goose/dictation/models/delete
- _goose/dictation/model/select

All local-model operations are gated on the local-inference feature; without it they return "Local inference not enabled". The select method accepts any dictation provider (openai, groq, elevenlabs, local) and writes to the appropriate config key.

Replaces the previous plan to expose these as Tauri commands, following the post-8549/8582 pattern of ACP-from-frontend-direct.

Signed-off-by: tulsi <tulsi@block.xyz>
Signed-off-by: tulsi <tulsi@block.xyz>
Migrates seven dictation.ts functions off Tauri invoke() onto the
regenerated @aaif/goose-sdk client:
saveDictationModelSelection, listDictationLocalModels,
downloadDictationLocalModel, getDictationLocalModelDownloadProgress,
cancelDictationLocalModelDownload, deleteDictationLocalModel
Leaves alone:
saveDictationProviderSecret / deleteDictationProviderSecret — use
generic save_provider_field / delete_provider_config Tauri commands
getMicrophonePermissionStatus / requestMicrophonePermission — OS-bound
browser APIs handle mic prompt in VoiceInputSettings
Each migrated function uses a type cast at the SDK boundary because the
regenerated types don't fully overlap with the hand-written local types
(e.g., WhisperModelStatus has url/recommended fields the SDK's
DictationLocalModelStatus doesn't). Consumers that read missing fields
will get undefined at runtime; end-to-end verification in a later task
will surface any breakage.
Signed-off-by: tulsi <tulsi@block.xyz>
getMicrophonePermissionStatus and requestMicrophonePermission had zero callers after the voice-input work settled — VoiceInputSettings derives permission status from the browser's navigator.mediaDevices.getUserMedia directly rather than routing through Tauri. Drop the exports and the now-unused MicrophonePermissionStatus type import. The type itself stays defined in shared/types/dictation.ts for any future consumer; only the Tauri-routed helpers are removed.
Replaces the "Local model download is not yet available" placeholder in VoiceInputSettings with a working LocalWhisperModels component that drives the six ACP methods added upstream: list, download, progress, cancel, delete, select.

Per-row UI state machine:
- not downloaded -> Download button
- downloading -> progress bar + Cancel button (polls every 750ms)
- downloaded + selected -> "Selected" badge + Delete
- downloaded + unselected -> Select + Delete

Progress polling auto-stops when no active downloads remain. Download completion refreshes the model list and notifies the parent config so the mic button in chat enables without a manual reload.

i18n keys added for EN and ES; obsolete localModelUnavailable key left in place (unused now) to avoid gratuitous deletion.
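The per-row state machine above can be sketched as a pure function. This is an illustration only: `LocalModelRow`, `rowView`, and `shouldPoll` are hypothetical names, and the real SDK type (DictationLocalModelStatus) may carry different fields.

```typescript
// Hypothetical row shape; not the actual SDK response type.
interface LocalModelRow {
  id: string;
  downloaded: boolean;
  downloading: boolean;
}

type RowKind = "notDownloaded" | "downloading" | "selected" | "unselected";

interface RowView {
  kind: RowKind;
  actions: string[];
}

// Maps one model row to the UI state and the buttons it should offer.
function rowView(model: LocalModelRow, selectedId: string | null): RowView {
  if (model.downloading) return { kind: "downloading", actions: ["cancel"] };
  if (!model.downloaded) return { kind: "notDownloaded", actions: ["download"] };
  if (model.id === selectedId) return { kind: "selected", actions: ["delete"] };
  return { kind: "unselected", actions: ["select", "delete"] };
}

// Progress polling is only needed while at least one row is downloading.
function shouldPoll(models: LocalModelRow[]): boolean {
  return models.some((m) => m.downloading);
}
```

A component could call `shouldPoll` after each progress response and clear its 750ms interval once it returns false.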
…etes The onModelsChanged callback only called refreshConfig() — it didn't emit notifyVoiceDictationConfigChanged(). Result: after downloading a local Whisper model, the chat page's useVoiceDictation hook kept stale providerStatuses and left the mic button disabled until the window was reloaded. Symmetric with how handleModelChange already notifies on cloud-provider model changes. Now both paths emit the same event.
ChatInput's handleSend used to early-return when isRecording or isTranscribing, which meant clicking Send during active dictation only stopped the mic — you had to click Send a second time to actually send. Remove the early return. If recording is still live, stop it with flushPending:false and send whatever's already transcribed into the textarea. Any in-flight audio the user spoke AFTER clicking Send is intentionally dropped — by the time the user clicks Send, what's in the textarea is what they want to send. Empty-send is still blocked by the canSend guard, so an accidental Send with no transcription is a no-op.
Two fixes:

1. useAudioDevices now subscribes to navigator.permissions.query for 'microphone' and reflects the live OS-level permission state. Before, hasPermission only became true when the user clicked 'Grant access' from this component; if they'd already granted mic permission via the chat input's getUserMedia call, Voice settings still showed the Grant access button with no effect. Now opening Voice settings shows the correct state immediately and updates reactively if permission changes elsewhere.

2. Move the Microphone block above the per-provider (API key / model) config block so its visual position reflects what it is: a voice-level setting that applies regardless of selected provider, not a provider-specific detail.
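A minimal sketch of the subscription pattern from fix 1, written against a small stand-in interface so it runs outside a browser. `PermissionStatusLike` and `watchMicPermission` are hypothetical names; in the app the status object would presumably come from `navigator.permissions.query`, and the hook's real wiring may differ.

```typescript
type MicPermissionState = "granted" | "denied" | "prompt";

// Stand-in modeled on the browser's PermissionStatus (state + onchange).
interface PermissionStatusLike {
  state: MicPermissionState;
  onchange: (() => void) | null;
}

// Reports the current state immediately, then again on every change.
// Returns an unsubscribe function suitable for effect cleanup.
function watchMicPermission(
  status: PermissionStatusLike,
  onState: (hasPermission: boolean) => void,
): () => void {
  const report = () => onState(status.state === "granted");
  report();
  status.onchange = report;
  return () => {
    status.onchange = null;
  };
}
```

Reporting once up front is what lets the settings panel show the correct state on open, rather than waiting for a click on 'Grant access'.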
After routing useAudioDevices through navigator.permissions.query, enumerateDevices() can return entries with empty-string deviceId when the page hasn't yet exercised getUserMedia in the current session: browsers withhold full device identifiers as a privacy measure. Radix Select rejects empty-string values and crashes the component.

Three changes:
- When permissions.query reports 'granted' on mount, call the permission-ful enumeration (getUserMedia + enumerateDevices) so deviceIds are real from the start, not the sans-permission enumeration that returns empty IDs.
- Drop the 'if (loading) return' guard that made loadDevicesWithPermission a no-op on initial mount (loading starts true).
- Defensively filter empty deviceIds out of the Select items so stale enumerations can't crash the UI.
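The defensive filter in the third change amounts to something like the following. `AudioInput` and `selectableInputs` are illustrative names, not identifiers from the PR.

```typescript
// Minimal subset of the fields a device entry exposes; hypothetical type.
interface AudioInput {
  deviceId: string;
  label: string;
}

// Drops entries whose deviceId is the empty string, which a select
// component that rejects empty values cannot render as an item.
function selectableInputs(devices: AudioInput[]): AudioInput[] {
  return devices.filter((d) => d.deviceId !== "");
}
```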
Three issues surfaced in code review:

1. shared/types/dictation.ts: trim WhisperModelStatus and DictationDownloadProgress to match the actual SDK response shape. The previous types declared fields the SDK doesn't carry (url, recommended, speedBps, etaSeconds, modelId); the 'as unknown as' casts in dictation.ts let it compile but consumers reading missing fields got undefined. No runtime change, just honest types.

2. goose-acp/src/server.rs: on_dictation_model_select now validates is_downloaded() for the Local provider before persisting. Previously a caller could write LOCAL_WHISPER_MODEL to a model id whose file isn't on disk, making is_configured(Local) return false and transcribe_local() fail at runtime. The UI gated this in practice, but the ACP method is a public interface.

3. i18n: remove dead keys (localModelUnavailable, recommended, providerSetupHint, active) from both en and es locales. Zero consumers after the LocalWhisperModels component landed.
Fixes CI Rust Code Format and Goose 2 Lint & Format checks. Whitespace and wrap changes only — no behavioral diffs.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42a3f753af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Four targeted fixes addressing Codex review comments (P1/P2) and a CI clippy failure on PR #8609.

1. useVoiceDictation: fix stale-text race in handleTranscription
   Multiple dictation callbacks firing in the same tick all read `text` from closure, so the second callback computes `merged` from stale state and overwrites the first fragment, dropping dictated words in longer recordings. Mirror `text` in a ref that is updated both via effect (on React commit) and synchronously after each setText call in the callback, so subsequent same-tick callbacks always see the latest value. Applies to both the auto-submit and append branches.

2. server.rs dictation model download callback: preserve user selection
   The post-download callback unconditionally wrote the freshly downloaded model id to LOCAL_WHISPER_MODEL_CONFIG_KEY, silently switching the active model mid-session if the user already had one selected. Only auto-select when no valid model is currently set, preserving the happy path (first download still auto-selects) while protecting against surprise switches on subsequent downloads.

3. mic-selector: re-enumerate devices when permission flips to granted
   When OS-level mic permission is granted while the app is open (e.g. via System Settings), the permissions-API change handler only flipped `hasPermission`; the device list still reflected the pre-permission enumeration, which on WKWebView typically has empty deviceId/label entries (filtered out by VoiceInputSettings). The user was stuck with an empty select and no "Grant access" button. Trigger loadDevicesWithPermission() when state transitions to granted so real device metadata is picked up immediately.

4. server.rs:2796 clippy needless_borrow
   `config` is already `&'static Config` from Config::global(); drop the redundant `&` to pass CI's clippy -D warnings gate.

Verified:
- cargo clippy -p goose-acp --all-targets -- -D warnings: clean
- cargo test -p goose-acp -p goose-sdk: 48 passed
- pnpm tsc --noEmit (ui/goose2): clean
- pnpm vitest run (ui/goose2): 412 passed across 51 files

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
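The stale-text race in fix 1 can be reproduced without React. The sketch below is a simplified model, not the hook's code: `simulateTick` is a hypothetical helper in which `state` plays the closure-captured `text`, `queued` plays the value last passed to setText, and the commit happens only after all callbacks in the tick have run.

```typescript
// Simulates several transcription callbacks firing within one tick.
// When mirrorInRef is false, every callback reads the same stale `state`.
function simulateTick(fragments: string[], mirrorInRef: boolean): string {
  let state = ""; // value captured by the closure for this tick
  const textRef = { current: state }; // synchronous mirror of the latest text
  let queued = state; // last value handed to setText

  for (const fragment of fragments) {
    const base = mirrorInRef ? textRef.current : state;
    const merged = base ? `${base} ${fragment}` : fragment;
    queued = merged; // setText(merged): applied on commit, not now
    if (mirrorInRef) textRef.current = merged; // keep the mirror current
  }

  state = queued; // commit
  return state;
}
```

With `["hello", "world"]` the closure-only variant commits just the last fragment, while the ref-mirrored variant commits both in order.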
- cargo fmt applies a single-line collapse on the LOCAL_WHISPER_MODEL set_param error log in on_dictation_model_download. CI Check Rust Code Format was flagging the multi-line form.
- ui/sdk/dist was committed by accident during the two SDK regeneration passes; the directory is .gitignored on main (ui/sdk/.gitignore excludes dist/ and node_modules/). Untrack it here so the PR doesn't carry ~2500 lines of build artifacts.
💡 Codex Review
Reviewed commit: e1a40d4a6b
Addresses three issues on the voice-input PR:

1. Prevent overlapping recording startups. A rapid double-click on the mic could kick off a parallel startRecording while getUserMedia was still pending, leaking a MediaStream and leaving the OS mic indicator on. Add startingRef as a synchronous guard at the top of startRecording, and skip the toggle path while a startup is in-flight.

2. Fix the "Send leaves the mic on" bug. If the user clicked Send mid-startup (isRecording still false), handleSend would skip stopRecording and the getUserMedia that landed afterward would leave the mic hot. stopRecording now sets cancelStartRef; the startup path checks it at both await points and tears down the freshly-acquired stream instead of flipping isRecording to true. handleSend also calls stopRecording when isStarting() is true so a pending startup is cancelled.

3. Fix auto-submit phrase removal when the raw text has repeated whitespace. getAutoSubmitMatch matched against the normalized text but used -phrase.length against the raw text, chopping legitimate content (e.g. "hello ship it" with phrase "ship it" produced "hello sh"). Match the phrase at the end of the raw text via regex so the slice index reflects the real phrase span. Adds a regression test covering the repeated-whitespace case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
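One way to implement the approach in item 3 is to build a whitespace-tolerant regex from the phrase and slice at the match index in the raw text. This is a sketch, not the PR's getAutoSubmitMatch: `stripTrailingPhrase` and `escapeRegExp` are hypothetical helpers, and the case-insensitive flag is an assumption.

```typescript
// Escapes regex metacharacters so the phrase is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the text with the trailing phrase removed, or null when the raw
// text does not end with the phrase. Any run of whitespace in the raw text
// is accepted between the words of the phrase.
function stripTrailingPhrase(raw: string, phrase: string): string | null {
  const words = phrase
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map(escapeRegExp);
  if (words.length === 0) return null;

  // Require start-of-text or whitespace before the phrase so it cannot
  // match in the middle of a word.
  const pattern = new RegExp(`(?:^|\\s)${words.join("\\s+")}\\s*$`, "i");
  const match = pattern.exec(raw);
  if (match === null) return null;

  // match.index is measured against the raw text, so the slice is exact
  // regardless of how much whitespace the phrase spans.
  return raw.slice(0, match.index).trimEnd();
}
```

Because the slice index comes from the match against the raw string, the length of the normalized phrase never enters the calculation.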
💡 Codex Review
Reviewed commit: 4739e05bbc
Two issues from Codex PR review on useVoiceDictation.ts:

1. textRef was synced to `text` via a post-render useEffect, leaving a commit window where a transcription callback could read the previous value and clobber a character the user just typed into the textarea. Assign `textRef.current = text` during render instead (React explicitly permits this; see `providerRef.current = provider` in useDictationRecorder.ts). The synchronous `textRef.current = merged` follow-up after setText is kept; it still guards the callback-vs-callback race.

2. `activeVoiceProvider` held onto a stored preference even when that provider was no longer present in `providerStatuses` (feature-flagged off, removed from allowlist, etc.), silently disabling voice input even though another provider was configured. Now treat the stored preference as valid only when it appears in providerStatuses; otherwise fall back to `getDefaultDictationProvider`. The explicit "off" state (hasStoredProviderPreference && selectedProvider == null) is preserved.
Adds clearSelectedProvider() to useVoiceInputPreferences that removes the provider preference from localStorage entirely — distinct from setSelectedProvider(null), which pins the user to 'voice off' via a sentinel value. useVoiceDictation now calls clearSelectedProvider() via useEffect when it detects the stored preference points at a provider that's no longer in providerStatuses (feature-flagged off, removed from the allowlist, etc.). The fall-through to getDefaultDictationProvider still works immediately this session; the clear makes it stick so the user doesn't keep hitting the stale-preference detection on every boot.
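The provider-resolution rule described in the two commits above could look roughly like this. `resolveActiveProvider`, `defaultProvider`, and `ProviderStatus` are stand-ins invented for the sketch; in particular `defaultProvider` only approximates whatever getDefaultDictationProvider actually does.

```typescript
type Provider = "openai" | "groq" | "elevenlabs" | "local";

// Hypothetical status entry; the SDK's DictationProviderStatusEntry may differ.
interface ProviderStatus {
  provider: Provider;
  configured: boolean;
}

// Assumed fallback: the first configured provider, if any.
function defaultProvider(statuses: ProviderStatus[]): Provider | null {
  return statuses.find((s) => s.configured)?.provider ?? null;
}

interface Resolution {
  active: Provider | null;
  stalePreference: boolean; // true when the stored key should be cleared
}

function resolveActiveProvider(
  stored: Provider | null,
  hasStoredPreference: boolean,
  statuses: ProviderStatus[],
): Resolution {
  // Explicit "voice off": a preference exists and it is null.
  if (hasStoredPreference && stored === null) {
    return { active: null, stalePreference: false };
  }
  // A stored provider is only honored while it is still offered.
  if (stored !== null && statuses.some((s) => s.provider === stored)) {
    return { active: stored, stalePreference: false };
  }
  // Otherwise fall back now, and flag the stored value for removal.
  return { active: defaultProvider(statuses), stalePreference: stored !== null };
}
```

A caller would react to `stalePreference` by invoking clearSelectedProvider() in an effect, so the fallback persists across restarts.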
💡 Codex Review
Reviewed commit: 7cec12c1da
1. clippy needless_return — CI's cargo clippy --workspace with
local-inference active was flagging 5 'return Ok(...)' tail
statements inside #[cfg(feature = "local-inference")] blocks in
the new dictation model-management handlers. Replace with tail
expressions. Blocked merge.
2. P2 — deleted model treated as active selection. The download
completion callback's already_selected guard used
whisper::get_model(id).is_some(), which checks metadata, not
download state. After deleting the selected model file, the config
key still points at it and the next download is skipped for
auto-select. Tighten the filter to is_some_and(|m| m.is_downloaded())
so a deleted-but-configured model is treated as no selection.
3. P2 — mic toggle disabled mid-recording. The mic button used
'disabled={!voiceEnabled || disabled}', so if the outer 'disabled'
prop flipped true mid-session the user lost any UI way to stop an
active recording (Send was already blocked by canSend). Keep the
button clickable while voiceRecording is true.
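The guard in item 2 lives in Rust (`is_some_and(|m| m.is_downloaded())`); the TypeScript below only transliterates the decision to make the before/after difference concrete. `ModelInfo` and `shouldAutoSelect` are hypothetical names.

```typescript
// Hypothetical catalog entry: known model metadata plus on-disk state.
interface ModelInfo {
  id: string;
  downloaded: boolean;
}

// A freshly downloaded model becomes the selection only when the configured
// model is missing from the catalog or its file is no longer on disk.
function shouldAutoSelect(configuredId: string | null, catalog: ModelInfo[]): boolean {
  const current =
    configuredId === null ? undefined : catalog.find((m) => m.id === configuredId);
  // Metadata existing is not enough; the model must also be downloaded.
  const alreadySelected = current !== undefined && current.downloaded;
  return !alreadySelected;
}
```

The earlier check corresponded to `current !== undefined` alone, which is why a deleted-but-configured model blocked auto-select.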
alexhancock left a comment
It looks pretty good to me! Made one minor comment and @jamadeo may want to have a look as well to confirm the custom type defs fit into the architecture we've planned nicely.
Eventually we'll put all this knowledge into the AGENTS.md and it will be automatic but good to review more closely early on
default = ["code-mode", "local-inference", "aws-providers", "telemetry", "otel", "rustls-tls"]
code-mode = ["goose/code-mode", "goose-acp/code-mode"]
- local-inference = ["goose/local-inference"]
+ local-inference = ["goose/local-inference", "goose-acp/local-inference"]
I am a bit lost what the modifications to the local-inference feature defs are up to here! Do you know of a purpose? I'd guess you can remove this and the similar line in crates/goose-acp/Cargo.toml if not
It's load-bearing. crates/goose-acp/src/server.rs has #[cfg(feature = "local-inference")] blocks for the Local dictation transcribe path and all six local-model handlers; without the forward, goose-cli's default build activates goose/local-inference but not goose-acp/local-inference, those blocks are compiled out, and is_configured(Local) always returns false — the original bug. Removing either line silently breaks Local Whisper in the default binary.
I think this will make more sense if (when?) we merge the goose-acp crate into the goose crate. Given where we're going with this, I don't think it makes sense that they remain separated
jamadeo left a comment
Looks overall good, but for the most part we shouldn't use localStorage for settings and instead prefer the goose config. There may be some exceptions to this if the setting is truly relevant only to this app, but I think all the ones here would be better off in the user's config file
} from "../lib/voiceInput";
import type { DictationProvider } from "@/shared/types/dictation";

const VOICE_INPUT_PREFERENCES_EVENT = "goose:voice-input-preferences";
We should be using values from the goose config, not local storage, for these settings
Done in a4c78c58f3. useVoiceInputPreferences now reads/writes via _goose/config/{read,upsert,remove} over ACP. Three keys: VOICE_DICTATION_PROVIDER, VOICE_DICTATION_PREFERRED_MIC, VOICE_AUTO_SUBMIT_PHRASES.
…torage

Per PR review from @jamadeo: app settings should live in the user's goose config.yaml, not localStorage. useVoiceInputPreferences now uses the _goose/config/{read,upsert,remove} ACP methods for all three voice settings:
- VOICE_DICTATION_PROVIDER
- VOICE_DICTATION_PREFERRED_MIC
- VOICE_AUTO_SUBMIT_PHRASES

Config keys renamed from localStorage-style (goose:voice-*) to uppercase-snake (matches rest of goose config.yaml conventions). Cross-instance sync preserved via the existing window event so VoiceInputSettings writes propagate to useVoiceDictation's reads without requiring a remount.

Known tradeoff: the hook is now async on mount. Initial render sees defaults (selectedProvider=null, rawAutoSubmitPhrases="submit") until the ACP round-trip lands, typically <50ms on a local WebSocket. No new loading state exposed; VoiceInputSettings' own loading state (based on getDictationConfig) covers the user-visible window.

Users with existing localStorage values will see a one-time reset; the feature is new enough that migration isn't worth the complexity.
💡 Codex Review
Reviewed commit: a4c78c58f3
Formatter wanted the two export consts and the setSelectedProvider callback collapsed onto single lines. pnpm check was failing on Lint & Format in CI; pnpm format --write applied these. No behavior change.
Two Codex review issues on #8609:

1. P1: auto-submit bypassed send guards. useVoiceDictation's handleTranscription called onSend directly when a trigger phrase matched, bypassing ChatInput's canSend / hasQueuedMessage / disabled checks. A dictation phrase could dispatch a message while another was already queued or while input was otherwise blocked. Add an isSendLocked prop to the hook; when true, the trigger phrase is stripped and the remaining transcription is left in the textarea for the user to review and send manually. ChatInput passes hasQueuedMessage || disabled, matching its own send path.

2. P2: VoiceInputSettings.refreshConfig persisted the disabled sentinel when the stored provider disappeared from the fetched config. That turned an invalid/stale preference into a durable "voice off" opt-out, so the user stayed disabled across sessions even after valid providers reappeared. Use the new clearSelectedProvider() to remove the key outright, matching the self-heal that useVoiceDictation already does.
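The isSendLocked behavior from issue 1 reduces to a small decision once the trigger phrase has been stripped. The names below (`TriggerOutcome`, `resolveTrigger`, `computeSendLocked`) are invented for this sketch and are not the hook's actual API.

```typescript
type TriggerOutcome =
  | { action: "send"; text: string }
  | { action: "leaveInTextarea"; text: string };

// Mirrors the value ChatInput is described as passing to the hook.
function computeSendLocked(hasQueuedMessage: boolean, disabled: boolean): boolean {
  return hasQueuedMessage || disabled;
}

// Called after a trigger phrase matched and was removed from the text.
// When sending is locked the text stays put for manual review.
function resolveTrigger(textWithoutPhrase: string, isSendLocked: boolean): TriggerOutcome {
  if (isSendLocked) {
    return { action: "leaveInTextarea", text: textWithoutPhrase };
  }
  return { action: "send", text: textWithoutPhrase };
}
```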
💡 Codex Review
Reviewed commit: 90b47d6f1f
Two Codex review issues on #8609:

1. P1: transcription chunks could emit out of order. useDictationRecorder fired multiple transcribeChunk calls concurrently as samples crossed VAD boundaries; if a later chunk's API call resolved faster than an earlier one, onTranscription would append them in the wrong order, scrambling long dictation sessions with variable API latency. Assign a per-generation monotonic seq number at enqueue, buffer results in a Map, drain contiguous prefix to onTranscription. Empty transcriptions still occupy a slot so they don't stall later chunks, and errors unblock the queue the same way. generationRef += 1 now also resets the sequence state so in-flight old-gen chunks can bail at the gen check without leaving a gap.

2. P2: unknown stored provider value was being persisted as voice-off. useVoiceInputPreferences.syncFromConfig set hasStoredProviderPreference = (providerValue !== null), which was also true for unrecognized strings (stale config from older builds, typos). Combined with normalizeDictationProvider returning null, downstream code interpreted this as an explicit "voice off" opt-out, leaving voice disabled until the user manually re-selected. Only mark the preference as present when the value is recognized (or the explicit disabled sentinel); otherwise clear the config key so future boots fall through to default cleanly.
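The sequencing scheme in issue 1 (monotonic seq at enqueue, buffer in a Map, drain the contiguous prefix) can be sketched as a small class. `OrderedEmitter` is a hypothetical name, and the generation check from the commit is reduced here to a plain `reset()`.

```typescript
// Emits results in enqueue order even when they resolve out of order.
class OrderedEmitter {
  private nextSeq = 0; // next sequence number to hand out
  private nextToEmit = 0; // lowest sequence number not yet drained
  private ready = new Map<number, string | null>();
  private emit: (text: string) => void;

  constructor(emit: (text: string) => void) {
    this.emit = emit;
  }

  // Reserve a slot before starting the async transcription call.
  enqueue(): number {
    const seq = this.nextSeq;
    this.nextSeq += 1;
    return seq;
  }

  // Record a result; null stands for an empty transcription or an error.
  // Such slots are still consumed so later chunks are not stalled.
  resolve(seq: number, text: string | null): void {
    this.ready.set(seq, text);
    while (this.ready.has(this.nextToEmit)) {
      const value = this.ready.get(this.nextToEmit);
      this.ready.delete(this.nextToEmit);
      this.nextToEmit += 1;
      if (value) this.emit(value);
    }
  }

  // Start a new generation: forget buffered results and restart numbering.
  reset(): void {
    this.nextSeq = 0;
    this.nextToEmit = 0;
    this.ready.clear();
  }
}
```

A chunk that resolves early simply waits in the Map until every lower-numbered slot has been filled.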
💡 Codex Review
Reviewed commit: cf368986f5
useVoiceInputPreferences loads the stored provider asynchronously via ACP. On mount, hasStoredProviderPreference defaults to false until the config round-trip lands, so VoiceInputSettings.refreshConfig — which runs simultaneously — closed over false and called setSelectedProvider(getDefaultDictationProvider(...)) before the real value arrived, clobbering the user's saved choice (including their explicit disable). Add an isHydrated flag to the prefs hook that flips true after the first syncFromConfig completes. VoiceInputSettings.refreshConfig now bails the auto-select path until isHydrated is true. The refreshConfig useCallback lists voicePrefsHydrated as a dep, so when hydration completes the mount useEffect re-fires refreshConfig with trustworthy state.
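The hydration gate described above boils down to a three-way decision. This is a simplified sketch with invented names (`RefreshDecision`, `decideOnRefresh`), not the body of VoiceInputSettings.refreshConfig.

```typescript
type RefreshDecision = "wait" | "keepStored" | "selectDefault";

// Before the first config sync completes, hasStoredProviderPreference is
// still at its default (false) and cannot be trusted, so nothing is chosen.
function decideOnRefresh(
  isHydrated: boolean,
  hasStoredProviderPreference: boolean,
): RefreshDecision {
  if (!isHydrated) return "wait";
  return hasStoredProviderPreference ? "keepStored" : "selectDefault";
}
```

Listing the hydration flag as a dependency of the refresh callback is what causes the decision to be re-evaluated once real values arrive.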
💡 Codex Review
Reviewed commit: cf0c21ff04
Signed-off-by: tulsi <tulsi@block.xyz>
💡 Codex Review
Reviewed commit: 846195a227
Signed-off-by: tulsi <tulsi@block.xyz>
💡 Codex Review
Reviewed commit: 8660aa1ecf
Signed-off-by: tulsi <tulsi@block.xyz>
💡 Codex Review
Reviewed commit: 583a2189f9
@copilot resolve the merge conflicts in this pull request
Summary
Wires voice dictation into goose2 following the post-#8549/#8582 architecture: frontend calls ACP custom methods directly over the goose-serve WebSocket, no Tauri middleware. Supersedes #8565.
What lands
Eight ACP custom methods on the goose server, consumed directly via the regenerated @aaif/goose-sdk GooseClient:
- _goose/dictation/transcribe, /config
- _goose/dictation/models/{list, download, download/progress, cancel, delete}
- _goose/dictation/model/select

Frontend UI for local Whisper model management (ui/goose2/src/features/settings/ui/LocalWhisperModels.tsx): list + download-with-progress + cancel + delete + select, with 750ms progress polling that auto-stops when nothing is downloading. Replaces the pre-existing "Local model download is not yet available" placeholder.

Chat input improvements:
- useAudioDevices now reacts to OS-level mic permission changes via navigator.permissions.query
- Filters empty deviceId entries to avoid Radix Select crashes when enumerateDevices() runs before getUserMedia

On the Rust side:
- on_dictation_transcribe now reads per-provider selected model from config (via dictation_selected_model helper) instead of always using hardcoded defaults
- on_dictation_model_select validates is_downloaded() for Local before persisting
- local-inference feature forwarded from goose-cli to goose-acp so is_configured(Local) resolves correctly

Why not Tauri commands
Only OS-bound operations would justify Tauri wrappers; navigator.mediaDevices.getUserMedia handles the mic permission prompt in the renderer, so no new Tauri commands were needed.

Testing
- cargo check -p goose-acp (default features): passes
- cargo test -p goose-acp -p goose-sdk: passes
- pnpm tsc --noEmit: clean
- pnpm vitest run: 51 files / 412 tests passing (9 new tests for SDK-routed dictation functions)
- tiny model download → select → transcribe-in-chat flow works

Related Issues
Screenshots/Demos
Screen.Recording.2026-04-16.at.1.37.38.PM.mov
🤖 Generated with Claude Code