feat(goose2): voice dictation via direct-ACP pattern#8609

Open
tulsi-builder wants to merge 29 commits into main from tulsi/voice-input-rebased

Conversation

Collaborator

@tulsi-builder tulsi-builder commented Apr 16, 2026

Summary

Wires voice dictation into goose2 following the post-#8549/#8582 architecture: frontend calls ACP custom methods directly over the goose-serve WebSocket, no Tauri middleware. Supersedes #8565.

What lands

Eight ACP custom methods on the goose server, consumed directly via the regenerated @aaif/goose-sdk GooseClient:

  • _goose/dictation/{transcribe, config}
  • _goose/dictation/models/{list, download, download/progress, cancel, delete}
  • _goose/dictation/model/select

Frontend UI for local Whisper model management (ui/goose2/src/features/settings/ui/LocalWhisperModels.tsx): list + download-with-progress + cancel + delete + select, with 750ms progress polling that auto-stops when nothing is downloading. Replaces the pre-existing "Local model download is not yet available" placeholder.

Chat input improvements:

  • Mic toggle with auto-submit keyword
  • One-click send while mic is still recording (was two-click before)
  • useAudioDevices now reacts to OS-level mic permission changes via navigator.permissions.query
  • Defensive filter on empty deviceId to avoid Radix Select crashes when enumerateDevices() runs before getUserMedia

On the Rust side:

  • on_dictation_transcribe now reads per-provider selected model from config (via dictation_selected_model helper) instead of always using hardcoded defaults
  • on_dictation_model_select validates is_downloaded() for Local before persisting
  • local-inference feature forwarded from goose-cli to goose-acp so is_configured(Local) resolves correctly

Why not Tauri commands

Only OS-bound operations would justify Tauri wrappers; navigator.mediaDevices.getUserMedia handles the mic permission prompt in the renderer, so no new Tauri commands were needed.

Testing

  • cargo check -p goose-acp (default features) — passes
  • cargo test -p goose-acp -p goose-sdk — passes
  • pnpm tsc --noEmit — clean
  • pnpm vitest run — 51 files / 412 tests passing (9 new tests for SDK-routed dictation functions)
  • Manual end-to-end on macOS: local Whisper tiny model download → select → transcribe-in-chat flow works
  • OpenAI / Groq / ElevenLabs paths compile and route correctly but were not manually tested (no API keys available — teammates with keys should validate)

Related Issues

Screenshots/Demos

Screen.Recording.2026-04-16.at.1.37.38.PM.mov

🤖 Generated with Claude Code

tulsi-builder and others added 14 commits April 16, 2026 13:00
Add voice dictation support to the goose2 Tauri app by exposing
transcription and config as ACP custom methods, then wiring the
frontend to use them.

Backend (crates/):
- Add DictationTranscribeRequest/Response and DictationConfigRequest/Response
  types to goose-sdk custom_requests.rs with model metadata fields
- Add #[custom_method] handlers in goose-acp server.rs for transcribe
  (OpenAI, Groq, ElevenLabs, Local) and config
- Register methods in acp-meta.json
- Forward local-inference feature from goose-cli to goose-acp

Tauri (ui/goose2/src-tauri/):
- Rewrite dictation.rs to use call_ext_method via ACP instead of
  importing goose crate directly
- Add generic CallExt command to ACP manager with method name
  normalization (strips leading _ to avoid double-prefix)
- Register get_dictation_config and transcribe_dictation commands

Frontend (ui/goose2/src/):
- Wire useDictationRecorder + useVoiceInputPreferences into ChatInput
- Replace placeholder mic button with working toggle (recording/
  transcribing states, auto-submit on keyword)
- Stop recording on manual send and on auto-submit keyword
- Show "Listening..."/"Transcribing..." placeholder in textarea
- Add Voice section to SettingsModal with VoiceInputSettings
- Add all voice i18n strings (en + es)
- Fix pre-existing type errors in dictationVad.ts and VoiceInputSettings

Known issue: Local Whisper reports configured: false despite model being
downloaded and config set. The is_downloaded() path check needs
investigation in a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Picks up DictationTranscribeRequest/Response, DictationConfigRequest/Response,
and DictationProviderStatusEntry entries. Required for the @aaif/goose-sdk
TypeScript generator in ui/sdk to see the new methods.
Replaces Tauri invoke() with client.goose.GooseDictationConfig() and
GooseDictationTranscribe() for the two ACP methods registered on the
goose server. Matches the post-8549/8582 pattern: frontend talks
directly to goose serve over WebSocket, no Tauri middleware.

The remaining seven functions in dictation.ts still call invoke() for
Tauri commands that no longer exist; those migrate to ACP methods added
in a later commit.
Adds six custom methods so the goose2 frontend can list, download,
track, cancel, delete, and select local Whisper models through the
same WebSocket channel it already uses for transcription:

  _goose/dictation/models/list
  _goose/dictation/models/download
  _goose/dictation/models/download/progress
  _goose/dictation/models/cancel
  _goose/dictation/models/delete
  _goose/dictation/model/select

All local-model operations are gated on the local-inference feature;
without it they return "Local inference not enabled". The select method
accepts any dictation provider (openai, groq, elevenlabs, local) and
writes to the appropriate config key.

Replaces the previous plan to expose these as Tauri commands -- following
the post-8549/8582 pattern of ACP-from-frontend-direct.

Signed-off-by: tulsi <tulsi@block.xyz>
Migrates seven dictation.ts functions off Tauri invoke() onto the
regenerated @aaif/goose-sdk client:
  saveDictationModelSelection, listDictationLocalModels,
  downloadDictationLocalModel, getDictationLocalModelDownloadProgress,
  cancelDictationLocalModelDownload, deleteDictationLocalModel

Leaves alone:
  saveDictationProviderSecret / deleteDictationProviderSecret — use
    generic save_provider_field / delete_provider_config Tauri commands
  getMicrophonePermissionStatus / requestMicrophonePermission — OS-bound
    browser APIs handle mic prompt in VoiceInputSettings

Each migrated function uses a type cast at the SDK boundary because the
regenerated types don't fully overlap with the hand-written local types
(e.g., WhisperModelStatus has url/recommended fields the SDK's
DictationLocalModelStatus doesn't). Consumers that read missing fields
will get undefined at runtime; end-to-end verification in a later task
will surface any breakage.

Signed-off-by: tulsi <tulsi@block.xyz>
getMicrophonePermissionStatus and requestMicrophonePermission had zero
callers after the voice-input work settled — VoiceInputSettings derives
permission status from the browser's navigator.mediaDevices.getUserMedia
directly rather than routing through Tauri. Drop the exports and the
now-unused MicrophonePermissionStatus type import.

The type itself stays defined in shared/types/dictation.ts for any
future consumer; only the Tauri-routed helpers are removed.
Replaces the "Local model download is not yet available" placeholder in
VoiceInputSettings with a working LocalWhisperModels component that
drives the six ACP methods added upstream: list, download, progress,
cancel, delete, select.

Per-row UI state machine:
  - not downloaded  -> Download button
  - downloading     -> progress bar + Cancel button (polls every 750ms)
  - downloaded + selected   -> "Selected" badge + Delete
  - downloaded + unselected -> Select + Delete
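
The row states listed above map naturally onto a small pure function. The following is a minimal sketch, not the component's actual code: `ModelRow`, `RowView`, and `viewForRow` are hypothetical names, and the real SDK status type may carry different fields.

```typescript
// Hypothetical row shape; field names are illustrative, not the SDK's.
interface ModelRow {
  downloaded: boolean;
  downloading: boolean;
  selected: boolean;
}

type RowAction = "download" | "cancel" | "select" | "delete";

interface RowView {
  actions: RowAction[];
  showProgress: boolean;
  showSelectedBadge: boolean;
}

// Derive what a row renders from its state, one branch per listed state.
function viewForRow(row: ModelRow): RowView {
  if (row.downloading) {
    return { actions: ["cancel"], showProgress: true, showSelectedBadge: false };
  }
  if (!row.downloaded) {
    return { actions: ["download"], showProgress: false, showSelectedBadge: false };
  }
  if (row.selected) {
    return { actions: ["delete"], showProgress: false, showSelectedBadge: true };
  }
  return { actions: ["select", "delete"], showProgress: false, showSelectedBadge: false };
}
```

Keeping the mapping pure makes each of the four states easy to cover in a unit test without rendering the component.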

Progress polling auto-stops when no active downloads remain. Download
completion refreshes the model list and notifies the parent config so
the mic button in chat enables without a manual reload.
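
One way to get polling that stops itself is sketched below. It assumes a `poll` callback that refreshes progress and returns how many downloads are still active; `IntervalScheduler` and `createProgressPoller` are hypothetical names, and the timer is injected so the logic can be tested without real time.

```typescript
// Injected timer so the sketch does not depend on a browser or Node global.
interface IntervalScheduler {
  setInterval(fn: () => void, ms: number): number;
  clearInterval(handle: number): void;
}

function createProgressPoller(
  scheduler: IntervalScheduler,
  poll: () => number, // assumed: refreshes progress, returns active download count
  intervalMs = 750,
) {
  let handle: number | null = null;

  const stop = (): void => {
    if (handle !== null) {
      scheduler.clearInterval(handle);
      handle = null;
    }
  };

  // Each tick polls once and stops the timer when nothing is downloading.
  const tick = (): void => {
    if (poll() === 0) stop();
  };

  // Starting twice is a no-op, so only one interval ever runs.
  const start = (): void => {
    if (handle === null) handle = scheduler.setInterval(tick, intervalMs);
  };

  return { start, stop, isRunning: (): boolean => handle !== null };
}
```

In a browser, an adapter such as `{ setInterval: (fn, ms) => window.setInterval(fn, ms), clearInterval: (h) => window.clearInterval(h) }` could serve as the scheduler.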

i18n keys added for EN and ES; obsolete localModelUnavailable key left
in place (unused now) to avoid gratuitous deletion.
…etes

The onModelsChanged callback only called refreshConfig() — it didn't
emit notifyVoiceDictationConfigChanged(). Result: after downloading a
local Whisper model, the chat page's useVoiceDictation hook kept
stale providerStatuses and left the mic button disabled until the
window was reloaded.

Symmetric with how handleModelChange already notifies on cloud-provider
model changes. Now both paths emit the same event.
ChatInput's handleSend used to early-return when isRecording or
isTranscribing, which meant clicking Send during active dictation only
stopped the mic — you had to click Send a second time to actually send.

Remove the early return. If recording is still live, stop it with
flushPending:false and send whatever's already transcribed into the
textarea. Any in-flight audio the user spoke AFTER clicking Send is
intentionally dropped — by the time the user clicks Send, what's in the
textarea is what they want to send.

Empty-send is still blocked by the canSend guard, so an accidental Send
with no transcription is a no-op.
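
The one-click behavior described above can be sketched as follows. This is an illustration under assumptions: `DictationRecorder` is a hypothetical interface, and `canSend` is reduced to a non-empty-text check, which may be narrower than the real guard.

```typescript
// Hypothetical recorder surface; the real hook may expose more.
interface DictationRecorder {
  isRecording(): boolean;
  stopRecording(options: { flushPending: boolean }): void;
}

// Returns true when a message was actually sent.
function handleSend(
  text: string,
  recorder: DictationRecorder,
  send: (text: string) => void,
): boolean {
  if (recorder.isRecording()) {
    // Drop audio spoken after the click; keep what is already in the textarea.
    recorder.stopRecording({ flushPending: false });
  }
  const canSend = text.trim().length > 0; // assumed guard
  if (!canSend) return false;
  send(text);
  return true;
}
```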
Two fixes:

1. useAudioDevices now subscribes to navigator.permissions.query for
   'microphone' and reflects the live OS-level permission state. Before,
   hasPermission only became true when the user clicked 'Grant access'
   from this component — if they'd already granted mic permission via
   the chat input's getUserMedia call, Voice settings still showed the
   Grant access button with no effect. Now opening Voice settings shows
   the correct state immediately and updates reactively if permission
   changes elsewhere.

2. Move the Microphone block above the per-provider (API key / model)
   config block so its visual position reflects what it is: a
   voice-level setting that applies regardless of selected provider,
   not a provider-specific detail.
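
The live permission tracking in the first fix can be approximated with a small subscription helper. This is a sketch, not the hook's code: `PermissionStatusLike` is a hypothetical structural type covering only the parts of the browser's `PermissionStatus` that are used, and `watchMicPermission` is an illustrative name.

```typescript
// Minimal structural stand-in for the browser's PermissionStatus.
interface PermissionStatusLike {
  state: string; // "granted" | "denied" | "prompt" in browsers
  addEventListener(type: "change", listener: () => void): void;
  removeEventListener(type: "change", listener: () => void): void;
}

// Reports the current state once, then again on every change.
// Returns an unsubscribe function suitable for an effect cleanup.
function watchMicPermission(
  status: PermissionStatusLike,
  onChange: (granted: boolean) => void,
): () => void {
  const report = (): void => onChange(status.state === "granted");
  report();
  status.addEventListener("change", report);
  return () => status.removeEventListener("change", report);
}
```

In a renderer, the status object would come from `navigator.permissions.query` with the name `"microphone"`; depending on the TypeScript DOM typings in use, that name may need a cast to `PermissionName`.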
After routing useAudioDevices through navigator.permissions.query,
enumerateDevices() can return entries with empty-string deviceId when
the page hasn't yet exercised getUserMedia in the current session —
browsers withhold full device identifiers as a privacy measure. Radix
Select rejects empty-string values and crashes the component.

Three changes:
- When permissions.query reports 'granted' on mount, call the
  permission-ful enumeration (getUserMedia + enumerateDevices) so
  deviceIds are real from the start, not the sans-permission enumeration
  that returns empty IDs.
- Drop the 'if (loading) return' guard that made loadDevicesWithPermission
  a no-op on initial mount (loading starts true).
- Defensively filter empty deviceIds out of the Select items so stale
  enumerations can't crash the UI.
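
The defensive filter in the last change amounts to something like the sketch below. `AudioInputEntry` and `selectableDevices` are hypothetical names; the real code works with the browser's `MediaDeviceInfo` entries.

```typescript
// Hypothetical subset of MediaDeviceInfo used by the select.
interface AudioInputEntry {
  deviceId: string;
  label: string;
}

// Keep only entries that a select component can use as item values.
function selectableDevices(devices: AudioInputEntry[]): AudioInputEntry[] {
  return devices.filter((device) => device.deviceId !== "");
}
```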
Three issues surfaced in code review:

1. shared/types/dictation.ts: trim WhisperModelStatus and
   DictationDownloadProgress to match the actual SDK response shape.
   The previous types declared fields the SDK doesn't carry (url,
   recommended, speedBps, etaSeconds, modelId); the 'as unknown as'
   casts in dictation.ts let it compile but consumers reading missing
   fields got undefined. No runtime change — just honest types.

2. goose-acp/src/server.rs: on_dictation_model_select now validates
   is_downloaded() for the Local provider before persisting. Previously
   a caller could write LOCAL_WHISPER_MODEL to a model id whose file
   isn't on disk, making is_configured(Local) return false and
   transcribe_local() fail at runtime. The UI gated this in practice,
   but the ACP method is a public interface.

3. i18n: remove dead keys (localModelUnavailable, recommended,
   providerSetupHint, active) from both en and es locales. Zero
   consumers after the LocalWhisperModels component landed.
Fixes CI Rust Code Format and Goose 2 Lint & Format checks. Whitespace
and wrap changes only — no behavioral diffs.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42a3f753af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread ui/goose2/src/features/chat/hooks/useVoiceDictation.ts Outdated
Comment thread crates/goose-acp/src/server.rs Outdated
Comment thread ui/goose2/src/shared/ui/ai-elements/mic-selector.tsx
tulsi-builder and others added 2 commits April 16, 2026 13:24
Four targeted fixes addressing Codex review comments (P1/P2) and a CI
clippy failure on PR #8609.

1. useVoiceDictation: fix stale-text race in handleTranscription
   Multiple dictation callbacks firing in the same tick all read `text`
   from closure, so the second callback computes `merged` from stale
   state and overwrites the first fragment — dropping dictated words in
   longer recordings. Mirror `text` in a ref that is updated both via
   effect (on React commit) and synchronously after each setText call
   in the callback, so subsequent same-tick callbacks always see the
   latest value. Applies to both the auto-submit and append branches.

2. server.rs dictation model download callback: preserve user selection
   The post-download callback unconditionally wrote the freshly
   downloaded model id to LOCAL_WHISPER_MODEL_CONFIG_KEY, silently
   switching the active model mid-session if the user already had one
   selected. Only auto-select when no valid model is currently set,
   preserving the happy path (first download still auto-selects) while
   protecting against surprise switches on subsequent downloads.

3. mic-selector: re-enumerate devices when permission flips to granted
   When OS-level mic permission is granted while the app is open (e.g.
   via System Settings), the permissions-API change handler only
   flipped `hasPermission` — the device list still reflected the
   pre-permission enumeration, which on WKWebView typically has empty
   deviceId/label entries (filtered out by VoiceInputSettings). The
   user was stuck with an empty select and no "Grant access" button.
   Trigger loadDevicesWithPermission() when state transitions to
   granted so real device metadata is picked up immediately.

4. server.rs:2796 clippy needless_borrow
   `config` is already `&'static Config` from Config::global(); drop
   the redundant `&` to pass CI's clippy -D warnings gate.
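
The ref mirror from the first fix can be illustrated outside React. In this sketch, `TextRef` stands in for a React ref, `setText` stands in for the state setter (which does not update the closure value synchronously), and joining fragments with a single space is an assumption.

```typescript
interface TextRef {
  current: string;
}

// Builds a transcription handler that always merges against the latest text.
function createAppender(textRef: TextRef, setText: (next: string) => void) {
  return (fragment: string): void => {
    const base = textRef.current;
    const merged = base.length > 0 ? `${base} ${fragment}` : fragment;
    setText(merged);
    // Update the mirror synchronously so a second callback in the same
    // tick sees this fragment instead of the stale state.
    textRef.current = merged;
  };
}
```

Without the synchronous assignment, two callbacks in one tick would both read the same base text and the second would overwrite the first fragment.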

Verified:
- cargo clippy -p goose-acp --all-targets -- -D warnings: clean
- cargo test -p goose-acp -p goose-sdk: 48 passed
- pnpm tsc --noEmit (ui/goose2): clean
- pnpm vitest run (ui/goose2): 412 passed across 51 files

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- cargo fmt applies a single-line collapse on the LOCAL_WHISPER_MODEL
  set_param error log in on_dictation_model_download. CI Check Rust
  Code Format was flagging the multi-line form.

- ui/sdk/dist was committed by accident during the two SDK regeneration
  passes; the directory is .gitignored on main (ui/sdk/.gitignore
  excludes dist/ and node_modules/). Untrack it here so the PR doesn't
  carry ~2500 lines of build artifacts.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1a40d4a6b

Comment thread ui/goose2/src/features/chat/hooks/useDictationRecorder.ts
Comment thread ui/goose2/src/features/chat/lib/voiceInput.ts Outdated
Addresses three issues on the voice-input PR:

1. Prevent overlapping recording startups. A rapid double-click on the mic
   could kick off a parallel startRecording while getUserMedia was still
   pending, leaking a MediaStream and leaving the OS mic indicator on.
   Add startingRef as a synchronous guard at the top of startRecording,
   and skip the toggle path while a startup is in-flight.

2. Fix the "Send leaves the mic on" bug. If the user clicked Send mid-
   startup (isRecording still false), handleSend would skip stopRecording
   and the getUserMedia that landed afterward would leave the mic hot.
   stopRecording now sets cancelStartRef; the startup path checks it at
   both await points and tears down the freshly-acquired stream instead
   of flipping isRecording to true. handleSend also calls stopRecording
   when isStarting() is true so a pending startup is cancelled.

3. Fix auto-submit phrase removal when the raw text has repeated
   whitespace. getAutoSubmitMatch matched against the normalized text
   but used -phrase.length against the raw text, chopping legitimate
   content (e.g. "hello ship   it" with phrase "ship it" produced
   "hello sh"). Match the phrase at the end of the raw text via regex
   so the slice index reflects the real phrase span. Adds a regression
   test covering the repeated-whitespace case.
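
The third fix can be sketched as a regex anchored at the end of the raw text. This is not the PR's `getAutoSubmitMatch`; `stripTrailingPhrase` and `escapeRegExp` are hypothetical helpers, and case-insensitive matching is an assumption.

```typescript
// Escape characters that are special inside a regular expression.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns the raw text with the trailing phrase removed, or null if the
// text does not end with the phrase. Words may be separated by any run
// of whitespace in the raw text.
function stripTrailingPhrase(raw: string, phrase: string): string | null {
  const words = phrase
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map(escapeRegExp);
  if (words.length === 0) return null;
  const pattern = new RegExp(`(?:^|\\s)${words.join("\\s+")}\\s*$`, "i");
  const match = pattern.exec(raw);
  if (match === null) return null;
  // Slice at the real start of the matched span, not at -phrase.length.
  return raw.slice(0, match.index).trimEnd();
}
```

With the input from the bug report, `stripTrailingPhrase("hello ship   it", "ship it")` returns `"hello"`, whereas slicing by `-phrase.length` yields `"hello sh"`.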

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4739e05bbc

Comment thread ui/goose2/src/features/chat/hooks/useVoiceDictation.ts Outdated
Comment thread ui/goose2/src/features/chat/hooks/useVoiceDictation.ts Outdated
Two issues from Codex PR review on useVoiceDictation.ts:

1. textRef was synced to `text` via a post-render useEffect, leaving a
   commit window where a transcription callback could read the previous
   value and clobber a character the user just typed into the textarea.
   Assign `textRef.current = text` during render instead (React
   explicitly permits this; see `providerRef.current = provider` in
   useDictationRecorder.ts). The synchronous `textRef.current = merged`
   follow-up after setText is kept — it still guards the
   callback-vs-callback race.

2. `activeVoiceProvider` held onto a stored preference even when that
   provider was no longer present in `providerStatuses` (feature-flagged
   off, removed from allowlist, etc.), silently disabling voice input
   even though another provider was configured. Now treat the stored
   preference as valid only when it appears in providerStatuses;
   otherwise fall back to `getDefaultDictationProvider`. The explicit
   "off" state (hasStoredProviderPreference && selectedProvider == null)
   is preserved.
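
The provider resolution in the second item can be sketched as a pure function. The names here are illustrative: `resolveActiveProvider` is hypothetical, and the default picker is passed in because the real signature of `getDefaultDictationProvider` is not shown in this PR text.

```typescript
type Provider = "openai" | "groq" | "elevenlabs" | "local";

interface ProviderStatus {
  provider: Provider;
  configured: boolean;
}

function resolveActiveProvider(
  hasStoredPreference: boolean,
  selected: Provider | null,
  statuses: ProviderStatus[],
  getDefault: (statuses: ProviderStatus[]) => Provider | null, // assumed shape
): Provider | null {
  // Explicit "off": a stored preference with no provider stays off.
  if (hasStoredPreference && selected === null) return null;
  // A stored provider is honored only while it is still offered.
  if (selected !== null && statuses.some((s) => s.provider === selected)) {
    return selected;
  }
  // Stale or missing preference: fall back to the default.
  return getDefault(statuses);
}
```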
Adds clearSelectedProvider() to useVoiceInputPreferences that removes the
provider preference from localStorage entirely — distinct from
setSelectedProvider(null), which pins the user to 'voice off' via a
sentinel value.

useVoiceDictation now calls clearSelectedProvider() via useEffect when it
detects the stored preference points at a provider that's no longer in
providerStatuses (feature-flagged off, removed from the allowlist, etc.).
The fall-through to getDefaultDictationProvider still works immediately
this session; the clear makes it stick so the user doesn't keep hitting
the stale-preference detection on every boot.
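
The difference between clearing the preference and pinning "voice off" can be shown with a small store. This is a sketch under assumptions: the key name, the sentinel string, and the `Map`-backed store are all hypothetical stand-ins for the real persistence layer.

```typescript
const PROVIDER_KEY = "voice-provider"; // hypothetical key
const DISABLED_SENTINEL = "__disabled__"; // hypothetical sentinel

function createVoicePrefs(store: Map<string, string>) {
  return {
    // null pins the user to "voice off" by writing the sentinel.
    setSelectedProvider(provider: string | null): void {
      store.set(PROVIDER_KEY, provider ?? DISABLED_SENTINEL);
    },
    // Removes the key so the next read falls through to the default.
    clearSelectedProvider(): void {
      store.delete(PROVIDER_KEY);
    },
    read(): { hasStoredPreference: boolean; selected: string | null } {
      const raw = store.get(PROVIDER_KEY);
      if (raw === undefined) return { hasStoredPreference: false, selected: null };
      return {
        hasStoredPreference: true,
        selected: raw === DISABLED_SENTINEL ? null : raw,
      };
    },
  };
}
```

Both paths yield `selected: null`, so `hasStoredPreference` is what distinguishes an explicit opt-out from no preference at all.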

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cec12c1da

Comment thread crates/goose-acp/src/server.rs Outdated
Comment thread ui/goose2/src/features/chat/ui/ChatInputToolbar.tsx Outdated
1. clippy needless_return — CI's cargo clippy --workspace with
   local-inference active was flagging 5 'return Ok(...)' tail
   statements inside #[cfg(feature = "local-inference")] blocks in
   the new dictation model-management handlers. Replace with tail
   expressions. Blocked merge.

2. P2 — deleted model treated as active selection. The download
   completion callback's already_selected guard used
   whisper::get_model(id).is_some(), which checks metadata, not
   download state. After deleting the selected model file, the config
   key still points at it and the next download is skipped for
   auto-select. Tighten the filter to is_some_and(|m| m.is_downloaded())
   so a deleted-but-configured model is treated as no selection.

3. P2 — mic toggle disabled mid-recording. The mic button used
   'disabled={!voiceEnabled || disabled}', so if the outer 'disabled'
   prop flipped true mid-session the user lost any UI way to stop an
   active recording (Send was already blocked by canSend). Keep the
   button clickable while voiceRecording is true.
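
The mic-toggle rule from the third item reduces to a small predicate. This sketch uses the prop names quoted above; `micButtonDisabled` itself is a hypothetical helper, and the real component may inline the expression.

```typescript
function micButtonDisabled(
  voiceEnabled: boolean,
  disabled: boolean,
  voiceRecording: boolean,
): boolean {
  // An active recording must always be stoppable from the UI.
  if (voiceRecording) return false;
  return !voiceEnabled || disabled;
}
```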
@tulsi-builder tulsi-builder enabled auto-merge April 16, 2026 21:43
Collaborator

@alexhancock alexhancock left a comment

It looks pretty good to me! Made one minor comment and @jamadeo may want to have a look as well to confirm the custom type defs fit into the architecture we've planned nicely.

Eventually we'll put all this knowledge into the AGENTS.md and it will be automatic but good to review more closely early on

default = ["code-mode", "local-inference", "aws-providers", "telemetry", "otel", "rustls-tls"]
code-mode = ["goose/code-mode", "goose-acp/code-mode"]
-local-inference = ["goose/local-inference"]
+local-inference = ["goose/local-inference", "goose-acp/local-inference"]
Collaborator

I am a bit lost what the modifications to the local-inference feature defs are up to here! Do you know of a purpose? I'd guess you can remove this and the similar line in crates/goose-acp/Cargo.toml if not

Collaborator Author

It's load-bearing. crates/goose-acp/src/server.rs has #[cfg(feature = "local-inference")] blocks for the Local dictation transcribe path and all six local-model handlers; without the forward, goose-cli's default build activates goose/local-inference but not goose-acp/local-inference, those blocks are compiled out, and is_configured(Local) always returns false — the original bug. Removing either line silently breaks Local Whisper in the default binary.

Member

I think this will make more sense if (when?) we merge the goose-acp crate into the goose crate. Given where we're going with this, I don't think it makes sense that they remain separated

Member

@jamadeo jamadeo left a comment

Looks overall good, but for the most part we shouldn't use localStorage for settings and instead prefer the goose config. There may be some exceptions to this if the setting is truly relevant only to this app, but I think all the ones here would be better off in the user's config file

} from "../lib/voiceInput";
import type { DictationProvider } from "@/shared/types/dictation";

const VOICE_INPUT_PREFERENCES_EVENT = "goose:voice-input-preferences";
Member

We should be using values from the goose config, not local storage, for these settings

Collaborator Author

Done in a4c78c58f3. useVoiceInputPreferences now reads/writes via _goose/config/{read,upsert,remove} over ACP. Three keys: VOICE_DICTATION_PROVIDER, VOICE_DICTATION_PREFERRED_MIC, VOICE_AUTO_SUBMIT_PHRASES.

…torage

Per PR review from @jamadeo: app settings should live in the user's
goose config.yaml, not localStorage.

useVoiceInputPreferences now uses the _goose/config/{read,upsert,remove}
ACP methods for all three voice settings:
  VOICE_DICTATION_PROVIDER
  VOICE_DICTATION_PREFERRED_MIC
  VOICE_AUTO_SUBMIT_PHRASES

Config keys renamed from localStorage-style (goose:voice-*) to
uppercase-snake (matches rest of goose config.yaml conventions).
Cross-instance sync preserved via the existing window event so
VoiceInputSettings writes propagate to useVoiceDictation's reads without
requiring a remount.

Known tradeoff: the hook is now async on mount. Initial render sees
defaults (selectedProvider=null, rawAutoSubmitPhrases="submit") until
the ACP round-trip lands, typically <50ms on a local WebSocket. No new
loading state exposed; VoiceInputSettings' own loading state (based on
getDictationConfig) covers the user-visible window.

Users with existing localStorage values will see a one-time reset; the
feature is new enough that migration isn't worth the complexity.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4c78c58f3

Comment thread ui/goose2/src/features/chat/hooks/useVoiceDictation.ts Outdated
Comment thread ui/goose2/src/features/settings/ui/VoiceInputSettings.tsx Outdated
Formatter wanted the two export consts and the setSelectedProvider
callback collapsed onto single lines. pnpm check was failing on Lint &
Format in CI; pnpm format --write applied these. No behavior change.
Two Codex review issues on #8609:

1. P1 — auto-submit bypassed send guards. useVoiceDictation's
   handleTranscription called onSend directly when a trigger phrase
   matched, bypassing ChatInput's canSend / hasQueuedMessage /
   disabled checks. A dictation phrase could dispatch a message while
   another was already queued or while input was otherwise blocked.
   Add an isSendLocked prop to the hook; when true, the trigger phrase
   is stripped and the remaining transcription is left in the textarea
   for the user to review and send manually. ChatInput passes
   hasQueuedMessage || disabled, matching its own send path.

2. P2 — VoiceInputSettings.refreshConfig persisted the disabled
   sentinel when the stored provider disappeared from the fetched
   config. That turned an invalid/stale preference into a durable
   "voice off" opt-out, so the user stayed disabled across sessions
   even after valid providers reappeared. Use the new
   clearSelectedProvider() to remove the key outright, matching the
   self-heal that useVoiceDictation already does.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 90b47d6f1f

Comment thread ui/goose2/src/features/chat/hooks/useDictationRecorder.ts
Comment thread ui/goose2/src/features/chat/hooks/useVoiceInputPreferences.ts Outdated
Two Codex review issues on #8609:

1. P1 — transcription chunks could emit out of order. useDictationRecorder
   fired multiple transcribeChunk calls concurrently as samples crossed
   VAD boundaries; if a later chunk's API call resolved faster than an
   earlier one, onTranscription would append them in the wrong order,
   scrambling long dictation sessions with variable API latency. Assign
   a per-generation monotonic seq number at enqueue, buffer results in a
   Map, drain contiguous prefix to onTranscription. Empty transcriptions
   still occupy a slot so they don't stall later chunks, and errors
   unblock the queue the same way. generationRef += 1 now also resets
   the sequence state so in-flight old-gen chunks can bail at the gen
   check without leaving a gap.

2. P2 — unknown stored provider value was being persisted as voice-off.
   useVoiceInputPreferences.syncFromConfig set hasStoredProviderPreference
   = (providerValue !== null), which was also true for unrecognized
   strings (stale config from older builds, typos). Combined with
   normalizeDictationProvider returning null, downstream code interpreted
   this as an explicit "voice off" opt-out, leaving voice disabled until
   the user manually re-selected. Only mark the preference as present when
   the value is recognized (or the explicit disabled sentinel); otherwise
   clear the config key so future boots fall through to default cleanly.
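
The in-order delivery from the first item can be sketched with a sequence counter, a `Map` buffer, and a drain loop. This is an illustration rather than the hook's code: `createOrderedEmitter` and the ticket shape are hypothetical, and `null` stands for both an empty transcription and a failed request.

```typescript
interface ChunkTicket {
  gen: number;
  seq: number;
}

function createOrderedEmitter(onTranscription: (text: string) => void) {
  let generation = 0;
  let nextSeq = 0; // next sequence number to hand out
  let nextToEmit = 0; // lowest sequence number not yet delivered
  const settled = new Map<number, string | null>();

  // Deliver the contiguous prefix of settled chunks, in order.
  const drain = (): void => {
    while (settled.has(nextToEmit)) {
      const text = settled.get(nextToEmit);
      settled.delete(nextToEmit);
      nextToEmit += 1;
      if (text) onTranscription(text); // empty or failed slots emit nothing
    }
  };

  return {
    // Call when a chunk is handed to the transcription API.
    enqueue(): ChunkTicket {
      const ticket = { gen: generation, seq: nextSeq };
      nextSeq += 1;
      return ticket;
    },
    // Call when the API resolves or rejects for that chunk.
    settle(ticket: ChunkTicket, text: string | null): void {
      if (ticket.gen !== generation) return; // chunk from an older generation
      settled.set(ticket.seq, text);
      drain();
    },
    // New recording generation: forget all in-flight sequence state.
    reset(): void {
      generation += 1;
      nextSeq = 0;
      nextToEmit = 0;
      settled.clear();
    },
  };
}
```

Because a failed or empty chunk still fills its slot, a single slow or broken request delays later chunks only until it settles and never blocks them permanently.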

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf368986f5

Comment thread ui/goose2/src/features/settings/ui/VoiceInputSettings.tsx
useVoiceInputPreferences loads the stored provider asynchronously via
ACP. On mount, hasStoredProviderPreference defaults to false until the
config round-trip lands, so VoiceInputSettings.refreshConfig — which
runs simultaneously — closed over false and called
setSelectedProvider(getDefaultDictationProvider(...)) before the real
value arrived, clobbering the user's saved choice (including their
explicit disable).

Add an isHydrated flag to the prefs hook that flips true after the
first syncFromConfig completes. VoiceInputSettings.refreshConfig now
bails the auto-select path until isHydrated is true. The refreshConfig
useCallback lists voicePrefsHydrated as a dep, so when hydration
completes the mount useEffect re-fires refreshConfig with trustworthy
state.
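
The gating described above reduces to a three-way decision, sketched here as a pure function so it can be read without React. The type and function names are hypothetical; only isHydrated and hasStoredProviderPreference come from the commit message, and the real refreshConfig presumably does more than this.

```typescript
// Hypothetical sketch of the hydration gate; not the actual hook code.
interface VoicePrefsSnapshot {
  // Flips to true after the first syncFromConfig completes.
  isHydrated: boolean;
  // Only meaningful once isHydrated is true.
  hasStoredProviderPreference: boolean;
}

type AutoSelectDecision = "wait" | "keep-stored" | "select-default";

function decideAutoSelect(prefs: VoicePrefsSnapshot): AutoSelectDecision {
  // Before hydration the stored-preference flag is still its initial false,
  // so acting on it would overwrite the user's saved choice.
  if (!prefs.isHydrated) return "wait";
  return prefs.hasStoredProviderPreference ? "keep-stored" : "select-default";
}
```

Only the "select-default" outcome would lead to the setSelectedProvider(getDefaultDictationProvider(...)) call; "wait" corresponds to refreshConfig bailing until it re-runs after hydration.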

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf0c21ff04

Comment thread ui/goose2/src/features/chat/ui/ChatInput.tsx
Comment thread ui/goose2/src/features/settings/ui/LocalWhisperModels.tsx
Signed-off-by: tulsi <tulsi@block.xyz>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 846195a227

Comment thread ui/goose2/src/features/chat/hooks/useVoiceInputPreferences.ts Outdated
Signed-off-by: tulsi <tulsi@block.xyz>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8660aa1ecf

Comment thread ui/goose2/src/features/chat/hooks/useVoiceInputPreferences.ts Outdated
Comment thread ui/goose2/src/features/chat/ui/ChatInputToolbar.tsx Outdated
Signed-off-by: tulsi <tulsi@block.xyz>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 583a2189f9

Comment thread ui/goose2/src/features/chat/hooks/useVoiceDictation.ts
@tulsi-builder
Collaborator Author

@copilot resolve the merge conflicts in this pull request

Labels

None yet

Projects

None yet

3 participants