Separate VAD and captions. #555
Conversation
Now we selectively transmit a "speaking" track over the network. It's just a boolean that toggles on/off based on the VAD results. Useful for more audio visualizations.
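Since the speaking state arrives as a plain boolean signal on the watch side, consuming it is a one-liner. A minimal sketch (the speaking.active getter comes from this PR; the effect and avatar element are hypothetical host code):

```ts
// Toggle a CSS class on the speaker's avatar whenever the remote
// participant starts or stops speaking.
effect.effect((effect) => {
	const speaking = effect.get(watch.audio.speaking.active);
	avatar.classList.toggle("speaking", speaking);
});
```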
Walkthrough

Adds a speaking track and VAD pipeline (Silero) with worklet/worker wiring, replaces VAD transcription with a Whisper-based captions worker, introduces BoolProducer/BoolConsumer primitives, adjusts catalog exports and schemas, and updates publish/watch element wiring and UI rendering.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant PublishAudio as Publish.Audio
    participant Speak as Speaking
    participant Cap as Captions
    participant Worklet as AudioWorklet("capture")
    participant SpeakW as Worker("speaking-worker")
    participant WhispW as Worker("captions-worker")
    participant Broadcast
    User->>PublishAudio: new Audio({ captions?, speaking? })
    PublishAudio->>Speak: construct (enabled)
    PublishAudio->>Cap: construct (enabled, transcribe?, ttl)
    Speak->>Worklet: create "capture" node, connect MediaStream
    Speak->>SpeakW: post { type: "init", port }
    Worklet-->>SpeakW: AudioFrame(channels[0]) (stream)
    SpeakW-->>Speak: { type: "speaking", speaking }
    Speak->>Broadcast: produce speaking frames (BoolProducer)
    Cap->>WhispW: post { type: "speaking", speaking } (when transcribe enabled)
    Worklet-->>WhispW: AudioFrame(ch0) (stream)
    WhispW-->>Cap: { type: "text", text } or { type: "error" }
    Cap->>Broadcast: update captions.text (cleared after TTL)
    PublishAudio->>Broadcast: publish Catalog.Audio (track + config + captions + speaking)
```
```mermaid
sequenceDiagram
    autonumber
    participant WatchAudio as Watch.Audio
    participant WatchSpeak as Watch.Speaking
    participant UI as Watch Element
    participant Broadcast
    WatchAudio->>WatchSpeak: new Speaking({ enabled })
    WatchSpeak->>Broadcast: subscribe(info.speaking.track)
    loop real-time
        Broadcast-->>WatchSpeak: frame (group open/close)
        WatchSpeak-->>UI: update active getter
        UI->>UI: render caption text + speaking icon
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Claude finished @kixelated's task — View job

Code Review Complete 👍

I've thoroughly reviewed this PR separating VAD and captions functionality. The code is well-structured and I found no significant bugs or security issues. Here are my findings:

✅ Strengths

🔍 Minor Observations (not issues)

🏗️ Architecture

The separation creates a cleaner architecture where:

Security Assessment: No vulnerabilities found. The code follows secure patterns for worker communication, audio processing, and resource management.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
js/hang/src/publish/audio/index.ts (1)
278-283: Include Speaking in close() to avoid worker/subscription leaks

Speaking owns a worker and a track; it should be closed with the rest.

Apply:

```diff
 close() {
 	this.#signals.close();
 	this.captions.close();
+	this.speaking.close();
 	this.#track.close();
 }
```

js/hang/src/publish/audio/captions-worker.ts (1)
62-69: Buffer overflow risk: mixing bytes and elements in capacity check

this.#queued.byteLength is bytes; samples.length is elements. This check can miss capacity and cause a RangeError when expanding the view.

Use element counts against MAX_WHISPER_BUFFER:

```diff
-	if (this.#queued.byteLength + samples.length > this.#queued.buffer.byteLength) {
+	if (this.#queued.length + samples.length > MAX_WHISPER_BUFFER) {
 		if (!this.flush()) {
 			console.warn("buffer is full, dropping samples");
 			return;
 		}
 	}
```

Also applies to: 75-77
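For context on why the units matter here: a Float32Array's byteLength is four times its element count, so comparing bytes against elements overstates the remaining capacity by 4×.

```ts
// Element count vs. byte count for the same buffer:
const buf = new Float32Array(1024);
console.log(buf.length);     // 1024 (elements)
console.log(buf.byteLength); // 4096 (bytes = elements * 4)
```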
🧹 Nitpick comments (18)
js/hang/src/publish/element.ts (4)
7-7: Expose independent speaking control on the element API

The PR goal is to separate VAD (speaking) from captions, but the custom element does not yet expose a standalone "speaking" attribute. Recommend adding it so integrators can enable VAD-only visualizations without captions.
Apply this diff to include the attribute:
```diff
-const OBSERVED = ["url", "name", "device", "audio", "video", "controls", "captions"] as const;
+const OBSERVED = ["url", "name", "device", "audio", "video", "controls", "captions", "speaking"] as const;
```

And add the following (outside the changed range) for proper handling:
```ts
// attributeChangedCallback(...)
} else if (name === "speaking") {
	this.speaking = newValue !== null;
} else {
	// ...
}

// Add a dedicated property
get speaking(): boolean {
	return this.broadcast.audio.speaking.enabled.peek();
}

set speaking(speaking: boolean) {
	this.broadcast.audio.speaking.enabled.set(speaking);
}
```
129-136: Decouple captions() from speaking() to reflect "separate VAD and captions"

Setter currently toggles both captions and speaking together, which undermines their separation. Let captions control only captions. If you still want a convenience toggle, consider a new attribute like captions-and-vad.
```diff
 get captions(): boolean {
 	return this.broadcast.audio.captions.enabled.peek();
 }

 set captions(captions: boolean) {
 	this.broadcast.audio.captions.enabled.set(captions);
-	this.broadcast.audio.speaking.enabled.set(captions);
 }
```
175-177: Show the speaking indicator even when captions are off (if VAD is enabled)

Currently, the captions row (which includes the speaking icon) is hidden when captions are disabled. For VAD-only use-cases, this hides the speaking indicator unnecessarily.
```diff
-const show = effect.get(this.broadcast.audio.captions.enabled);
+const show =
+	effect.get(this.broadcast.audio.captions.enabled) ||
+	effect.get(this.broadcast.audio.speaking.enabled);
```
191-197: Speaking icon logic is fine; ensure consistent falsey state

Looks good. If types ever allow undefined for active on the publish side, consider

```ts
const speaking = !!effect.get(this.broadcast.audio.speaking.active);
```

to normalize. Not blocking.

js/hang/src/catalog/audio/speaking.ts (1)
1-9: Minimal schema is appropriate

The SpeakingSchema mirroring a simple MoQ track is sufficient for cataloging a boolean VAD stream. Consider adding a brief doc comment that the encoding is 1 byte: 0=false, 1=true.

js/hang/src/watch/element.ts (1)
184-187: Confirm intent: captions toggle implicitly enables speaking (network/CPU coupling)

Tying speaking.enabled to the captions attribute may surprise users and increase bandwidth/CPU when someone only wants text. If intentional for now, fine; otherwise consider a separate attribute/prop (e.g., speaking) to decouple.
If you want to decouple without adding a new observed attribute yet, gate speaking on a second internal Signal defaulted from captions once (not kept in lockstep).
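A rough sketch of that shape, assuming the Signal primitive from js/signals (class name and import path are illustrative):

```ts
import { Signal } from "@kixelated/signals"; // import path illustrative

// Hypothetical: seed the speaking toggle from the captions default once at
// construction; afterwards the two signals evolve independently instead of
// being written in lockstep by the captions attribute.
class AudioToggles {
	readonly captionsEnabled: Signal<boolean>;
	readonly speakingEnabled: Signal<boolean>;

	constructor(defaultCaptions: boolean) {
		this.captionsEnabled = new Signal(defaultCaptions);
		this.speakingEnabled = new Signal(defaultCaptions); // defaulted once, never re-synced
	}
}
```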
js/hang/src/publish/audio/index.ts (1)
183-191: Minor: consider making codec/bitrate configurable

Hard-coding opus and bitrate per channel is fine for now, but making these props (or deriving from constraints) will help experimentation across browsers and devices.
Also applies to: 231-232
js/hang/src/watch/audio/speaking.ts (1)
47-57: Optional: guard the async loop with try/catch and log once

Mirror the defensive pattern used elsewhere to avoid silent worker failures and to reset state on errors.
Apply:
```diff
-	effect.spawn(async (cancel) => {
-		for (;;) {
-			const frame = await Promise.race([sub.nextFrame(), cancel]);
-			if (!frame) break;
-			const active = frame.data.length > 0 && frame.data[0] !== 0;
-			this.#active.set(active);
-		}
-	});
+	effect.spawn(async (cancel) => {
+		try {
+			for (;;) {
+				const frame = await Promise.race([sub.nextFrame(), cancel]);
+				if (!frame) break;
+				const active = frame.data.length > 0 && frame.data[0] !== 0;
+				this.#active.set(active);
+			}
+		} catch (err) {
+			console.warn("speaking subscription error", err);
+		}
+	});
```

js/hang/src/publish/audio/captions-worker.ts (1)
97-113: Propagate worker errors instead of crashing on transcription failures

A try/catch here lets the host display/log issues without losing the worker.
Apply:
```diff
-	async #flush(buffer: Float32Array) {
-		const model = await this.#model;
-
-		// Do the expensive transcription.
-		const result = await model(buffer);
-		if (Array.isArray(result)) {
-			throw new Error("Expected a single result, got an array");
-		}
-
-		const text = result.text.trim();
-		if (text === "[BLANK_AUDIO]" || text === "") return;
-
-		postResult({
-			type: "text",
-			text,
-		});
-	}
+	async #flush(buffer: Float32Array) {
+		try {
+			const model = await this.#model;
+			const result = await model(buffer);
+			if (Array.isArray(result)) {
+				throw new Error("Expected a single result, got an array");
+			}
+			const text = result.text.trim();
+			if (text === "[BLANK_AUDIO]" || text === "") return;
+			postResult({ type: "text", text });
+		} catch (e: unknown) {
+			const message = e instanceof Error ? e.message : String(e);
+			postResult({ type: "error", message });
+		}
+	}
```

js/hang/src/publish/audio/speaking-worker.ts (2)
45-47: Fix comment to match logic (and new threshold check)

Comment says "3 in a row" but constant and logic use 8. Align the comment with VAD_SILENCE_PADDING to avoid confusion.
```diff
-	// Count the number of silence results, if we get 3 in a row then we're done.
+	// Count consecutive silence results; once we reach VAD_SILENCE_PADDING, we're done speaking.
```
123-130: Guard against duplicate init and validate message type

The worker creates a new Vad instance for every message. Future messages (or double inits) will rewire handlers unexpectedly.
Apply this diff to accept a single init and ignore duplicates:
```diff
+let vad: Vad | undefined;
+
-self.addEventListener("message", async (event: MessageEvent<Request>) => {
-	const message = event.data;
-
-	const vad = new Vad();
-	message.worklet.onmessage = ({ data: { channels } }: MessageEvent<AudioFrame>) => {
-		vad.write(channels[0]);
-	};
-});
+self.addEventListener("message", (event: MessageEvent<Request>) => {
+	const message = event.data;
+	if (message.type !== "init") {
+		postResult({ type: "error", message: "unexpected message type" });
+		return;
+	}
+	if (vad) {
+		// Already initialized; ignore duplicates.
+		return;
+	}
+	vad = new Vad();
+	message.worklet.onmessage = ({ data: { channels } }: MessageEvent<AudioFrame>) => {
+		vad!.write(channels[0]);
+	};
+});
```

js/hang/src/publish/audio/speaking.ts (3)
65-66: Clarify error source in logs

The error originates from the speaking worker, not "VAD worker." Improves debuggability.
- console.error("VAD worker error:", data.message); + console.error("speaking worker error:", data.message);
56-68: Add onerror handler for worker crashes

Unhandled worker errors won't post a Result and leave active=true stale. Add a generic onerror trap.
const worker = new Worker(new URL("./speaking-worker", import.meta.url), { type: "module" }); effect.cleanup(() => worker.terminate()); // Handle messages from the worker worker.onmessage = ({ data }: MessageEvent<Result>) => { if (data.type === "speaking") { // Use heuristics to determine if we've toggled speaking or not this.active.set(data.speaking); } else if (data.type === "error") { - console.error("VAD worker error:", data.message); + console.error("speaking worker error:", data.message); this.active.set(false); } }; + worker.onerror = (ev) => { + console.error("speaking worker crash:", (ev as ErrorEvent).message ?? ev); + this.active.set(false); + };
74-107: Consider reusing capture to avoid multiple AudioContexts

Speaking and captions each spin up their own 16 kHz AudioContext and capture chain. If CPU is tight, consider sharing a single capture worklet and teeing frames to multiple workers via MessagePorts to avoid duplicate decode/resample overhead, as shown in the sketch below.
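A sketch of the teeing idea: one capture worklet fanning out to both workers over MessageChannels. The worker init shape matches this PR, but the channel plumbing and the variable names (speakingWorker, captionsWorker, workletNode) are hypothetical:

```ts
// Fan one worklet's frames out to two workers without a second AudioContext.
const toSpeaking = new MessageChannel();
const toCaptions = new MessageChannel();

speakingWorker.postMessage({ type: "init", worklet: toSpeaking.port1 }, [toSpeaking.port1]);
captionsWorker.postMessage({ type: "init", worklet: toCaptions.port1 }, [toCaptions.port1]);

// Tee each AudioFrame from the single capture worklet to both workers.
workletNode.port.onmessage = ({ data }: MessageEvent<AudioFrame>) => {
	toSpeaking.port2.postMessage(data);
	toCaptions.port2.postMessage(data);
};
```

js/hang/src/publish/audio/captions.ts (4)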
js/hang/src/publish/audio/captions.ts (4)
9-15: Unused CaptionsProps.transcribetranscribe is introduced but unused. Either wire it to enable/disable transcription behavior or remove to avoid confusion.
If you intend it to gate worker setup, one minimal integration is:
export type CaptionsProps = { enabled?: boolean; - transcribe?: boolean; + transcribe?: boolean; // Captions are cleared after this many milliseconds. (10s default) ttl?: DOMHighResTimeStamp; }; constructor(audio: Audio, props?: CaptionsProps) { this.audio = audio; this.#ttl = props?.ttl ?? 10000; - this.enabled = new Signal(props?.enabled ?? false); + this.enabled = new Signal((props?.enabled ?? false) && (props?.transcribe ?? true));Or drop the prop until needed.
69-75: Clarify error source in logs

This is the captions worker, not VAD.
```diff
 } else if (data.type === "error") {
-	console.error("VAD worker error:", data.message);
+	console.error("captions worker error:", data.message);
 	this.text.set(undefined);
 }
```
65-76: Add onerror handler for worker crashes

Mirror speaking.ts: catch unhandled worker errors and clear text.
const worker = new Worker(new URL("./captions-worker", import.meta.url), { type: "module" }); effect.cleanup(() => worker.terminate()); // Handle messages from the worker worker.onmessage = ({ data }: MessageEvent<Result>) => { if (data.type === "text") { this.text.set(data.text); } else if (data.type === "error") { - console.error("VAD worker error:", data.message); + console.error("captions worker error:", data.message); this.text.set(undefined); } }; + worker.onerror = (ev) => { + console.error("captions worker crash:", (ev as ErrorEvent).message ?? ev); + this.text.set(undefined); + };
117-124: Captions depend on speaking; consider a fallback mode

When speaking isn't enabled, captions log a warning and never signal speaking to the worker. If you want captions to work independently, allow a fallback (e.g., assume speaking=true) or make the dependency explicit in the API.
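One possible shape for that fallback, using the signal names from this PR's captions wiring (untested sketch):

```ts
// Hypothetical: when VAD is disabled, report speaking=true so transcription
// still runs; the Whisper worker already drops blank audio on its own.
effect.effect((effect) => {
	const vadEnabled = effect.get(this.audio.speaking.enabled);
	const speaking = vadEnabled ? effect.get(this.audio.speaking.active) : true;
	worker.postMessage({ type: "speaking", speaking });
});
```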
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
- js/hang/src/catalog/audio/captions.ts (1 hunks)
- js/hang/src/catalog/audio/index.ts (2 hunks)
- js/hang/src/catalog/audio/speaking.ts (1 hunks)
- js/hang/src/catalog/index.ts (1 hunks)
- js/hang/src/publish/audio/captions-worker.ts (4 hunks)
- js/hang/src/publish/audio/captions-worklet.ts (0 hunks)
- js/hang/src/publish/audio/captions.ts (4 hunks)
- js/hang/src/publish/audio/index.ts (9 hunks)
- js/hang/src/publish/audio/speaking-worker.ts (1 hunks)
- js/hang/src/publish/audio/speaking.ts (1 hunks)
- js/hang/src/publish/element.ts (4 hunks)
- js/hang/src/watch/audio/index.ts (5 hunks)
- js/hang/src/watch/audio/speaking.ts (1 hunks)
- js/hang/src/watch/element.ts (2 hunks)
💤 Files with no reviewable changes (1)
- js/hang/src/publish/audio/captions-worklet.ts
🧰 Additional context used
📓 Path-based instructions (2)
{rs,js}/hang/**
📄 CodeRabbit inference engine (CLAUDE.md)
All media logic belongs in the hang layer (rs/hang and js/hang)
Files:
- js/hang/src/catalog/audio/speaking.ts
- js/hang/src/catalog/audio/captions.ts
- js/hang/src/watch/audio/speaking.ts
- js/hang/src/publish/audio/speaking-worker.ts
- js/hang/src/publish/audio/speaking.ts
- js/hang/src/catalog/audio/index.ts
- js/hang/src/watch/element.ts
- js/hang/src/catalog/index.ts
- js/hang/src/publish/element.ts
- js/hang/src/publish/audio/index.ts
- js/hang/src/publish/audio/captions-worker.ts
- js/hang/src/watch/audio/index.ts
- js/hang/src/publish/audio/captions.ts
js/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place JS/TS packages under the js/ directory
Files:
- js/hang/src/catalog/audio/speaking.ts
- js/hang/src/catalog/audio/captions.ts
- js/hang/src/watch/audio/speaking.ts
- js/hang/src/publish/audio/speaking-worker.ts
- js/hang/src/publish/audio/speaking.ts
- js/hang/src/catalog/audio/index.ts
- js/hang/src/watch/element.ts
- js/hang/src/catalog/index.ts
- js/hang/src/publish/element.ts
- js/hang/src/publish/audio/index.ts
- js/hang/src/publish/audio/captions-worker.ts
- js/hang/src/watch/audio/index.ts
- js/hang/src/publish/audio/captions.ts
🧬 Code graph analysis (11)

js/hang/src/catalog/audio/speaking.ts (3)
- js/hang/src/catalog/track.ts (1): TrackSchema (4-7)
- js/hang/src/publish/audio/speaking.ts (1): Speaking (14-113)
- js/hang/src/watch/audio/speaking.ts (1): Speaking (9-62)

js/hang/src/watch/audio/speaking.ts (4)
- js/signals/src/index.ts (3): Getter (10-13), Signal (19-76), Effect (81-408)
- js/moq/src/broadcast.ts (1): BroadcastConsumer (122-180)
- js/hang/src/catalog/audio/index.ts (1): Audio (43-43)
- js/hang/src/watch/audio/index.ts (3): Audio (33-212), effect (79-121), effect (123-170)

js/hang/src/publish/audio/speaking-worker.ts (3)
- js/hang/src/publish/audio/captions-worker.ts (5): Request (4-4), Init (6-10), Result (17-17), Speaking (12-15), Error (24-27)
- js/hang/src/publish/audio/speaking.ts (1): Speaking (14-113)
- js/hang/src/publish/audio/capture.ts (1): AudioFrame (1-4)

js/hang/src/publish/audio/speaking.ts (5)
- js/hang/src/watch/audio/speaking.ts (3): SpeakingProps (5-7), Speaking (9-62), effect (32-57)
- js/hang/src/publish/audio/captions-worker.ts (3): Speaking (12-15), Result (17-17), Request (4-4)
- js/hang/src/publish/audio/speaking-worker.ts (3): Speaking (14-17), Result (12-12), Request (4-4)
- js/hang/src/publish/audio/index.ts (5): Audio (59-283), effect (106-150), effect (152-165), effect (167-256), effect (258-276)
- js/hang/src/publish/audio/captions.ts (1): effect (39-125)

js/hang/src/catalog/audio/index.ts (1)
- js/hang/src/catalog/audio/speaking.ts (1): SpeakingSchema (4-7)

js/hang/src/watch/element.ts (4)
- js/hang/src/publish/element.ts (3): captions (129-131), captions (133-136), captions (161-209)
- js/hang/src/publish/audio/captions.ts (1): effect (39-125)
- js/hang/src/watch/audio/speaking.ts (1): effect (32-57)
- js/hang/src/watch/audio/captions.ts (1): effect (32-57)

js/hang/src/publish/element.ts (7)
- js/hang/src/watch/element.ts (3): captions (180-182), captions (184-187), captions (230-274)
- js/hang/src/publish/audio/captions.ts (1): effect (39-125)
- js/hang/src/publish/audio/index.ts (4): effect (106-150), effect (152-165), effect (167-256), effect (258-276)
- js/hang/src/publish/audio/speaking.ts (1): effect (32-108)
- js/hang/src/watch/audio/speaking.ts (1): effect (32-57)
- js/hang/src/watch/audio/captions.ts (1): effect (32-57)
- js/hang/src/publish/audio/captions-worker.ts (1): speaking (115-123)

js/hang/src/publish/audio/index.ts (2)
- js/hang/src/publish/audio/speaking.ts (3): SpeakingProps (9-11), Speaking (14-113), effect (32-108)
- js/hang/src/publish/audio/captions.ts (1): effect (39-125)

js/hang/src/publish/audio/captions-worker.ts (3)
- js/hang/src/publish/audio/speaking-worker.ts (5): Request (4-4), Init (6-10), Speaking (14-17), Result (12-12), Error (19-22)
- js/hang/src/publish/audio/speaking.ts (1): Speaking (14-113)
- js/hang/src/publish/audio/capture.ts (1): AudioFrame (1-4)

js/hang/src/watch/audio/index.ts (1)
- js/hang/src/watch/audio/speaking.ts (2): SpeakingProps (5-7), Speaking (9-62)

js/hang/src/publish/audio/captions.ts (4)
- js/hang/src/watch/audio/captions.ts (2): CaptionsProps (5-7), effect (32-57)
- js/hang/src/publish/audio/index.ts (4): effect (106-150), effect (152-165), effect (167-256), effect (258-276)
- js/hang/src/publish/audio/speaking.ts (1): effect (32-108)
- js/hang/src/publish/audio/captions-worker.ts (1): speaking (115-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: claude-review
- GitHub Check: Check
🔇 Additional comments (11)
js/hang/src/catalog/audio/captions.ts (1)
2-2: Import path realignment — LGTM

Switching to ../track matches the catalog reorg and keeps schemas centralized.

js/hang/src/catalog/index.ts (1)
2-3: No duplicate or conflicting re-exports detected

After inspecting the re-export chain:

- js/hang/src/catalog/index.ts exports * from "./audio" (which only exports AudioSchema, AudioConfigSchema, and their corresponding types), * from "./audio/captions" (exports CaptionsSchema and its type), and * from "./audio/speaking" (exports SpeakingSchema and its type).
- js/hang/src/catalog/audio/index.ts does not re-export from ./captions or ./speaking, so there's no overlap in symbol names.
- The explicit re-exports of captions and speaking each introduce unique identifiers (CaptionsSchema, Captions, SpeakingSchema, Speaking) that do not shadow or duplicate anything from the audio index.

No further action is required here.
js/hang/src/catalog/audio/index.ts (2)
2-6: Schema import path adjustments look correct

Moving u53Schema and TrackSchema to the parent paths keeps consistency with the catalog layout.
38-41: Optional speaking field is a clean extension

Adding speaking?: SpeakingSchema as optional maintains backward compatibility for existing consumers that don't expect VAD. No issues.

js/hang/src/catalog/audio/speaking.ts (1)
1-9: Ensure speaking frames are detected by inspecting the first byte

The watch-side consumer currently treats any non-empty frame as "speaking," but the producer encodes speaking/silence as a single byte (1/0). We need to update the check so that only frame.data[0] === 1 is considered active.

• In js/hang/src/watch/audio/speaking.ts (around line 52):

```diff
-	const active = frame.data.length > 0;
+	const active = frame.data.length > 0 && frame.data[0] === 1;
```

This ensures that silence frames ([0]) no longer peg the UI to "speaking."

Likely an incorrect or invalid review comment.
js/hang/src/watch/audio/index.ts (3)
11-12: Speaking track plumbing into watch.Audio props and surface — LGTM

Importing Speaking and exposing it on AudioProps as an optional nested prop is consistent with the Captions pattern and keeps the API coherent.
Also applies to: 23-25
49-50: Correct instantiation and lifecycle wiring for Speaking

Constructing Speaking with the same broadcast/info getters as Captions ensures it reacts to catalog changes. Early returns inside Speaking.#run gate work until enabled/info.speaking are present — good.
Also applies to: 69-70
206-207: Closing Speaking alongside Captions — LGTM

Including speaking.close() in Audio.close() prevents dangling subscriptions.
js/hang/src/publish/audio/index.ts (2)
16-17: Speaking support and config surface — LGTM
- Importing/wiring Speaking mirrors Captions.
- Adding a public config Getter via #config improves catalog assembly orderliness.
Also applies to: 52-53, 66-67, 75-77
258-276: Dedicated #runCatalog consolidates catalog assembly — LGTM

Building Catalog.Audio from track+config+captions+speaking in one place avoids partial states and racey consumers.
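Roughly, the consolidation has this shape (a sketch with signatures inferred from the review context, not copied from the diff):

```ts
// Reading every input inside one effect means any change republishes the
// whole Catalog.Audio atomically instead of in partial steps.
#runCatalog(effect: Effect) {
	const track = effect.get(this.#track);
	const config = effect.get(this.#config);
	if (!track || !config) return;

	effect.set(this.catalog, {
		track,
		config,
		captions: effect.get(this.captions.catalog),
		speaking: effect.get(this.speaking.catalog),
	});
}
```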
js/hang/src/publish/audio/captions-worker.ts (1)
131-137: Host-controlled speaking handoff — LGTM

Routing the speaking flag from the host into the worker simplifies roles and lets VAD be independently tuned or replaced.
```ts
#sr = new Tensor("int64", [SAMPLE_RATE], []);
#state = new Tensor("float32", new Float32Array(2 * 1 * 128), [2, 1, 128]);
#speaking = false;
```
💡 Verification agent
🧩 Analysis chain
Confirm dtype for sr tensor
Transformers.js often expects int64 inputs to be provided as BigInt or int64-compatible arrays depending on the model wrapper. Please confirm that passing a number[] for an int64 Tensor works across browsers for this specific model.
🌐 Web query:
Does Transformers.js accept a plain number[] for an int64 Tensor input (e.g., sr: 16000) when running the ONNX Silero VAD model in the browser?
💡 Result:
Short answer: No — you must pass an int64 TypedArray (BigInt64Array) rather than a plain number[] when running ONNX int64 inputs in the browser. ONNX runtimes used by Transformers.js expect a BigInt64Array for int64 tensors; a plain number[] will not be treated as int64 and will cause a type error or wrong behavior [1][2].
Sources
- transformers.js (discussion/issue): ONNX/wasm/web runtimes require BigInt64Array for int64 tensors (use BigInt64Array for int64 input) [1].
- ONNX Runtime Web docs: int64 tensor inputs use BigInt64Array in JavaScript (not plain number[]) [2].
Use a BigInt64Array for the int64 “sr” Tensor
Transformers.js (and the underlying ONNX Web runtimes) require int64 inputs to be passed as a BigInt64Array rather than a plain number[]. A number[] will not be treated as int64 and will lead to errors or incorrect behavior.
Please update the code in js/hang/src/publish/audio/speaking-worker.ts (around lines 41–43) accordingly:
```diff
-	#sr = new Tensor("int64", [SAMPLE_RATE], []);
+	// Pass a BigInt64Array for int64 inputs; dims=[] for a scalar
+	#sr = new Tensor("int64", new BigInt64Array([BigInt(SAMPLE_RATE)]), []);
 	#state = new Tensor("float32", new Float32Array(2 * 1 * 128), [2, 1, 128]);
 	#speaking = false;
```
```ts
write(samples: Float32Array) {
	if (this.#next.byteLength >= this.#next.buffer.byteLength) {
		if (!this.flush()) {
			// Drop the sample if VAD is still processing.
			return;
		}
	}

	this.#next = new Float32Array(this.#next.buffer, 0, this.#next.length + samples.length);
	this.#next.set(samples, this.#next.length - samples.length);

	if (this.#next.byteLength === this.#next.buffer.byteLength) {
		this.flush(); // don't care if it fails
	}
}
```
🛠️ Refactor suggestion
Prevent chunk overflow and dropped samples in write()
Current logic can overshoot the fixed 512-sample buffer if a worklet emits larger frames; it also silently drops data while a flush is in-flight. Boundaries should be respected by chunking the input into the remaining capacity and flushing exactly at VAD_CHUNK_SIZE.
Apply this diff to make writes chunk-safe and flush at exact boundaries:
```diff
-	write(samples: Float32Array) {
-		if (this.#next.byteLength >= this.#next.buffer.byteLength) {
-			if (!this.flush()) {
-				// Drop the sample if VAD is still processing.
-				return;
-			}
-		}
-
-		this.#next = new Float32Array(this.#next.buffer, 0, this.#next.length + samples.length);
-		this.#next.set(samples, this.#next.length - samples.length);
-
-		if (this.#next.byteLength === this.#next.buffer.byteLength) {
-			this.flush(); // don't care if it fails
-		}
-	}
+	write(samples: Float32Array) {
+		let offset = 0;
+		while (offset < samples.length) {
+			// If an inference is running, drop the remainder to bound latency.
+			if (this.#processing) return;
+
+			const remaining = VAD_CHUNK_SIZE - this.#next.length;
+			if (remaining === 0) {
+				this.flush();
+				continue;
+			}
+
+			const toCopy = Math.min(remaining, samples.length - offset);
+			const next = new Float32Array(this.#next.buffer, 0, this.#next.length + toCopy);
+			next.set(samples.subarray(offset, offset + toCopy), this.#next.length);
+			this.#next = next;
+			offset += toCopy;
+
+			if (this.#next.length === VAD_CHUNK_SIZE) {
+				this.flush(); // schedule inference
+			}
+		}
+	}
```
```ts
async #flush() {
	const model = await this.#model;

	const input = new Tensor("float32", this.#current, [1, this.#current.length]);
	const result = await model({ input, sr: this.#sr, state: this.#state });
	this.#state = result.stateN;

	const isSpeech = result.output.data[0];
	if (this.#speaking && isSpeech < 0.3) {
		this.#silence++;

		if (this.#silence > VAD_SILENCE_PADDING) {
			this.#speaking = false;

			postResult({
				type: "speaking",
				speaking: false,
			});
		}
	} else if (!this.#speaking && isSpeech >= 0.1) {
		this.#speaking = true;
		this.#silence = 0;

		postResult({
			type: "speaking",
			speaking: true,
		});
	}
}
```
Handle model load/inference errors to avoid unhandled rejections
If model download or inference fails, #flush() will reject and produce unhandled promise rejections. Post an error Result and recover.
Apply this diff to add robust error handling:
```diff
-	async #flush() {
-		const model = await this.#model;
-
-		const input = new Tensor("float32", this.#current, [1, this.#current.length]);
-		const result = await model({ input, sr: this.#sr, state: this.#state });
-		this.#state = result.stateN;
-
-		const isSpeech = result.output.data[0];
+	async #flush() {
+		let model: PreTrainedModel;
+		try {
+			model = await this.#model;
+		} catch (err) {
+			postResult({
+				type: "error",
+				message: `VAD model load failed: ${err instanceof Error ? err.message : String(err)}`,
+			});
+			return;
+		}
+		try {
+			const input = new Tensor("float32", this.#current, [1, this.#current.length]);
+			const result = await model({ input, sr: this.#sr, state: this.#state });
+			this.#state = result.stateN;
+
+			const isSpeech = result.output.data[0];
 			if (this.#speaking && isSpeech < 0.3) {
 				this.#silence++;
-				if (this.#silence > VAD_SILENCE_PADDING) {
+				if (this.#silence >= VAD_SILENCE_PADDING) {
 					this.#speaking = false;
 					postResult({
 						type: "speaking",
 						speaking: false,
 					});
 				}
 			} else if (!this.#speaking && isSpeech >= 0.1) {
 				this.#speaking = true;
 				this.#silence = 0;
 				postResult({
 					type: "speaking",
 					speaking: true,
 				});
 			}
+		} catch (err) {
+			postResult({
+				type: "error",
+				message: `VAD inference failed: ${err instanceof Error ? err.message : String(err)}`,
+			});
+		}
 	}
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async #flush() { | |
| const model = await this.#model; | |
| const input = new Tensor("float32", this.#current, [1, this.#current.length]); | |
| const result = await model({ input, sr: this.#sr, state: this.#state }); | |
| this.#state = result.stateN; | |
| const isSpeech = result.output.data[0]; | |
| if (this.#speaking && isSpeech < 0.3) { | |
| this.#silence++; | |
| if (this.#silence > VAD_SILENCE_PADDING) { | |
| this.#speaking = false; | |
| postResult({ | |
| type: "speaking", | |
| speaking: false, | |
| }); | |
| } | |
| } else if (!this.#speaking && isSpeech >= 0.1) { | |
| this.#speaking = true; | |
| this.#silence = 0; | |
| postResult({ | |
| type: "speaking", | |
| speaking: true, | |
| }); | |
| } | |
| } | |
| async #flush() { | |
| let model: PreTrainedModel; | |
| try { | |
| model = await this.#model; | |
| } catch (err) { | |
| postResult({ | |
| type: "error", | |
| message: `VAD model load failed: ${err instanceof Error ? err.message : String(err)}`, | |
| }); | |
| return; | |
| } | |
| try { | |
| const input = new Tensor("float32", this.#current, [1, this.#current.length]); | |
| const result = await model({ input, sr: this.#sr, state: this.#state }); | |
| this.#state = result.stateN; | |
| const isSpeech = result.output.data[0]; | |
| if (this.#speaking && isSpeech < 0.3) { | |
| this.#silence++; | |
| if (this.#silence >= VAD_SILENCE_PADDING) { | |
| this.#speaking = false; | |
| postResult({ | |
| type: "speaking", | |
| speaking: false, | |
| }); | |
| } | |
| } else if (!this.#speaking && isSpeech >= 0.1) { | |
| this.#speaking = true; | |
| this.#silence = 0; | |
| postResult({ | |
| type: "speaking", | |
| speaking: true, | |
| }); | |
| } | |
| } catch (err) { | |
| postResult({ | |
| type: "error", | |
| message: `VAD inference failed: ${err instanceof Error ? err.message : String(err)}`, | |
| }); | |
| } | |
| } |
🤖 Prompt for AI Agents
In js/hang/src/publish/audio/speaking-worker.ts around lines 92-120, #flush()
currently awaits model load and runs inference without catching errors, which
can cause unhandled promise rejections; wrap the model access and inference in a
try/catch, and on error call postResult with an error result (e.g. { type:
"error", message: String(err) }), reset/clear any transient state (like #current
buffer and #state or incrementally safe flags) so the worker can continue, and
ensure the method returns cleanly after handling the error to avoid propagating
the rejection.
```ts
const leftSpacer = DOM.create("div", {
	style: { width: "1.5em" },
});

const captionText = DOM.create("div", {
	style: { textAlign: "center" },
});

const speakingIcon = DOM.create("div", {
	style: { width: "1.5em" },
});

effect.effect((effect) => {
	const text = effect.get(this.broadcast.audio.captions.text);
	const speaking = effect.get(this.broadcast.audio.speaking.active);

	captionText.textContent = text ?? "";
	speakingIcon.textContent = speaking ? "🗣️" : " ";
});

captions.appendChild(leftSpacer);
captions.appendChild(captionText);
captions.appendChild(speakingIcon);

effect.cleanup(() => {
	captions.textContent = "";
	captions.removeChild(leftSpacer);
	captions.removeChild(captionText);
	captions.removeChild(speakingIcon);
});
```
Captions row will stack vertically; container needs flex for left spacer + centered text + icon
With three block-level divs, they render vertically. Add display:flex on the captions container and let the middle node flex.
Apply this targeted diff for the changed block to make the middle node flex:
```diff
-	const captionText = DOM.create("div", {
-		style: { textAlign: "center" },
-	});
+	const captionText = DOM.create("div", {
+		style: { textAlign: "center", flex: "1" },
+	});
```

And update the container style (outside this hunk) to enable the horizontal layout:
```ts
// At lines 231-235
const captions = DOM.create("div", {
	style: {
		display: "flex",
		justifyContent: "space-around",
		gap: "16px",
		minHeight: "1lh",
		alignContent: "center",
	},
});
```
Optional: mark the speaking icon decorative to avoid verbosity for screen readers:

```diff
-	const speakingIcon = DOM.create("div", {
-		style: { width: "1.5em" },
-	});
+	const speakingIcon = DOM.create("div", {
+		style: { width: "1.5em" },
+		"aria-hidden": "true",
+	});
```
Use the stream closed state to signal false.
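A minimal sketch of that encoding, consistent with the BoolProducer shape discussed in the review below (import path illustrative):

```ts
import type { GroupProducer, TrackProducer } from "@kixelated/moq"; // path illustrative

// An open group means true; closing the group signals false. Only state
// edges hit the wire, and a late subscriber can recover the current value
// from whether a group is open.
class BoolProducer {
	#group?: GroupProducer;

	constructor(readonly track: TrackProducer) {}

	write(value: boolean) {
		if (value) {
			if (this.#group) return; // already true
			this.#group = this.track.appendGroup();
			this.#group.writeFrame(new Uint8Array([1]));
		} else {
			this.#group?.close(); // group close == false edge
			this.#group = undefined;
		}
	}
}
```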
Actionable comments posted: 1
♻️ Duplicate comments (1)
js/hang/src/container/bool.ts (1)
12-22: Encoding via group open/close is solid and removes prior false/true framing ambiguity

Using "group open" to mean true (with a single [0x01] frame) and "group closed" to mean false is a clean, backpressure-friendly protocol and avoids the earlier watch-side misread of 0x00 as active. Nice.
🧹 Nitpick comments (3)
js/hang/src/container/bool.ts (2)
3-3: Fix minor grammar in the doc comment

"write a frame" → "writes a frame".
Apply:
```diff
-// Creates a track that write a frame on true, and closes the group on false.
+// Creates a track that writes a frame on true, and closes the group on false.
```
28-32: Close the open group before closing the track to guarantee a trailing "false" edge

Closing the track first will implicitly close the group, but closing the group explicitly first guarantees a final false emission before track teardown. Low risk either way; this order is clearer.
```diff
-	close() {
-		this.track.close();
-		this.#group?.close();
-		this.#group = undefined;
-	}
+	close() {
+		this.#group?.close();
+		this.#group = undefined;
+		this.track.close();
+	}
```

js/hang/src/publish/audio/speaking.ts (1)
39-41: Reset Speaking catalog on cleanup to avoid stale metadata in Audio catalog

When disabled, you remove the track from broadcast but leave this.catalog populated. Audio.#runCatalog reads speaking from this signal, so metadata can remain "present" after disable.
```diff
 this.audio.broadcast.insertTrack(this.#bool.track.consume());
 effect.cleanup(() => this.audio.broadcast.removeTrack(this.#bool.track.name));

 const catalog: Catalog.Speaking = {
 	track: {
 		name: this.#bool.track.name,
 		priority: u8(this.#bool.track.priority),
 	},
 };
-effect.set(this.catalog, catalog);
+effect.set(this.catalog, catalog);
+effect.cleanup(() => this.catalog.set(undefined));
```

Also applies to: 42-49
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- js/hang/src/container/bool.ts (1 hunks)
- js/hang/src/publish/audio/speaking.ts (1 hunks)
- js/hang/src/watch/audio/speaking.ts (1 hunks)
- js/moq/src/group.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- js/hang/src/watch/audio/speaking.ts
🧰 Additional context used
📓 Path-based instructions (2)
js/**
📄 CodeRabbit inference engine (CLAUDE.md)
Place JS/TS packages under the js/ directory
Files:
- js/moq/src/group.ts
- js/hang/src/container/bool.ts
- js/hang/src/publish/audio/speaking.ts
{rs,js}/hang/**
📄 CodeRabbit inference engine (CLAUDE.md)
All media logic belongs in the hang layer (rs/hang and js/hang)
Files:
- js/hang/src/container/bool.ts
- js/hang/src/publish/audio/speaking.ts
🧬 Code graph analysis (2)

js/hang/src/container/bool.ts (2)
- js/moq/src/track.ts (2): TrackProducer (9-118), TrackConsumer (125-279)
- js/moq/src/group.ts (2): GroupProducer (8-59), GroupConsumer (66-120)

js/hang/src/publish/audio/speaking.ts (4)
- js/hang/src/publish/audio/speaking-worker.ts (3): Speaking (14-17), Result (12-12), Request (4-4)
- js/hang/src/publish/audio/index.ts (5): Audio (59-283), effect (106-150), effect (152-165), effect (167-256), effect (258-276)
- js/hang/src/container/bool.ts (1): BoolProducer (4-33)
- js/hang/src/publish/audio/captions.ts (1): effect (39-125)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Check
🔇 Additional comments (7)
js/moq/src/group.ts (2)
94-101: LGTM: closed() is a useful lifecycle primitive and matches WatchConsumer behavior

Straightforward pass-through to the underlying frames.closed(); this aligns with the PR's broader lifecycle cleanup direction.
94-101: The above will reveal how #frames is set up in group.ts and confirm it's a WatchConsumer, so we can verify that its .closed() rejects on abort.

js/hang/src/container/bool.ts (1)
24-26: Handle BoolProducer.clone() API

The current clone() implementation returns a fresh BoolProducer with its own #group, leading to potential state desynchronization if multiple producers write to the same track. Our grep of the codebase shows that clone() on BoolProducer isn't called anywhere in js/hang/src except in its own definition, so removing or altering it won't break internal usage — but it is part of the public API and could impact consumers outside this repo.

Please choose one of the following approaches and verify that downstream code will remain correct:

• Remove clone() entirely
  – Delete the method from js/hang/src/container/bool.ts.
  – Bump the major version and call out the breaking change in the changelog.

• Refactor to share backing state
  – Replace the private #group?: GroupProducer field with a shared #state: { group?: GroupProducer } object.
  – Update the constructor and all usages of #group (in write(), close()) to use #state.group.
  – Change clone() to pass the same #state into the new instance:

```diff
// js/hang/src/container/bool.ts
-#group?: GroupProducer;
+#state: { group?: GroupProducer };

-constructor(track: TrackProducer) {
-	this.track = track;
-}
+constructor(track: TrackProducer, state: { group?: GroupProducer } = {}) {
+	this.track = track;
+	this.#state = state;
+}

 write(value: boolean) {
 	if (value) {
-		if (this.#group) return;
-		this.#group = this.track.appendGroup();
-		this.#group.writeFrame(...);
+		if (this.#state.group) return;
+		this.#state.group = this.track.appendGroup();
+		this.#state.group.writeFrame(...);
 	} else {
-		if (!this.#group) return;
-		this.#group.close();
-		this.#group = undefined;
+		if (!this.#state.group) return;
+		this.#state.group.close();
+		this.#state.group = undefined;
 	}
 }

-clone() {
-	return new BoolProducer(this.track);
-}
+clone() {
+	return new BoolProducer(this.track, this.#state);
+}

 close() {
 	this.track.close();
-	this.#group?.close();
-	this.#group = undefined;
+	this.#state.group?.close();
+	this.#state.group = undefined;
 }
```

• Deprecate first, then remove
  – Mark clone() as deprecated in JSDoc.
  – In the next major release, drop the method entirely.

Please confirm which path you'd like to take and ensure any external consumers of BoolProducer.clone() are updated accordingly.

js/hang/src/publish/audio/speaking.ts (4)
50-55: Direct wiring of active → BoolProducer is clean and efficient

The nested effect avoids track recreation and only emits on state edges. Good use of the container abstraction.
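Roughly, the wiring in question (a sketch using this PR's names, not a verbatim excerpt):

```ts
// The outer effect owns track setup; this nested effect re-runs only when
// `active` changes, so speaking edges write to the existing BoolProducer
// instead of tearing down and recreating the track.
effect.effect((nested) => {
	const active = nested.get(this.active);
	this.#bool.write(active);
});
```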
74-107: Audio graph setup looks correct; keep the bundler contract for CaptureWorklet stable

The addModule(CaptureWorklet) pattern relies on the bundler returning a module URL via ?worker&url. Ensure this remains aligned with your build. No changes requested.
56-69: Hysteresis Already Present in VAD Worker
The speaking-worker.ts logic uses distinct on/off thresholds (≥0.1 to start speaking, <0.3 to stop) and waits for 8 consecutive silent chunks (~8 × 31 ms ≈ 248 ms) before emitting speaking: false, matching the suggested hold time. No additional debounce/hysteresis is needed.
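As a standalone illustration of that hysteresis (thresholds and chunk counts taken from the description above; this is not the worker's literal code):

```ts
const START_THRESHOLD = 0.1; // begin speaking at or above this score
const STOP_THRESHOLD = 0.3;  // count silence only below this score
const SILENCE_PADDING = 8;   // ~8 chunks x 31 ms ≈ 248 ms hold time

let speaking = false;
let silence = 0;

function onVadScore(score: number): boolean {
	if (!speaking && score >= START_THRESHOLD) {
		speaking = true;
		silence = 0;
	} else if (speaking && score < STOP_THRESHOLD) {
		if (++silence >= SILENCE_PADDING) speaking = false;
	} else if (speaking) {
		silence = 0; // any non-silent chunk resets the hold timer
	}
	return speaking;
}
```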
25-31: BoolProducer reuse confirmed; no changes needed

I've verified that #bool is instantiated once as a class-level field and is never re-created in the #run method. Each enable toggle simply calls this.audio.broadcast.insertTrack(this.#bool.track.consume()), and on cleanup, this.audio.broadcast.removeTrack(this.#bool.track.name). This ensures you reuse the same producer and track across enable/disable cycles without unnecessary churn.
```ts
async next(): Promise<boolean | undefined> {
	for (;;) {
		if (!this.#group) {
			const group = await this.track.nextGroup();
			if (!group) return undefined;

			this.#group = group;
			return true;
		}

		const group = await Promise.race([
			this.track.nextGroup(),
			this.#group.closed(),
		]);
		if (group) {
			this.#group = group;
			continue;
		}

		this.#group.close();
		this.#group = undefined;
		return false;
	}
}
```
🛠️ Refactor suggestion
Disambiguate Promise.race result to avoid treating end-of-track as “false”.
Promise.race between nextGroup() and closed() returns undefined for both “track ended” and “group closed” paths. Today both map to returning false. It’s better to handle the cases explicitly, so end-of-track behavior is intentional and robust to internal changes.
Apply:
```diff
-	const group = await Promise.race([
-		this.track.nextGroup(),
-		this.#group.closed(),
-	])
-	if (group) {
-		this.#group = group;
-		continue;
-	}
-
-	this.#group.close();
-	this.#group = undefined;
-	return false;
+	const result:
+		| { kind: "group"; group?: GroupConsumer }
+		| { kind: "closed" } =
+		await Promise.race([
+			this.track.nextGroup().then((g) => ({ kind: "group" as const, group: g })),
+			this.#group.closed().then(() => ({ kind: "closed" as const })),
+		]);
+
+	if (result.kind === "group") {
+		if (result.group) {
+			this.#group = result.group;
+			continue;
+		}
+		// Track ended while current group still open. Emit a final false then let caller
+		// observe undefined on the next next() call.
+		this.#group.close();
+		this.#group = undefined;
+		return false;
+	}
+
+	// Current group closed → false edge.
+	this.#group.close();
+	this.#group = undefined;
+	return false;
```

Optionally, guard against abort errors by converting closed() rejections into "closed" signals with .catch(() => ({ kind: "closed" as const })).
Now we selectively transmit a "speaking" track over the network. It's just a boolean that toggles on/off based on the VAD results. Technically, this increases the number of audio copies when captions are enabled and speaking is not detected, but idc lul.
Useful for more audio visualizations.
Summary by CodeRabbit

- New Features
- Breaking Changes
- Refactor