
Conversation

@kixelated (Collaborator) commented Aug 25, 2025

Now we selectively transmit a "speaking" track over the network. It's just a boolean that toggles on/off based on the VAD results. Technically, this increases the number of audio copies when captions are enabled and speaking is not detected, but idc lul.

Useful for more audio visualizations.
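For context, a minimal sketch of how a watch-side consumer might drive a visualization from this boolean signal; the callback-style subscribe API here is an assumption for illustration, not the actual signals API in this repo:

```ts
// Hypothetical: dim/undim an element on speaking edges from the VAD track.
type BoolSignal = { subscribe(fn: (value: boolean) => void): () => void };

function bindSpeakingIndicator(active: BoolSignal, el: HTMLElement): () => void {
	// Returns the unsubscribe function so callers can clean up.
	return active.subscribe((speaking) => {
		el.style.opacity = speaking ? "1" : "0.3";
	});
}
```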

Summary by CodeRabbit

  • New Features

    • Real-time speaking detection with downloadable speaking track and a speaking indicator next to captions.
    • Whisper-based live transcription streaming text results; captions auto-clear after a configurable TTL (default ~10s; see the sketch after this list).
  • Breaking Changes

    • Removed the transcribe element attribute; use captions/speaking toggles instead.
    • Captions API/semantics updated (transcribe removed; TTL behavior/type changed).
  • Refactor

    • Audio pipeline reorganized: dedicated speaking detection and separate transcription flow; captions and speaking state now driven by shared broadcast signals.
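
A rough illustration of the TTL behavior mentioned above, as a standalone sketch; the names are illustrative, not the PR's actual code:

```ts
// Each new caption resets a timer that clears the text after `ttl` ms.
function makeCaptionState(ttl: number = 10_000) {
	let text: string | undefined;
	let timer: ReturnType<typeof setTimeout> | undefined;
	return {
		set(next: string) {
			text = next;
			clearTimeout(timer);
			timer = setTimeout(() => {
				text = undefined;
			}, ttl);
		},
		get: () => text,
	};
}
```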

coderabbitai bot (Contributor) commented Aug 25, 2025

Walkthrough

Adds a speaking track and VAD pipeline (Silero) with worklet/worker wiring, replaces VAD transcription with a Whisper-based captions worker, introduces BoolProducer/BoolConsumer primitives, adjusts catalog exports and schemas, and updates publish/watch element wiring and UI rendering.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Catalog: audio schemas<br>js/hang/src/catalog/audio/index.ts, js/hang/src/catalog/audio/speaking.ts, js/hang/src/catalog/audio/captions.ts | Adds SpeakingSchema and an optional speaking field on AudioSchema; new speaking.ts module; adjusts TrackSchema import paths. |
| Catalog: root exports<br>js/hang/src/catalog/index.ts | Removes the top-level ./captions export; adds ./audio/captions and ./audio/speaking exports. |
| Publish: captions pipeline & worker<br>js/hang/src/publish/audio/captions-worker.ts, js/hang/src/publish/audio/captions.ts, js/hang/src/publish/audio/captions-worklet.ts | Replaces VAD transcription internals with Whisper-based transcription in captions-worker.ts (request/result unions changed); removes captions-worklet.ts; captions.ts now uses a capture worklet, adds transcribe?: boolean, changes the ttl type, removes the internal speaking signal, and pushes speaking state to the worker. |
| Publish: speaking pipeline & worker<br>js/hang/src/publish/audio/speaking.ts, js/hang/src/publish/audio/speaking-worker.ts | New Speaking feature: the Speaking class wires an AudioContext/worklet and a speaking worker; speaking-worker.ts implements Silero VAD and emits speaking true/false events. |
| Publish: audio integration<br>js/hang/src/publish/audio/index.ts | Integrates Speaking into the Audio API/props; adds a public config getter; centralizes catalog construction via an effect that includes config, captions, and speaking. |
| Publish: element API<br>js/hang/src/publish/element.ts | Removes the transcribe attribute/state and related accessors; the captions toggle now also toggles broadcast.audio.speaking.enabled; rendering reads broadcast signals. |
| Watch: audio integration<br>js/hang/src/watch/audio/index.ts, js/hang/src/watch/audio/speaking.ts | Adds a watch-side Speaking controller and speaking prop; constructs/closes Speaking alongside Audio; exposes enabled/active state and subscribes to the speaking track. |
| Watch: element UI<br>js/hang/src/watch/element.ts | Reworks the caption DOM into three nodes (left spacer, centered text, speaking icon) and binds the speaking indicator to broadcast.audio.speaking.active; toggling captions synchronizes speaking. |
| Container primitives<br>js/hang/src/container/bool.ts | Adds BoolProducer and BoolConsumer to encode/decode boolean streams over track groups (write/next, clone/close semantics); usage sketched after this table. |
| MoQ runtime<br>js/moq/src/group.ts | Adds async closed(): Promise<void> to GroupConsumer to await underlying frames closure. |
| Minor import/type-only tweaks<br>js/hang/src/watch/video/index.ts | Changes the Getter import to a type-only import form. |
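
A usage sketch of the new Bool container primitives, based on the open-group-means-true semantics reviewed below; the import path and the untyped track handles are assumptions:

```ts
import { BoolConsumer, BoolProducer } from "../container/bool"; // path assumed

async function mirrorSpeaking(producerTrack: any, consumerTrack: any) {
	const producer = new BoolProducer(producerTrack);
	producer.write(true); // opens a group: consumers observe `true`
	producer.write(false); // closes the group: consumers observe `false`

	const consumer = new BoolConsumer(consumerTrack);
	for (;;) {
		// `next()` resolves true/false on edges, or undefined once the track ends.
		const value = await consumer.next();
		if (value === undefined) break;
		console.log("speaking:", value);
	}
}
```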

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant User
  participant PublishAudio as Publish.Audio
  participant Speak as Speaking
  participant Cap as Captions
  participant Worklet as AudioWorklet("capture")
  participant SpeakW as Worker("speaking-worker")
  participant WhispW as Worker("captions-worker")
  participant Broadcast

  User->>PublishAudio: new Audio({ captions?, speaking? })
  PublishAudio->>Speak: construct (enabled)
  PublishAudio->>Cap: construct (enabled, transcribe?, ttl)
  Speak->>Worklet: create "capture" node, connect MediaStream
  Speak->>SpeakW: post { type: "init", port }
  Worklet-->>SpeakW: AudioFrame(channels[0]) (stream)
  SpeakW-->>Speak: { type: "speaking", speaking }
  Speak->>Broadcast: produce speaking frames (BoolProducer)
  Cap->>WhispW: post { type: "speaking", speaking } (when transcribe enabled)
  Worklet-->>WhispW: AudioFrame(ch0) (stream)
  WhispW-->>Cap: { type: "text", text } or { type: "error" }
  Cap->>Broadcast: update captions.text (cleared after TTL)
  PublishAudio->>Broadcast: publish Catalog.Audio (track + config + captions + speaking)
```

```mermaid
sequenceDiagram
  autonumber
  participant WatchAudio as Watch.Audio
  participant WatchSpeak as Watch.Speaking
  participant UI as Watch Element
  participant Broadcast

  WatchAudio->>WatchSpeak: new Speaking({ enabled })
  WatchSpeak->>Broadcast: subscribe(info.speaking.track)
  loop real-time
    Broadcast-->>WatchSpeak: frame (group open/close)
    WatchSpeak-->>UI: update active getter
    UI->>UI: render caption text + speaking icon
  end
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

A whisker twitches, circuits hum,
I hop where workers, frames, and tokens run—
🗣️ voice becomes a track, a text that gleams,
Bool flags rise and fall inside my dreams.
I thump, I cheer—new signals stitch the streams.


📜 Recent review details

📥 Commits

Reviewing files that changed from the base of the PR and between 53716a7 and 0a27070.

📒 Files selected for processing (4)
  • js/hang/src/container/bool.ts (1 hunks)
  • js/hang/src/publish/audio/index.ts (9 hunks)
  • js/hang/src/watch/audio/index.ts (5 hunks)
  • js/hang/src/watch/video/index.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • js/hang/src/container/bool.ts
  • js/hang/src/publish/audio/index.ts
  • js/hang/src/watch/audio/index.ts
🧰 Additional context used
📓 Path-based instructions (2)
{rs,js}/hang/**

📄 CodeRabbit inference engine (CLAUDE.md)

All media logic belongs in the hang layer (rs/hang and js/hang)

Files:

  • js/hang/src/watch/video/index.ts
js/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place JS/TS packages under the js/ directory

Files:

  • js/hang/src/watch/video/index.ts
🔇 Additional comments (1)
js/hang/src/watch/video/index.ts (1)

2-2: Type-only import for Getter is spot-on and trims runtime surface

Switching to type Getter avoids pulling a runtime symbol, improves tree-shaking, and aligns with the hang-layer pattern of treating Getter as a pure type. No behavioral changes introduced.
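
For illustration, the difference is just the type modifier (module specifier assumed):

```ts
// Value import: keeps a runtime binding even if Getter is only used in types,
// which can defeat tree-shaking.
// import { Getter } from "../../signals";

// Type-only import: fully erased at compile time, no runtime symbol pulled in.
import type { Getter } from "../../signals";

function watch(g: Getter<number>): void {
	void g; // Getter appears purely in a type position
}
```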



claude bot (Contributor) commented Aug 25, 2025

Claude finished @kixelated's task


Code Review Complete 👍

I've thoroughly reviewed this PR separating VAD and captions functionality. The code is well-structured and I found no significant bugs or security issues. Here are my findings:

✅ Strengths

  • Clean separation of concerns: VAD logic properly extracted into dedicated speaking-worker.ts and Speaking class
  • Proper resource management: All workers, audio contexts, and worklets have appropriate cleanup via effect.cleanup()
  • Good error handling: Workers handle model loading failures and post error messages appropriately
  • Secure audio processing: Uses standard Web Audio API and WebCodecs - no direct buffer manipulation vulnerabilities
  • Type safety: Strong TypeScript typing throughout with proper interfaces and schemas

🔍 Minor Observations (not issues)

  • VAD thresholds are well-tuned (0.1 for speech detection, 0.3 for silence, 8-chunk padding)
  • Buffer management in both workers uses proper circular buffering with processing flags
  • Hugging Face model loading is handled asynchronously with proper error paths
  • Audio data transfer uses transferable objects appropriately to avoid copies

🏗️ Architecture

The separation creates a cleaner architecture where:

  • Speaking track: Transmits boolean VAD results over network for visualizations
  • Captions track: Uses VAD state to optimize transcription timing
  • Both can be enabled independently

Security Assessment: No vulnerabilities found. The code follows secure patterns for worker communication, audio processing, and resource management.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
js/hang/src/publish/audio/index.ts (1)

278-283: Include Speaking in close() to avoid worker/subscription leaks

Speaking owns a worker and a track; it should be closed with the rest.

Apply:

 close() {
   this.#signals.close();
   this.captions.close();
+  this.speaking.close();
   this.#track.close();
 }
js/hang/src/publish/audio/captions-worker.ts (1)

62-69: Buffer overflow risk: mixing bytes and elements in capacity check

this.#queued.byteLength is bytes; samples.length is elements. This check can miss capacity and cause a RangeError when expanding the view.

Use element counts against MAX_WHISPER_BUFFER:

-    if (this.#queued.byteLength + samples.length > this.#queued.buffer.byteLength) {
+    if (this.#queued.length + samples.length > MAX_WHISPER_BUFFER) {
       if (!this.flush()) {
         console.warn("buffer is full, dropping samples");
         return;
       }
     }

Also applies to: 75-77

🧹 Nitpick comments (18)
js/hang/src/publish/element.ts (4)

7-7: Expose independent speaking control on the element API

The PR goal is to separate VAD (speaking) from captions, but the custom element does not yet expose a standalone "speaking" attribute. Recommend adding it so integrators can enable VAD-only visualizations without captions.

Apply this diff to include the attribute:

-const OBSERVED = ["url", "name", "device", "audio", "video", "controls", "captions"] as const;
+const OBSERVED = ["url", "name", "device", "audio", "video", "controls", "captions", "speaking"] as const;

And add the following (outside the changed range) for proper handling:

// attributeChangedCallback(...)
} else if (name === "speaking") {
  this.speaking = newValue !== null;
} else {
// ...

// Add a dedicated property
get speaking(): boolean {
  return this.broadcast.audio.speaking.enabled.peek();
}
set speaking(speaking: boolean) {
  this.broadcast.audio.speaking.enabled.set(speaking);
}

129-136: Decouple captions() from speaking() to reflect “separate VAD and captions”

Setter currently toggles both captions and speaking together, which undermines their separation. Let captions control only captions. If you still want a convenience toggle, consider a new attribute like captions-and-vad.

 get captions(): boolean {
-  return this.broadcast.audio.captions.enabled.peek();
+  return this.broadcast.audio.captions.enabled.peek();
 }
 
 set captions(captions: boolean) {
-  this.broadcast.audio.captions.enabled.set(captions);
-  this.broadcast.audio.speaking.enabled.set(captions);
+  this.broadcast.audio.captions.enabled.set(captions);
 }

175-177: Show the speaking indicator even when captions are off (if VAD is enabled)

Currently, the captions row (which includes the speaking icon) is hidden when captions are disabled. For VAD-only use-cases, this hides the speaking indicator unnecessarily.

-const show = effect.get(this.broadcast.audio.captions.enabled);
+const show =
+  effect.get(this.broadcast.audio.captions.enabled) ||
+  effect.get(this.broadcast.audio.speaking.enabled);

191-197: Speaking icon logic is fine; ensure consistent falsey state

Looks good. If types ever allow undefined for active on the publish side, consider const speaking = !!effect.get(this.broadcast.audio.speaking.active); to normalize. Not blocking.

js/hang/src/catalog/audio/speaking.ts (1)

1-9: Minimal schema is appropriate

The SpeakingSchema mirroring a simple MoQ track is sufficient for cataloging a boolean VAD stream. Consider adding a brief doc comment that the encoding is 1 byte: 0=false, 1=true.

js/hang/src/watch/element.ts (1)

184-187: Confirm intent: captions toggle implicitly enables speaking (network/CPU coupling)

Tying speaking.enabled to the captions attribute may surprise users and increase bandwidth/CPU when someone only wants text. If intentional for now, fine; otherwise consider a separate attribute/prop (e.g., speaking) to decouple.

If you want to decouple without adding a new observed attribute yet, gate speaking on a second internal Signal defaulted from captions once (not kept in lockstep).

js/hang/src/publish/audio/index.ts (1)

183-191: Minor: consider making codec/bitrate configurable

Hard-coding opus and bitrate per channel is fine for now, but making these props (or deriving from constraints) will help experimentation across browsers and devices.

Also applies to: 231-232

js/hang/src/watch/audio/speaking.ts (1)

47-57: Optional: guard the async loop with try/catch and log once

Mirror the defensive pattern used elsewhere to avoid silent worker failures and to reset state on errors.

Apply:

-    effect.spawn(async (cancel) => {
-      for (;;) {
-        const frame = await Promise.race([sub.nextFrame(), cancel]);
-        if (!frame) break;
-        const active = frame.data.length > 0 && frame.data[0] !== 0;
-        this.#active.set(active);
-      }
-    });
+    effect.spawn(async (cancel) => {
+      try {
+        for (;;) {
+          const frame = await Promise.race([sub.nextFrame(), cancel]);
+          if (!frame) break;
+          const active = frame.data.length > 0 && frame.data[0] !== 0;
+          this.#active.set(active);
+        }
+      } catch (err) {
+        console.warn("speaking subscription error", err);
+      }
+    });
js/hang/src/publish/audio/captions-worker.ts (1)

97-113: Propagate worker errors instead of crashing on transcription failures

A try/catch here lets the host display/log issues without losing the worker.

Apply:

-  async #flush(buffer: Float32Array) {
-    const model = await this.#model;
-
-    // Do the expensive transcription.
-    const result = await model(buffer);
-    if (Array.isArray(result)) {
-      throw new Error("Expected a single result, got an array");
-    }
-
-    const text = result.text.trim();
-    if (text === "[BLANK_AUDIO]" || text === "") return;
-
-    postResult({
-      type: "text",
-      text,
-    });
-  }
+  async #flush(buffer: Float32Array) {
+    try {
+      const model = await this.#model;
+      const result = await model(buffer);
+      if (Array.isArray(result)) {
+        throw new Error("Expected a single result, got an array");
+      }
+      const text = result.text.trim();
+      if (text === "[BLANK_AUDIO]" || text === "") return;
+      postResult({ type: "text", text });
+    } catch (e: unknown) {
+      const message = e instanceof Error ? e.message : String(e);
+      postResult({ type: "error", message });
+    }
+  }
js/hang/src/publish/audio/speaking-worker.ts (2)

45-47: Fix comment to match logic (and new threshold check)

Comment says “3 in a row” but constant and logic use 8. Align the comment with VAD_SILENCE_PADDING to avoid confusion.

-	// Count the number of silence results, if we get 3 in a row then we're done.
+	// Count consecutive silence results; once we reach VAD_SILENCE_PADDING, we're done speaking.

123-130: Guard against duplicate init and validate message type

The worker creates a new Vad instance for every message. Future messages (or double inits) will rewire handlers unexpectedly.

Apply this diff to accept a single init and ignore duplicates:

+let vad: Vad | undefined;
+
-self.addEventListener("message", async (event: MessageEvent<Request>) => {
-	const message = event.data;
-
-	const vad = new Vad();
-	message.worklet.onmessage = ({ data: { channels } }: MessageEvent<AudioFrame>) => {
-		vad.write(channels[0]);
-	};
-});
+self.addEventListener("message", (event: MessageEvent<Request>) => {
+	const message = event.data;
+	if (message.type !== "init") {
+		postResult({ type: "error", message: "unexpected message type" });
+		return;
+	}
+	if (vad) {
+		// Already initialized; ignore duplicates.
+		return;
+	}
+	vad = new Vad();
+	message.worklet.onmessage = ({ data: { channels } }: MessageEvent<AudioFrame>) => {
+		vad!.write(channels[0]);
+	};
+});
js/hang/src/publish/audio/speaking.ts (3)

65-66: Clarify error source in logs

The error originates from the speaking worker, not “VAD worker.” Improves debuggability.

-				console.error("VAD worker error:", data.message);
+				console.error("speaking worker error:", data.message);

56-68: Add onerror handler for worker crashes

Unhandled worker errors won’t post a Result and leave active=true stale. Add a generic onerror trap.

 		const worker = new Worker(new URL("./speaking-worker", import.meta.url), { type: "module" });
 		effect.cleanup(() => worker.terminate());
 
 		// Handle messages from the worker
 		worker.onmessage = ({ data }: MessageEvent<Result>) => {
 			if (data.type === "speaking") {
 				// Use heuristics to determine if we've toggled speaking or not
 				this.active.set(data.speaking);
 			} else if (data.type === "error") {
-				console.error("VAD worker error:", data.message);
+				console.error("speaking worker error:", data.message);
 				this.active.set(false);
 			}
 		};
+		worker.onerror = (ev) => {
+			console.error("speaking worker crash:", (ev as ErrorEvent).message ?? ev);
+			this.active.set(false);
+		};

74-107: Consider reusing capture to avoid multiple AudioContexts

Speaking and captions each spin up their own 16 kHz AudioContext and capture chain. If CPU is tight, consider sharing a single capture worklet and teeing frames to multiple workers via MessagePorts to avoid duplicate decode/resample overhead.
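
A rough sketch of that sharing idea, assuming the capture worklet posts AudioFrame messages and each worker accepts a port in its init message (both assumptions about this codebase):

```ts
// Fan one capture worklet's frames out to several workers via MessageChannels.
function teeCapture(workletNode: AudioWorkletNode, workers: Worker[]): void {
	const ports: MessagePort[] = workers.map((worker) => {
		const channel = new MessageChannel();
		// Hand one end to the worker; it reads AudioFrames from this port.
		worker.postMessage({ type: "init", worklet: channel.port2 }, [channel.port2]);
		return channel.port1;
	});

	workletNode.port.onmessage = ({ data }) => {
		// Post a copy to each port; transferring would move the buffer away
		// from the other consumers.
		for (const port of ports) port.postMessage(data);
	};
}
```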

js/hang/src/publish/audio/captions.ts (4)

9-15: Unused CaptionsProps.transcribe

transcribe is introduced but unused. Either wire it to enable/disable transcription behavior or remove to avoid confusion.

If you intend it to gate worker setup, one minimal integration is:

 export type CaptionsProps = {
 	enabled?: boolean;
-	transcribe?: boolean;
+	transcribe?: boolean;
 	// Captions are cleared after this many milliseconds. (10s default)
 	ttl?: DOMHighResTimeStamp;
 };

 constructor(audio: Audio, props?: CaptionsProps) {
 	this.audio = audio;
 	this.#ttl = props?.ttl ?? 10000;
-	this.enabled = new Signal(props?.enabled ?? false);
+	this.enabled = new Signal((props?.enabled ?? false) && (props?.transcribe ?? true));

Or drop the prop until needed.


69-75: Clarify error source in logs

This is the captions worker, not VAD.

-			} else if (data.type === "error") {
-				console.error("VAD worker error:", data.message);
+			} else if (data.type === "error") {
+				console.error("captions worker error:", data.message);
 				this.text.set(undefined);
 			}

65-76: Add onerror handler for worker crashes

Mirror speaking.ts: catch unhandled worker errors and clear text.

 		const worker = new Worker(new URL("./captions-worker", import.meta.url), { type: "module" });
 		effect.cleanup(() => worker.terminate());
 
 		// Handle messages from the worker
 		worker.onmessage = ({ data }: MessageEvent<Result>) => {
 			if (data.type === "text") {
 				this.text.set(data.text);
 			} else if (data.type === "error") {
-				console.error("VAD worker error:", data.message);
+				console.error("captions worker error:", data.message);
 				this.text.set(undefined);
 			}
 		};
+		worker.onerror = (ev) => {
+			console.error("captions worker crash:", (ev as ErrorEvent).message ?? ev);
+			this.text.set(undefined);
+		};

117-124: Captions depend on speaking; consider a fallback mode

When speaking isn’t enabled, captions log a warning and never signal speaking to the worker. If you want captions to work independently, allow a fallback (e.g., assume speaking=true) or make the dependency explicit in the API.
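
One possible shape for that fallback, as a standalone helper (names illustrative):

```ts
// If VAD is disabled, assume speech so the captions worker still transcribes;
// otherwise forward the real VAD state.
function resolveSpeaking(vadEnabled: boolean, vadActive: boolean | undefined): boolean {
	return vadEnabled ? (vadActive ?? false) : true;
}

// e.g. worker.postMessage({ type: "speaking", speaking: resolveSpeaking(enabled, active) });
```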

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 30909c1 and d5c4e26.

📒 Files selected for processing (14)
  • js/hang/src/catalog/audio/captions.ts (1 hunks)
  • js/hang/src/catalog/audio/index.ts (2 hunks)
  • js/hang/src/catalog/audio/speaking.ts (1 hunks)
  • js/hang/src/catalog/index.ts (1 hunks)
  • js/hang/src/publish/audio/captions-worker.ts (4 hunks)
  • js/hang/src/publish/audio/captions-worklet.ts (0 hunks)
  • js/hang/src/publish/audio/captions.ts (4 hunks)
  • js/hang/src/publish/audio/index.ts (9 hunks)
  • js/hang/src/publish/audio/speaking-worker.ts (1 hunks)
  • js/hang/src/publish/audio/speaking.ts (1 hunks)
  • js/hang/src/publish/element.ts (4 hunks)
  • js/hang/src/watch/audio/index.ts (5 hunks)
  • js/hang/src/watch/audio/speaking.ts (1 hunks)
  • js/hang/src/watch/element.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • js/hang/src/publish/audio/captions-worklet.ts
🧰 Additional context used
📓 Path-based instructions (2)
{rs,js}/hang/**

📄 CodeRabbit inference engine (CLAUDE.md)

All media logic belongs in the hang layer (rs/hang and js/hang)

Files:

  • js/hang/src/catalog/audio/speaking.ts
  • js/hang/src/catalog/audio/captions.ts
  • js/hang/src/watch/audio/speaking.ts
  • js/hang/src/publish/audio/speaking-worker.ts
  • js/hang/src/publish/audio/speaking.ts
  • js/hang/src/catalog/audio/index.ts
  • js/hang/src/watch/element.ts
  • js/hang/src/catalog/index.ts
  • js/hang/src/publish/element.ts
  • js/hang/src/publish/audio/index.ts
  • js/hang/src/publish/audio/captions-worker.ts
  • js/hang/src/watch/audio/index.ts
  • js/hang/src/publish/audio/captions.ts
js/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place JS/TS packages under the js/ directory

Files:

  • js/hang/src/catalog/audio/speaking.ts
  • js/hang/src/catalog/audio/captions.ts
  • js/hang/src/watch/audio/speaking.ts
  • js/hang/src/publish/audio/speaking-worker.ts
  • js/hang/src/publish/audio/speaking.ts
  • js/hang/src/catalog/audio/index.ts
  • js/hang/src/watch/element.ts
  • js/hang/src/catalog/index.ts
  • js/hang/src/publish/element.ts
  • js/hang/src/publish/audio/index.ts
  • js/hang/src/publish/audio/captions-worker.ts
  • js/hang/src/watch/audio/index.ts
  • js/hang/src/publish/audio/captions.ts
🧬 Code graph analysis (11)
js/hang/src/catalog/audio/speaking.ts (3)
js/hang/src/catalog/track.ts (1)
  • TrackSchema (4-7)
js/hang/src/publish/audio/speaking.ts (1)
  • Speaking (14-113)
js/hang/src/watch/audio/speaking.ts (1)
  • Speaking (9-62)
js/hang/src/watch/audio/speaking.ts (4)
js/signals/src/index.ts (3)
  • Getter (10-13)
  • Signal (19-76)
  • Effect (81-408)
js/moq/src/broadcast.ts (1)
  • BroadcastConsumer (122-180)
js/hang/src/catalog/audio/index.ts (1)
  • Audio (43-43)
js/hang/src/watch/audio/index.ts (3)
  • Audio (33-212)
  • effect (79-121)
  • effect (123-170)
js/hang/src/publish/audio/speaking-worker.ts (3)
js/hang/src/publish/audio/captions-worker.ts (5)
  • Request (4-4)
  • Init (6-10)
  • Result (17-17)
  • Speaking (12-15)
  • Error (24-27)
js/hang/src/publish/audio/speaking.ts (1)
  • Speaking (14-113)
js/hang/src/publish/audio/capture.ts (1)
  • AudioFrame (1-4)
js/hang/src/publish/audio/speaking.ts (5)
js/hang/src/watch/audio/speaking.ts (3)
  • SpeakingProps (5-7)
  • Speaking (9-62)
  • effect (32-57)
js/hang/src/publish/audio/captions-worker.ts (3)
  • Speaking (12-15)
  • Result (17-17)
  • Request (4-4)
js/hang/src/publish/audio/speaking-worker.ts (3)
  • Speaking (14-17)
  • Result (12-12)
  • Request (4-4)
js/hang/src/publish/audio/index.ts (5)
  • Audio (59-283)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/catalog/audio/index.ts (1)
js/hang/src/catalog/audio/speaking.ts (1)
  • SpeakingSchema (4-7)
js/hang/src/watch/element.ts (4)
js/hang/src/publish/element.ts (3)
  • captions (129-131)
  • captions (133-136)
  • captions (161-209)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/watch/audio/speaking.ts (1)
  • effect (32-57)
js/hang/src/watch/audio/captions.ts (1)
  • effect (32-57)
js/hang/src/publish/element.ts (7)
js/hang/src/watch/element.ts (3)
  • captions (180-182)
  • captions (184-187)
  • captions (230-274)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/publish/audio/index.ts (4)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/publish/audio/speaking.ts (1)
  • effect (32-108)
js/hang/src/watch/audio/speaking.ts (1)
  • effect (32-57)
js/hang/src/watch/audio/captions.ts (1)
  • effect (32-57)
js/hang/src/publish/audio/captions-worker.ts (1)
  • speaking (115-123)
js/hang/src/publish/audio/index.ts (2)
js/hang/src/publish/audio/speaking.ts (3)
  • SpeakingProps (9-11)
  • Speaking (14-113)
  • effect (32-108)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/publish/audio/captions-worker.ts (3)
js/hang/src/publish/audio/speaking-worker.ts (5)
  • Request (4-4)
  • Init (6-10)
  • Speaking (14-17)
  • Result (12-12)
  • Error (19-22)
js/hang/src/publish/audio/speaking.ts (1)
  • Speaking (14-113)
js/hang/src/publish/audio/capture.ts (1)
  • AudioFrame (1-4)
js/hang/src/watch/audio/index.ts (1)
js/hang/src/watch/audio/speaking.ts (2)
  • SpeakingProps (5-7)
  • Speaking (9-62)
js/hang/src/publish/audio/captions.ts (4)
js/hang/src/watch/audio/captions.ts (2)
  • CaptionsProps (5-7)
  • effect (32-57)
js/hang/src/publish/audio/index.ts (4)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/publish/audio/speaking.ts (1)
  • effect (32-108)
js/hang/src/publish/audio/captions-worker.ts (1)
  • speaking (115-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: claude-review
  • GitHub Check: Check
🔇 Additional comments (11)
js/hang/src/catalog/audio/captions.ts (1)

2-2: Import path realignment LGTM

Switching to ../track matches the catalog reorg and keeps schemas centralized.

js/hang/src/catalog/index.ts (1)

2-3: No duplicate or conflicting re-exports detected

After inspecting the re-export chain:

  • js/hang/src/catalog/index.ts exports:
    • * from "./audio" (which only exports AudioSchema, AudioConfigSchema, and their corresponding types)
    • * from "./audio/captions" (exports CaptionsSchema and its type)
    • * from "./audio/speaking" (exports SpeakingSchema and its type)
  • js/hang/src/catalog/audio/index.ts does not re-export from ./captions or ./speaking, so there’s no overlap in symbol names.
  • The explicit re-exports of captions and speaking each introduce unique identifiers (CaptionsSchema, Captions, SpeakingSchema, Speaking) that do not shadow or duplicate anything from the audio index.

No further action is required here.

js/hang/src/catalog/audio/index.ts (2)

2-6: Schema import path adjustments look correct

Moving u53Schema and TrackSchema to the parent paths keeps consistency with the catalog layout.


38-41: Optional speaking field is a clean extension

Adding speaking?: SpeakingSchema as optional maintains backward compatibility for existing consumers that don’t expect VAD. No issues.

js/hang/src/catalog/audio/speaking.ts (1)

1-9: Ensure speaking frames are detected by inspecting the first byte

The watch-side consumer currently treats any non-empty frame as “speaking,” but the producer encodes speaking/silence as a single byte (1/0). We need to update the check so that only frame.data[0] === 1 is considered active.

• In js/hang/src/watch/audio/speaking.ts (around line 52):

- const active = frame.data.length > 0;
+ const active = frame.data.length > 0 && frame.data[0] === 1;

This ensures that silence frames ([0]) no longer peg the UI to “speaking.”

Likely an incorrect or invalid review comment.

js/hang/src/watch/audio/index.ts (3)

11-12: Speaking track plumbing into watch.Audio props and surface — LGTM

Importing Speaking and exposing it on AudioProps as an optional nested prop is consistent with the Captions pattern and keeps the API coherent.

Also applies to: 23-25


49-50: Correct instantiation and lifecycle wiring for Speaking

Constructing Speaking with the same broadcast/info getters as Captions ensures it reacts to catalog changes. Early returns inside Speaking.#run gate work until enabled/info.speaking are present — good.

Also applies to: 69-70


206-207: Closing Speaking alongside Captions — LGTM

Including speaking.close() in Audio.close() prevents dangling subscriptions.

js/hang/src/publish/audio/index.ts (2)

16-17: Speaking support and config surface — LGTM

  • Importing/wiring Speaking mirrors Captions.
  • Adding a public config Getter via #config improves catalog assembly orderliness.

Also applies to: 52-53, 66-67, 75-77


258-276: Dedicated #runCatalog consolidates catalog assembly — LGTM

Building Catalog.Audio from track+config+captions+speaking in one place avoids partial states and racey consumers.

js/hang/src/publish/audio/captions-worker.ts (1)

131-137: Host-controlled speaking handoff — LGTM

Routing the speaking flag from the host into the worker simplifies roles and lets VAD be independently tuned or replaced.

Comment on lines +41 to +43
#sr = new Tensor("int64", [SAMPLE_RATE], []);
#state = new Tensor("float32", new Float32Array(2 * 1 * 128), [2, 1, 128]);
#speaking = false;

💡 Verification agent

🧩 Analysis chain

Confirm dtype for sr tensor

Transformers.js often expects int64 inputs to be provided as BigInt or int64-compatible arrays depending on the model wrapper. Please confirm that passing a number[] for an int64 Tensor works across browsers for this specific model.


🌐 Web query:

Does Transformers.js accept a plain number[] for an int64 Tensor input (e.g., sr: 16000) when running the ONNX Silero VAD model in the browser?

💡 Result:

Short answer: No — you must pass an int64 TypedArray (BigInt64Array) rather than a plain number[] when running ONNX int64 inputs in the browser. ONNX runtimes used by Transformers.js expect a BigInt64Array for int64 tensors; a plain number[] will not be treated as int64 and will cause a type error or wrong behavior [1][2].

Sources

  • transformers.js (discussion/issue): ONNX/wasm/web runtimes require BigInt64Array for int64 tensors (use BigInt64Array for int64 input) [1].
  • ONNX Runtime Web docs: int64 tensor inputs use BigInt64Array in JavaScript (not plain number[]) [2].

Use a BigInt64Array for the int64 “sr” Tensor
Transformers.js (and the underlying ONNX Web runtimes) require int64 inputs to be passed as a BigInt64Array rather than a plain number[]. A number[] will not be treated as int64 and will lead to errors or incorrect behavior.

Please update the code in js/hang/src/publish/audio/speaking-worker.ts (around lines 41–43) accordingly:

-   #sr = new Tensor("int64", [SAMPLE_RATE], []);
+   // Pass a BigInt64Array for int64 inputs; dims=[] for a scalar
+   #sr = new Tensor("int64", new BigInt64Array([BigInt(SAMPLE_RATE)]), []);
    #state = new Tensor("float32", new Float32Array(2 * 1 * 128), [2, 1, 128]);
    #speaking = false;

Comment on lines +58 to +72
write(samples: Float32Array) {
if (this.#next.byteLength >= this.#next.buffer.byteLength) {
if (!this.flush()) {
// Drop the sample if VAD is still processing.
return;
}
}

this.#next = new Float32Array(this.#next.buffer, 0, this.#next.length + samples.length);
this.#next.set(samples, this.#next.length - samples.length);

if (this.#next.byteLength === this.#next.buffer.byteLength) {
this.flush(); // don't care if it fails
}
}

🛠️ Refactor suggestion

Prevent chunk overflow and dropped samples in write()

Current logic can overshoot the fixed 512-sample buffer if a worklet emits larger frames; it also silently drops data while a flush is in-flight. Boundaries should be respected by chunking the input into the remaining capacity and flushing exactly at VAD_CHUNK_SIZE.

Apply this diff to make writes chunk-safe and flush at exact boundaries:

-	write(samples: Float32Array) {
-		if (this.#next.byteLength >= this.#next.buffer.byteLength) {
-			if (!this.flush()) {
-				// Drop the sample if VAD is still processing.
-				return;
-			}
-		}
-
-		this.#next = new Float32Array(this.#next.buffer, 0, this.#next.length + samples.length);
-		this.#next.set(samples, this.#next.length - samples.length);
-
-		if (this.#next.byteLength === this.#next.buffer.byteLength) {
-			this.flush(); // don't care if it fails
-		}
-	}
+	write(samples: Float32Array) {
+		let offset = 0;
+		while (offset < samples.length) {
+			// If an inference is running, drop the remainder to bound latency.
+			if (this.#processing) return;
+
+			const remaining = VAD_CHUNK_SIZE - this.#next.length;
+			if (remaining === 0) {
+				this.flush();
+				continue;
+			}
+
+			const toCopy = Math.min(remaining, samples.length - offset);
+			const next = new Float32Array(this.#next.buffer, 0, this.#next.length + toCopy);
+			next.set(samples.subarray(offset, offset + toCopy), this.#next.length);
+			this.#next = next;
+			offset += toCopy;
+
+			if (this.#next.length === VAD_CHUNK_SIZE) {
+				this.flush(); // schedule inference
+			}
+		}
+	}

Comment on lines +92 to +120
async #flush() {
const model = await this.#model;

const input = new Tensor("float32", this.#current, [1, this.#current.length]);
const result = await model({ input, sr: this.#sr, state: this.#state });
this.#state = result.stateN;

const isSpeech = result.output.data[0];
if (this.#speaking && isSpeech < 0.3) {
this.#silence++;

if (this.#silence > VAD_SILENCE_PADDING) {
this.#speaking = false;

postResult({
type: "speaking",
speaking: false,
});
}
} else if (!this.#speaking && isSpeech >= 0.1) {
this.#speaking = true;
this.#silence = 0;

postResult({
type: "speaking",
speaking: true,
});
}
}

⚠️ Potential issue

Handle model load/inference errors to avoid unhandled rejections

If model download or inference fails, #flush() will reject and produce unhandled promise rejections. Post an error Result and recover.

Apply this diff to add robust error handling:

-	async #flush() {
-		const model = await this.#model;
-
-		const input = new Tensor("float32", this.#current, [1, this.#current.length]);
-		const result = await model({ input, sr: this.#sr, state: this.#state });
-		this.#state = result.stateN;
-
-		const isSpeech = result.output.data[0];
+	async #flush() {
+		let model: PreTrainedModel;
+		try {
+			model = await this.#model;
+		} catch (err) {
+			postResult({
+				type: "error",
+				message: `VAD model load failed: ${err instanceof Error ? err.message : String(err)}`,
+			});
+			return;
+		}
+		try {
+			const input = new Tensor("float32", this.#current, [1, this.#current.length]);
+			const result = await model({ input, sr: this.#sr, state: this.#state });
+			this.#state = result.stateN;
+
+			const isSpeech = result.output.data[0];
 			if (this.#speaking && isSpeech < 0.3) {
 				this.#silence++;
 
-			if (this.#silence > VAD_SILENCE_PADDING) {
+			if (this.#silence >= VAD_SILENCE_PADDING) {
 				this.#speaking = false;
 
 				postResult({
 					type: "speaking",
 					speaking: false,
 				});
 			}
 		} else if (!this.#speaking && isSpeech >= 0.1) {
 			this.#speaking = true;
 			this.#silence = 0;
 
 			postResult({
 				type: "speaking",
 				speaking: true,
 			});
 		}
+		} catch (err) {
+			postResult({
+				type: "error",
+				message: `VAD inference failed: ${err instanceof Error ? err.message : String(err)}`,
+			});
+		}
 	}

Comment on lines +244 to 272
const leftSpacer = DOM.create("div", {
style: { width: "1.5em" },
});

const captionText = DOM.create("div", {
style: { textAlign: "center" },
});

const speakingIcon = DOM.create("div", {
style: { width: "1.5em" },
});

effect.effect((effect) => {
const text = effect.get(this.broadcast.audio.captions.text);
const speaking = effect.get(this.broadcast.audio.speaking.active);

captionText.textContent = text ?? "";
speakingIcon.textContent = speaking ? "🗣️" : " ";
});

captions.appendChild(leftSpacer);
captions.appendChild(captionText);
captions.appendChild(speakingIcon);

effect.cleanup(() => {
captions.textContent = "";
captions.removeChild(leftSpacer);
captions.removeChild(captionText);
captions.removeChild(speakingIcon);
});

⚠️ Potential issue

Captions row will stack vertically; container needs flex for left spacer + centered text + icon

With three block-level divs, they render vertically. Add display:flex on the captions container and let the middle node flex.

Apply this targeted diff for the changed block to make the middle node flex:

-      const captionText = DOM.create("div", {
-        style: { textAlign: "center" },
-      });
+      const captionText = DOM.create("div", {
+        style: { textAlign: "center", flex: "1" },
+      });

And update the container style (outside this hunk) to enable the horizontal layout:

// At lines 231-235
const captions = DOM.create("div", {
  style: {
    display: "flex",
    justifyContent: "space-around",
    gap: "16px",
    minHeight: "1lh",
    alignContent: "center",
  },
});

Optional: mark the speaking icon decorative to avoid verbosity for screen readers:

-      const speakingIcon = DOM.create("div", {
-        style: { width: "1.5em" },
-      });
+      const speakingIcon = DOM.create("div", {
+        style: { width: "1.5em" },
+        "aria-hidden": "true",
+      });

Use the stream closed state to signal false.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
js/hang/src/container/bool.ts (1)

12-22: Encoding via group open/close is solid and removes prior false/true framing ambiguity.

Using “group open” to mean true (with a single [0x01] frame) and “group closed” to mean false is a clean, backpressure-friendly protocol and avoids the earlier watch-side misread of 0x00 as active. Nice.

🧹 Nitpick comments (3)
js/hang/src/container/bool.ts (2)

3-3: Fix minor grammar in the doc comment.

"write a frame" → "writes a frame".

Apply:

-// Creates a track that write a frame on true, and closes the group on false.
+// Creates a track that writes a frame on true, and closes the group on false.

28-32: Close the open group before closing the track to guarantee a trailing “false” edge.

Closing the track first will implicitly close the group, but closing the group explicitly first guarantees a final false emission before track teardown. Low risk either way; this order is clearer.

-	close() {
-		this.track.close();
-		this.#group?.close();
-		this.#group = undefined;
-	}
+	close() {
+		this.#group?.close();
+		this.#group = undefined;
+		this.track.close();
+	}
js/hang/src/publish/audio/speaking.ts (1)

39-41: Reset Speaking catalog on cleanup to avoid stale metadata in Audio catalog.

When disabled, you remove the track from broadcast but leave this.catalog populated. Audio.#runCatalog reads speaking from this signal, so metadata can remain “present” after disable.

 		this.audio.broadcast.insertTrack(this.#bool.track.consume());
 		effect.cleanup(() => this.audio.broadcast.removeTrack(this.#bool.track.name));

 		const catalog: Catalog.Speaking = {
 			track: {
 				name: this.#bool.track.name,
 				priority: u8(this.#bool.track.priority),
 			},
 		};
-		effect.set(this.catalog, catalog);
+		effect.set(this.catalog, catalog);
+		effect.cleanup(() => this.catalog.set(undefined));

Also applies to: 42-49

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between d5c4e26 and 53716a7.

📒 Files selected for processing (4)
  • js/hang/src/container/bool.ts (1 hunks)
  • js/hang/src/publish/audio/speaking.ts (1 hunks)
  • js/hang/src/watch/audio/speaking.ts (1 hunks)
  • js/moq/src/group.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • js/hang/src/watch/audio/speaking.ts
🧰 Additional context used
📓 Path-based instructions (2)
js/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place JS/TS packages under the js/ directory

Files:

  • js/moq/src/group.ts
  • js/hang/src/container/bool.ts
  • js/hang/src/publish/audio/speaking.ts
{rs,js}/hang/**

📄 CodeRabbit inference engine (CLAUDE.md)

All media logic belongs in the hang layer (rs/hang and js/hang)

Files:

  • js/hang/src/container/bool.ts
  • js/hang/src/publish/audio/speaking.ts
🧬 Code graph analysis (2)
js/hang/src/container/bool.ts (2)
js/moq/src/track.ts (2)
  • TrackProducer (9-118)
  • TrackConsumer (125-279)
js/moq/src/group.ts (2)
  • GroupProducer (8-59)
  • GroupConsumer (66-120)
js/hang/src/publish/audio/speaking.ts (4)
js/hang/src/publish/audio/speaking-worker.ts (3)
  • Speaking (14-17)
  • Result (12-12)
  • Request (4-4)
js/hang/src/publish/audio/index.ts (5)
  • Audio (59-283)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/container/bool.ts (1)
  • BoolProducer (4-33)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Check
🔇 Additional comments (7)
js/moq/src/group.ts (2)

94-101: LGTM: closed() is a useful lifecycle primitive and matches WatchConsumer behavior.

Straightforward pass-through to the underlying frames.closed(); this aligns with the PR’s broader lifecycle cleanup direction.
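
A sketch of one way a consumer can use the new method; the import path and surrounding wiring are assumptions:

```ts
import type { GroupConsumer } from "./group"; // path assumed

// Await the group's closure to observe the falling edge of a boolean track.
async function awaitFalseEdge(group: GroupConsumer): Promise<void> {
	await group.closed(); // added in this PR: resolves once the frames are closed
	console.log("group closed: value is now false");
}
```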


94-101: Follow-up verification: confirm that #frames in group.ts is a WatchConsumer, so that its .closed() rejects on abort.

js/hang/src/container/bool.ts (1)

24-26: Handle BoolProducer.clone() API

The current clone() implementation returns a fresh BoolProducer with its own #group, leading to potential state desynchronization if multiple producers write to the same track. Our grep of the codebase shows that clone() on BoolProducer isn’t called anywhere in js/hang/src except in its own definition, so removing or altering it won’t break internal usage—but it is part of the public API and could impact consumers outside this repo.

Please choose one of the following approaches and verify that downstream code will remain correct:

Remove clone() entirely
– Delete the method from js/hang/src/container/bool.ts.
– Bump the major version and call out the breaking change in the changelog.

Refactor to share backing state
– Replace the private #group?: GroupProducer field with a shared #state: { group?: GroupProducer } object.
– Update the constructor and all usages of #group (in write(), close()) to use #state.group.
– Change clone() to pass the same #state into the new instance:

```diff
 // js/hang/src/container/bool.ts
-#group?: GroupProducer;
+#state: { group?: GroupProducer };

-constructor(track: TrackProducer) {
-  this.track = track;
-}
+constructor(track: TrackProducer, state: { group?: GroupProducer } = {}) {
+  this.track = track;
+  this.#state = state;
+}

 write(value: boolean) {
   if (value) {
-    if (this.#group) return;
-    this.#group = this.track.appendGroup();
-    this.#group.writeFrame(...);
+    if (this.#state.group) return;
+    this.#state.group = this.track.appendGroup();
+    this.#state.group.writeFrame(...);
   } else {
-    if (!this.#group) return;
-    this.#group.close();
-    this.#group = undefined;
+    if (!this.#state.group) return;
+    this.#state.group.close();
+    this.#state.group = undefined;
   }
 }

-clone() {
-  return new BoolProducer(this.track);
-}
+clone() {
+  return new BoolProducer(this.track, this.#state);
+}

 close() {
   this.track.close();
-  this.#group?.close();
-  this.#group = undefined;
+  this.#state.group?.close();
+  this.#state.group = undefined;
 }
```

Deprecate first, then remove
– Mark clone() as deprecated in JSDoc.
– In the next major release, drop the method entirely.

Please confirm which path you’d like to take and ensure any external consumers of BoolProducer.clone() are updated accordingly.

js/hang/src/publish/audio/speaking.ts (4)

50-55: Direct wiring of active → BoolProducer is clean and efficient.

The nested effect avoids track recreation and only emits on state edges. Good use of the container abstraction.


74-107: Audio graph setup looks correct; keep the bundler contract for CaptureWorklet stable.

The addModule(CaptureWorklet) pattern relies on the bundler returning a module URL via ?worker&url. Ensure this remains aligned with your build. No changes requested.


56-69: Hysteresis Already Present in VAD Worker
The speaking-worker.ts logic uses distinct on/off thresholds (≥0.1 to start speaking, <0.3 to stop) and waits for 8 consecutive silent chunks (8 × 32 ms ≈ 256 ms at 16 kHz with 512-sample chunks) before emitting speaking: false, matching the suggested hold time. No additional debounce/hysteresis is needed.
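
For reference, that hysteresis boils down to a small state machine; a standalone sketch using the thresholds quoted above (everything else is illustrative):

```ts
const SPEECH_ON = 0.1; // start speaking at or above this probability
const SPEECH_OFF = 0.3; // while speaking, count silence only below this
const SILENCE_PADDING = 8; // consecutive silent chunks before "stopped"

function makeHysteresis(onEdge: (speaking: boolean) => void) {
	let speaking = false;
	let silence = 0;
	return (probability: number) => {
		if (speaking && probability < SPEECH_OFF) {
			if (++silence >= SILENCE_PADDING) {
				speaking = false;
				onEdge(false);
			}
		} else if (!speaking && probability >= SPEECH_ON) {
			speaking = true;
			silence = 0;
			onEdge(true);
		}
	};
}
```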


25-31: BoolProducer reuse confirmed; no changes needed

I’ve verified that #bool is instantiated once as a class-level field and is never re-created in the #run method. Each enable toggle simply calls:

  • this.audio.broadcast.insertTrack(this.#bool.track.consume())
  • And on cleanup, this.audio.broadcast.removeTrack(this.#bool.track.name)

This ensures you reuse the same producer and track across enable/disable cycles without unnecessary churn.

Comment on lines 43 to 66
async next(): Promise<boolean | undefined> {
for (;;) {
if (!this.#group) {
const group = await this.track.nextGroup();
if (!group) return undefined;

this.#group = group;
return true;
}

const group = await Promise.race([
this.track.nextGroup(),
this.#group.closed(),
])
if (group) {
this.#group = group;
continue;
}

this.#group.close();
this.#group = undefined;
return false;
}
}

🛠️ Refactor suggestion

Disambiguate Promise.race result to avoid treating end-of-track as “false”.

Promise.race between nextGroup() and closed() returns undefined for both “track ended” and “group closed” paths. Today both map to returning false. It’s better to handle the cases explicitly, so end-of-track behavior is intentional and robust to internal changes.

Apply:

-			const group = await Promise.race([
-				this.track.nextGroup(),
-				this.#group.closed(),
-			])
-			if (group) {
-				this.#group = group;
-				continue;
-			}
-
-			this.#group.close();
-			this.#group = undefined;
-			return false;
+			const result:
+				| { kind: "group"; group?: GroupConsumer }
+				| { kind: "closed" } =
+				await Promise.race([
+					this.track.nextGroup().then((g) => ({ kind: "group" as const, group: g })),
+					this.#group.closed().then(() => ({ kind: "closed" as const })),
+				]);
+
+			if (result.kind === "group") {
+				if (result.group) {
+					this.#group = result.group;
+					continue;
+				}
+				// Track ended while current group still open. Emit a final false then let caller
+				// observe undefined on the next next() call.
+				this.#group.close();
+				this.#group = undefined;
+				return false;
+			}
+
+			// Current group closed → false edge.
+			this.#group.close();
+			this.#group = undefined;
+			return false;

Optionally, guard against abort errors by converting closed() rejections into “closed” signals with .catch(() => ({ kind: "closed" as const })).

