
Conversation

@kixelated (Collaborator) commented Aug 25, 2025

Now we selectively transmit a "speaking" track over the network. It's just a boolean that toggles on/off based on the VAD results. Technically, this increases the number of audio copies when captions are enabled and speaking is not detected, but idc lul.

Useful for more audio visualizations.
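For context, a minimal sketch of how a watch-side consumer might drive a visualization from this boolean signal; the callback-style subscribe API here is an assumption for illustration, not the actual signals API in this repo:

```ts
// Hypothetical: dim/undim an element on speaking edges from the VAD track.
type BoolSignal = { subscribe(fn: (value: boolean) => void): () => void };

function bindSpeakingIndicator(active: BoolSignal, el: HTMLElement): () => void {
	// Returns the unsubscribe function so callers can clean up.
	return active.subscribe((speaking) => {
		el.style.opacity = speaking ? "1" : "0.3";
	});
}
```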

Summary by CodeRabbit

  • New Features

    • Real-time speaking detection with downloadable speaking track and a speaking indicator next to captions.
    • Whisper-based live transcription streaming text results; captions auto-clear after a configurable TTL (default ~10s; see the sketch after this list).
  • Breaking Changes

    • Removed the transcribe element attribute; use captions/speaking toggles instead.
    • Captions API/semantics updated (transcribe removed; TTL behavior/type changed).
  • Refactor

    • Audio pipeline reorganized: dedicated speaking detection and separate transcription flow; captions and speaking state now driven by shared broadcast signals.
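
A rough illustration of the TTL behavior mentioned above, as a standalone sketch; the names are illustrative, not the PR's actual code:

```ts
// Each new caption resets a timer that clears the text after `ttl` ms.
function makeCaptionState(ttl: number = 10_000) {
	let text: string | undefined;
	let timer: ReturnType<typeof setTimeout> | undefined;
	return {
		set(next: string) {
			text = next;
			clearTimeout(timer);
			timer = setTimeout(() => {
				text = undefined;
			}, ttl);
		},
		get: () => text,
	};
}
```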

coderabbitai bot (Contributor) commented Aug 25, 2025

Walkthrough

Adds a speaking track and VAD pipeline (Silero) with worklet/worker wiring, replaces VAD transcription with a Whisper-based captions worker, introduces BoolProducer/BoolConsumer primitives, adjusts catalog exports and schemas, and updates publish/watch element wiring and UI rendering.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Catalog: audio schemas<br>js/hang/src/catalog/audio/index.ts, js/hang/src/catalog/audio/speaking.ts, js/hang/src/catalog/audio/captions.ts | Adds SpeakingSchema and an optional speaking field on AudioSchema; new speaking.ts module; adjusts TrackSchema import paths. |
| Catalog: root exports<br>js/hang/src/catalog/index.ts | Removes the top-level ./captions export; adds ./audio/captions and ./audio/speaking exports. |
| Publish: captions pipeline & worker<br>js/hang/src/publish/audio/captions-worker.ts, js/hang/src/publish/audio/captions.ts, js/hang/src/publish/audio/captions-worklet.ts | Replaces VAD transcription internals with Whisper-based transcription in captions-worker.ts (request/result unions changed); removes captions-worklet.ts; captions.ts now uses a capture worklet, adds transcribe?: boolean, changes the ttl type, removes the internal speaking signal, and pushes speaking state to the worker. |
| Publish: speaking pipeline & worker<br>js/hang/src/publish/audio/speaking.ts, js/hang/src/publish/audio/speaking-worker.ts | New Speaking feature: the Speaking class wires an AudioContext/worklet and a speaking worker; speaking-worker.ts implements Silero VAD and emits speaking true/false events. |
| Publish: audio integration<br>js/hang/src/publish/audio/index.ts | Integrates Speaking into the Audio API/props; adds a public config getter; centralizes catalog construction via an effect that includes config, captions, and speaking. |
| Publish: element API<br>js/hang/src/publish/element.ts | Removes the transcribe attribute/state and related accessors; the captions toggle now also toggles broadcast.audio.speaking.enabled; rendering reads broadcast signals. |
| Watch: audio integration<br>js/hang/src/watch/audio/index.ts, js/hang/src/watch/audio/speaking.ts | Adds a watch-side Speaking controller and speaking prop; constructs/closes Speaking alongside Audio; exposes enabled/active state and subscribes to the speaking track. |
| Watch: element UI<br>js/hang/src/watch/element.ts | Reworks the caption DOM into three nodes (left spacer, centered text, speaking icon) and binds the speaking indicator to broadcast.audio.speaking.active; toggling captions synchronizes speaking. |
| Container primitives<br>js/hang/src/container/bool.ts | Adds BoolProducer and BoolConsumer to encode/decode boolean streams over track groups (write/next, clone/close semantics); usage sketched after this table. |
| MoQ runtime<br>js/moq/src/group.ts | Adds async closed(): Promise<void> to GroupConsumer to await underlying frames closure. |
| Minor import/type-only tweaks<br>js/hang/src/watch/video/index.ts | Changes the Getter import to a type-only import form. |
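
A usage sketch of the new Bool container primitives, based on the open-group-means-true semantics reviewed below; the import path and the untyped track handles are assumptions:

```ts
import { BoolConsumer, BoolProducer } from "../container/bool"; // path assumed

async function mirrorSpeaking(producerTrack: any, consumerTrack: any) {
	const producer = new BoolProducer(producerTrack);
	producer.write(true); // opens a group: consumers observe `true`
	producer.write(false); // closes the group: consumers observe `false`

	const consumer = new BoolConsumer(consumerTrack);
	for (;;) {
		// `next()` resolves true/false on edges, or undefined once the track ends.
		const value = await consumer.next();
		if (value === undefined) break;
		console.log("speaking:", value);
	}
}
```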

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant User
  participant PublishAudio as Publish.Audio
  participant Speak as Speaking
  participant Cap as Captions
  participant Worklet as AudioWorklet("capture")
  participant SpeakW as Worker("speaking-worker")
  participant WhispW as Worker("captions-worker")
  participant Broadcast

  User->>PublishAudio: new Audio({ captions?, speaking? })
  PublishAudio->>Speak: construct (enabled)
  PublishAudio->>Cap: construct (enabled, transcribe?, ttl)
  Speak->>Worklet: create "capture" node, connect MediaStream
  Speak->>SpeakW: post { type: "init", port }
  Worklet-->>SpeakW: AudioFrame(channels[0]) (stream)
  SpeakW-->>Speak: { type: "speaking", speaking }
  Speak->>Broadcast: produce speaking frames (BoolProducer)
  Cap->>WhispW: post { type: "speaking", speaking } (when transcribe enabled)
  Worklet-->>WhispW: AudioFrame(ch0) (stream)
  WhispW-->>Cap: { type: "text", text } or { type: "error" }
  Cap->>Broadcast: update captions.text (cleared after TTL)
  PublishAudio->>Broadcast: publish Catalog.Audio (track + config + captions + speaking)
```

```mermaid
sequenceDiagram
  autonumber
  participant WatchAudio as Watch.Audio
  participant WatchSpeak as Watch.Speaking
  participant UI as Watch Element
  participant Broadcast

  WatchAudio->>WatchSpeak: new Speaking({ enabled })
  WatchSpeak->>Broadcast: subscribe(info.speaking.track)
  loop real-time
    Broadcast-->>WatchSpeak: frame (group open/close)
    WatchSpeak-->>UI: update active getter
    UI->>UI: render caption text + speaking icon
  end
```

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

A whisker twitches, circuits hum,
I hop where workers, frames, and tokens run—
🗣️ voice becomes a track, a text that gleams,
Bool flags rise and fall inside my dreams.
I thump, I cheer—new signals stitch the streams.


📜 Recent review details

📥 Commits

Reviewing files that changed from the base of the PR and between 53716a7 and 0a27070.

📒 Files selected for processing (4)
  • js/hang/src/container/bool.ts (1 hunks)
  • js/hang/src/publish/audio/index.ts (9 hunks)
  • js/hang/src/watch/audio/index.ts (5 hunks)
  • js/hang/src/watch/video/index.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • js/hang/src/container/bool.ts
  • js/hang/src/publish/audio/index.ts
  • js/hang/src/watch/audio/index.ts
🧰 Additional context used
📓 Path-based instructions (2)
{rs,js}/hang/**

📄 CodeRabbit inference engine (CLAUDE.md)

All media logic belongs in the hang layer (rs/hang and js/hang)

Files:

  • js/hang/src/watch/video/index.ts
js/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place JS/TS packages under the js/ directory

Files:

  • js/hang/src/watch/video/index.ts
🔇 Additional comments (1)
js/hang/src/watch/video/index.ts (1)

2-2: Type-only import for Getter is spot-on and trims runtime surface

Switching to type Getter avoids pulling a runtime symbol, improves tree-shaking, and aligns with the hang-layer pattern of treating Getter as a pure type. No behavioral changes introduced.
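
For illustration, the difference is just the type modifier (module specifier assumed):

```ts
// Value import: keeps a runtime binding even if Getter is only used in types,
// which can defeat tree-shaking.
// import { Getter } from "../../signals";

// Type-only import: fully erased at compile time, no runtime symbol pulled in.
import type { Getter } from "../../signals";

function watch(g: Getter<number>): void {
	void g; // Getter appears purely in a type position
}
```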



claude bot (Contributor) commented Aug 25, 2025

Claude finished @kixelated's task


Code Review Complete 👍

I've thoroughly reviewed this PR separating VAD and captions functionality. The code is well-structured and I found no significant bugs or security issues. Here are my findings:

✅ Strengths

  • Clean separation of concerns: VAD logic properly extracted into dedicated speaking-worker.ts and Speaking class
  • Proper resource management: All workers, audio contexts, and worklets have appropriate cleanup via effect.cleanup()
  • Good error handling: Workers handle model loading failures and post error messages appropriately
  • Secure audio processing: Uses standard Web Audio API and WebCodecs - no direct buffer manipulation vulnerabilities
  • Type safety: Strong TypeScript typing throughout with proper interfaces and schemas

🔍 Minor Observations (not issues)

  • VAD thresholds are well-tuned (0.1 for speech detection, 0.3 for silence, 8-chunk padding)
  • Buffer management in both workers uses proper circular buffering with processing flags
  • Hugging Face model loading is handled asynchronously with proper error paths
  • Audio data transfer uses transferable objects appropriately to avoid copies

🏗️ Architecture

The separation creates a cleaner architecture where:

  • Speaking track: Transmits boolean VAD results over network for visualizations
  • Captions track: Uses VAD state to optimize transcription timing
  • Both can be enabled independently

Security Assessment: No vulnerabilities found. The code follows secure patterns for worker communication, audio processing, and resource management.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
js/hang/src/publish/audio/index.ts (1)

278-283: Include Speaking in close() to avoid worker/subscription leaks

Speaking owns a worker and a track; it should be closed with the rest.

Apply:

 close() {
   this.#signals.close();
   this.captions.close();
+  this.speaking.close();
   this.#track.close();
 }
js/hang/src/publish/audio/captions-worker.ts (1)

62-69: Buffer overflow risk: mixing bytes and elements in capacity check

this.#queued.byteLength is bytes; samples.length is elements. This check can miss capacity and cause a RangeError when expanding the view.

Use element counts against MAX_WHISPER_BUFFER:

-    if (this.#queued.byteLength + samples.length > this.#queued.buffer.byteLength) {
+    if (this.#queued.length + samples.length > MAX_WHISPER_BUFFER) {
       if (!this.flush()) {
         console.warn("buffer is full, dropping samples");
         return;
       }
     }

Also applies to: 75-77

🧹 Nitpick comments (18)
js/hang/src/publish/element.ts (4)

7-7: Expose independent speaking control on the element API

The PR goal is to separate VAD (speaking) from captions, but the custom element does not yet expose a standalone "speaking" attribute. Recommend adding it so integrators can enable VAD-only visualizations without captions.

Apply this diff to include the attribute:

-const OBSERVED = ["url", "name", "device", "audio", "video", "controls", "captions"] as const;
+const OBSERVED = ["url", "name", "device", "audio", "video", "controls", "captions", "speaking"] as const;

And add the following (outside the changed range) for proper handling:

// attributeChangedCallback(...)
} else if (name === "speaking") {
  this.speaking = newValue !== null;
} else {
// ...

// Add a dedicated property
get speaking(): boolean {
  return this.broadcast.audio.speaking.enabled.peek();
}
set speaking(speaking: boolean) {
  this.broadcast.audio.speaking.enabled.set(speaking);
}

129-136: Decouple captions() from speaking() to reflect “separate VAD and captions”

Setter currently toggles both captions and speaking together, which undermines their separation. Let captions control only captions. If you still want a convenience toggle, consider a new attribute like captions-and-vad.

 get captions(): boolean {
-  return this.broadcast.audio.captions.enabled.peek();
+  return this.broadcast.audio.captions.enabled.peek();
 }
 
 set captions(captions: boolean) {
-  this.broadcast.audio.captions.enabled.set(captions);
-  this.broadcast.audio.speaking.enabled.set(captions);
+  this.broadcast.audio.captions.enabled.set(captions);
 }

175-177: Show the speaking indicator even when captions are off (if VAD is enabled)

Currently, the captions row (which includes the speaking icon) is hidden when captions are disabled. For VAD-only use-cases, this hides the speaking indicator unnecessarily.

-const show = effect.get(this.broadcast.audio.captions.enabled);
+const show =
+  effect.get(this.broadcast.audio.captions.enabled) ||
+  effect.get(this.broadcast.audio.speaking.enabled);

191-197: Speaking icon logic is fine; ensure consistent falsey state

Looks good. If types ever allow undefined for active on the publish side, consider const speaking = !!effect.get(this.broadcast.audio.speaking.active); to normalize. Not blocking.

js/hang/src/catalog/audio/speaking.ts (1)

1-9: Minimal schema is appropriate

The SpeakingSchema mirroring a simple MoQ track is sufficient for cataloging a boolean VAD stream. Consider adding a brief doc comment that the encoding is 1 byte: 0=false, 1=true.

js/hang/src/watch/element.ts (1)

184-187: Confirm intent: captions toggle implicitly enables speaking (network/CPU coupling)

Tying speaking.enabled to the captions attribute may surprise users and increase bandwidth/CPU when someone only wants text. If intentional for now, fine; otherwise consider a separate attribute/prop (e.g., speaking) to decouple.

If you want to decouple without adding a new observed attribute yet, gate speaking on a second internal Signal defaulted from captions once (not kept in lockstep).

js/hang/src/publish/audio/index.ts (1)

183-191: Minor: consider making codec/bitrate configurable

Hard-coding opus and bitrate per channel is fine for now, but making these props (or deriving from constraints) will help experimentation across browsers and devices.

Also applies to: 231-232

js/hang/src/watch/audio/speaking.ts (1)

47-57: Optional: guard the async loop with try/catch and log once

Mirror the defensive pattern used elsewhere to avoid silent worker failures and to reset state on errors.

Apply:

-    effect.spawn(async (cancel) => {
-      for (;;) {
-        const frame = await Promise.race([sub.nextFrame(), cancel]);
-        if (!frame) break;
-        const active = frame.data.length > 0 && frame.data[0] !== 0;
-        this.#active.set(active);
-      }
-    });
+    effect.spawn(async (cancel) => {
+      try {
+        for (;;) {
+          const frame = await Promise.race([sub.nextFrame(), cancel]);
+          if (!frame) break;
+          const active = frame.data.length > 0 && frame.data[0] !== 0;
+          this.#active.set(active);
+        }
+      } catch (err) {
+        console.warn("speaking subscription error", err);
+      }
+    });
js/hang/src/publish/audio/captions-worker.ts (1)

97-113: Propagate worker errors instead of crashing on transcription failures

A try/catch here lets the host display/log issues without losing the worker.

Apply:

-  async #flush(buffer: Float32Array) {
-    const model = await this.#model;
-
-    // Do the expensive transcription.
-    const result = await model(buffer);
-    if (Array.isArray(result)) {
-      throw new Error("Expected a single result, got an array");
-    }
-
-    const text = result.text.trim();
-    if (text === "[BLANK_AUDIO]" || text === "") return;
-
-    postResult({
-      type: "text",
-      text,
-    });
-  }
+  async #flush(buffer: Float32Array) {
+    try {
+      const model = await this.#model;
+      const result = await model(buffer);
+      if (Array.isArray(result)) {
+        throw new Error("Expected a single result, got an array");
+      }
+      const text = result.text.trim();
+      if (text === "[BLANK_AUDIO]" || text === "") return;
+      postResult({ type: "text", text });
+    } catch (e: unknown) {
+      const message = e instanceof Error ? e.message : String(e);
+      postResult({ type: "error", message });
+    }
+  }
js/hang/src/publish/audio/speaking-worker.ts (2)

45-47: Fix comment to match logic (and new threshold check)

Comment says “3 in a row” but constant and logic use 8. Align the comment with VAD_SILENCE_PADDING to avoid confusion.

-	// Count the number of silence results, if we get 3 in a row then we're done.
+	// Count consecutive silence results; once we reach VAD_SILENCE_PADDING, we're done speaking.

123-130: Guard against duplicate init and validate message type

The worker creates a new Vad instance for every message. Future messages (or double inits) will rewire handlers unexpectedly.

Apply this diff to accept a single init and ignore duplicates:

+let vad: Vad | undefined;
+
-self.addEventListener("message", async (event: MessageEvent<Request>) => {
-	const message = event.data;
-
-	const vad = new Vad();
-	message.worklet.onmessage = ({ data: { channels } }: MessageEvent<AudioFrame>) => {
-		vad.write(channels[0]);
-	};
-});
+self.addEventListener("message", (event: MessageEvent<Request>) => {
+	const message = event.data;
+	if (message.type !== "init") {
+		postResult({ type: "error", message: "unexpected message type" });
+		return;
+	}
+	if (vad) {
+		// Already initialized; ignore duplicates.
+		return;
+	}
+	vad = new Vad();
+	message.worklet.onmessage = ({ data: { channels } }: MessageEvent<AudioFrame>) => {
+		vad!.write(channels[0]);
+	};
+});
js/hang/src/publish/audio/speaking.ts (3)

65-66: Clarify error source in logs

The error originates from the speaking worker, not “VAD worker.” Improves debuggability.

-				console.error("VAD worker error:", data.message);
+				console.error("speaking worker error:", data.message);

56-68: Add onerror handler for worker crashes

Unhandled worker errors won’t post a Result and leave active=true stale. Add a generic onerror trap.

 		const worker = new Worker(new URL("./speaking-worker", import.meta.url), { type: "module" });
 		effect.cleanup(() => worker.terminate());
 
 		// Handle messages from the worker
 		worker.onmessage = ({ data }: MessageEvent<Result>) => {
 			if (data.type === "speaking") {
 				// Use heuristics to determine if we've toggled speaking or not
 				this.active.set(data.speaking);
 			} else if (data.type === "error") {
-				console.error("VAD worker error:", data.message);
+				console.error("speaking worker error:", data.message);
 				this.active.set(false);
 			}
 		};
+		worker.onerror = (ev) => {
+			console.error("speaking worker crash:", (ev as ErrorEvent).message ?? ev);
+			this.active.set(false);
+		};

74-107: Consider reusing capture to avoid multiple AudioContexts

Speaking and captions each spin up their own 16 kHz AudioContext and capture chain. If CPU is tight, consider sharing a single capture worklet and teeing frames to multiple workers via MessagePorts to avoid duplicate decode/resample overhead.
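
A rough sketch of that sharing idea, assuming the capture worklet posts AudioFrame messages and each worker accepts a port in its init message (both assumptions about this codebase):

```ts
// Fan one capture worklet's frames out to several workers via MessageChannels.
function teeCapture(workletNode: AudioWorkletNode, workers: Worker[]): void {
	const ports: MessagePort[] = workers.map((worker) => {
		const channel = new MessageChannel();
		// Hand one end to the worker; it reads AudioFrames from this port.
		worker.postMessage({ type: "init", worklet: channel.port2 }, [channel.port2]);
		return channel.port1;
	});

	workletNode.port.onmessage = ({ data }) => {
		// Post a copy to each port; transferring would move the buffer away
		// from the other consumers.
		for (const port of ports) port.postMessage(data);
	};
}
```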

js/hang/src/publish/audio/captions.ts (4)

9-15: Unused CaptionsProps.transcribe

transcribe is introduced but unused. Either wire it to enable/disable transcription behavior or remove to avoid confusion.

If you intend it to gate worker setup, one minimal integration is:

 export type CaptionsProps = {
 	enabled?: boolean;
-	transcribe?: boolean;
+	transcribe?: boolean;
 	// Captions are cleared after this many milliseconds. (10s default)
 	ttl?: DOMHighResTimeStamp;
 };

 constructor(audio: Audio, props?: CaptionsProps) {
 	this.audio = audio;
 	this.#ttl = props?.ttl ?? 10000;
-	this.enabled = new Signal(props?.enabled ?? false);
+	this.enabled = new Signal((props?.enabled ?? false) && (props?.transcribe ?? true));

Or drop the prop until needed.


69-75: Clarify error source in logs

This is the captions worker, not VAD.

-			} else if (data.type === "error") {
-				console.error("VAD worker error:", data.message);
+			} else if (data.type === "error") {
+				console.error("captions worker error:", data.message);
 				this.text.set(undefined);
 			}

65-76: Add onerror handler for worker crashes

Mirror speaking.ts: catch unhandled worker errors and clear text.

 		const worker = new Worker(new URL("./captions-worker", import.meta.url), { type: "module" });
 		effect.cleanup(() => worker.terminate());
 
 		// Handle messages from the worker
 		worker.onmessage = ({ data }: MessageEvent<Result>) => {
 			if (data.type === "text") {
 				this.text.set(data.text);
 			} else if (data.type === "error") {
-				console.error("VAD worker error:", data.message);
+				console.error("captions worker error:", data.message);
 				this.text.set(undefined);
 			}
 		};
+		worker.onerror = (ev) => {
+			console.error("captions worker crash:", (ev as ErrorEvent).message ?? ev);
+			this.text.set(undefined);
+		};

117-124: Captions depend on speaking; consider a fallback mode

When speaking isn’t enabled, captions log a warning and never signal speaking to the worker. If you want captions to work independently, allow a fallback (e.g., assume speaking=true) or make the dependency explicit in the API.
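
One possible shape for that fallback, as a standalone helper (names illustrative):

```ts
// If VAD is disabled, assume speech so the captions worker still transcribes;
// otherwise forward the real VAD state.
function resolveSpeaking(vadEnabled: boolean, vadActive: boolean | undefined): boolean {
	return vadEnabled ? (vadActive ?? false) : true;
}

// e.g. worker.postMessage({ type: "speaking", speaking: resolveSpeaking(enabled, active) });
```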

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 30909c1 and d5c4e26.

📒 Files selected for processing (14)
  • js/hang/src/catalog/audio/captions.ts (1 hunks)
  • js/hang/src/catalog/audio/index.ts (2 hunks)
  • js/hang/src/catalog/audio/speaking.ts (1 hunks)
  • js/hang/src/catalog/index.ts (1 hunks)
  • js/hang/src/publish/audio/captions-worker.ts (4 hunks)
  • js/hang/src/publish/audio/captions-worklet.ts (0 hunks)
  • js/hang/src/publish/audio/captions.ts (4 hunks)
  • js/hang/src/publish/audio/index.ts (9 hunks)
  • js/hang/src/publish/audio/speaking-worker.ts (1 hunks)
  • js/hang/src/publish/audio/speaking.ts (1 hunks)
  • js/hang/src/publish/element.ts (4 hunks)
  • js/hang/src/watch/audio/index.ts (5 hunks)
  • js/hang/src/watch/audio/speaking.ts (1 hunks)
  • js/hang/src/watch/element.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • js/hang/src/publish/audio/captions-worklet.ts
🧰 Additional context used
📓 Path-based instructions (2)
{rs,js}/hang/**

📄 CodeRabbit inference engine (CLAUDE.md)

All media logic belongs in the hang layer (rs/hang and js/hang)

Files:

  • js/hang/src/catalog/audio/speaking.ts
  • js/hang/src/catalog/audio/captions.ts
  • js/hang/src/watch/audio/speaking.ts
  • js/hang/src/publish/audio/speaking-worker.ts
  • js/hang/src/publish/audio/speaking.ts
  • js/hang/src/catalog/audio/index.ts
  • js/hang/src/watch/element.ts
  • js/hang/src/catalog/index.ts
  • js/hang/src/publish/element.ts
  • js/hang/src/publish/audio/index.ts
  • js/hang/src/publish/audio/captions-worker.ts
  • js/hang/src/watch/audio/index.ts
  • js/hang/src/publish/audio/captions.ts
js/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place JS/TS packages under the js/ directory

Files:

  • js/hang/src/catalog/audio/speaking.ts
  • js/hang/src/catalog/audio/captions.ts
  • js/hang/src/watch/audio/speaking.ts
  • js/hang/src/publish/audio/speaking-worker.ts
  • js/hang/src/publish/audio/speaking.ts
  • js/hang/src/catalog/audio/index.ts
  • js/hang/src/watch/element.ts
  • js/hang/src/catalog/index.ts
  • js/hang/src/publish/element.ts
  • js/hang/src/publish/audio/index.ts
  • js/hang/src/publish/audio/captions-worker.ts
  • js/hang/src/watch/audio/index.ts
  • js/hang/src/publish/audio/captions.ts
🧬 Code graph analysis (11)
js/hang/src/catalog/audio/speaking.ts (3)
js/hang/src/catalog/track.ts (1)
  • TrackSchema (4-7)
js/hang/src/publish/audio/speaking.ts (1)
  • Speaking (14-113)
js/hang/src/watch/audio/speaking.ts (1)
  • Speaking (9-62)
js/hang/src/watch/audio/speaking.ts (4)
js/signals/src/index.ts (3)
  • Getter (10-13)
  • Signal (19-76)
  • Effect (81-408)
js/moq/src/broadcast.ts (1)
  • BroadcastConsumer (122-180)
js/hang/src/catalog/audio/index.ts (1)
  • Audio (43-43)
js/hang/src/watch/audio/index.ts (3)
  • Audio (33-212)
  • effect (79-121)
  • effect (123-170)
js/hang/src/publish/audio/speaking-worker.ts (3)
js/hang/src/publish/audio/captions-worker.ts (5)
  • Request (4-4)
  • Init (6-10)
  • Result (17-17)
  • Speaking (12-15)
  • Error (24-27)
js/hang/src/publish/audio/speaking.ts (1)
  • Speaking (14-113)
js/hang/src/publish/audio/capture.ts (1)
  • AudioFrame (1-4)
js/hang/src/publish/audio/speaking.ts (5)
js/hang/src/watch/audio/speaking.ts (3)
  • SpeakingProps (5-7)
  • Speaking (9-62)
  • effect (32-57)
js/hang/src/publish/audio/captions-worker.ts (3)
  • Speaking (12-15)
  • Result (17-17)
  • Request (4-4)
js/hang/src/publish/audio/speaking-worker.ts (3)
  • Speaking (14-17)
  • Result (12-12)
  • Request (4-4)
js/hang/src/publish/audio/index.ts (5)
  • Audio (59-283)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/catalog/audio/index.ts (1)
js/hang/src/catalog/audio/speaking.ts (1)
  • SpeakingSchema (4-7)
js/hang/src/watch/element.ts (4)
js/hang/src/publish/element.ts (3)
  • captions (129-131)
  • captions (133-136)
  • captions (161-209)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/watch/audio/speaking.ts (1)
  • effect (32-57)
js/hang/src/watch/audio/captions.ts (1)
  • effect (32-57)
js/hang/src/publish/element.ts (7)
js/hang/src/watch/element.ts (3)
  • captions (180-182)
  • captions (184-187)
  • captions (230-274)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/publish/audio/index.ts (4)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/publish/audio/speaking.ts (1)
  • effect (32-108)
js/hang/src/watch/audio/speaking.ts (1)
  • effect (32-57)
js/hang/src/watch/audio/captions.ts (1)
  • effect (32-57)
js/hang/src/publish/audio/captions-worker.ts (1)
  • speaking (115-123)
js/hang/src/publish/audio/index.ts (2)
js/hang/src/publish/audio/speaking.ts (3)
  • SpeakingProps (9-11)
  • Speaking (14-113)
  • effect (32-108)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
js/hang/src/publish/audio/captions-worker.ts (3)
js/hang/src/publish/audio/speaking-worker.ts (5)
  • Request (4-4)
  • Init (6-10)
  • Speaking (14-17)
  • Result (12-12)
  • Error (19-22)
js/hang/src/publish/audio/speaking.ts (1)
  • Speaking (14-113)
js/hang/src/publish/audio/capture.ts (1)
  • AudioFrame (1-4)
js/hang/src/watch/audio/index.ts (1)
js/hang/src/watch/audio/speaking.ts (2)
  • SpeakingProps (5-7)
  • Speaking (9-62)
js/hang/src/publish/audio/captions.ts (4)
js/hang/src/watch/audio/captions.ts (2)
  • CaptionsProps (5-7)
  • effect (32-57)
js/hang/src/publish/audio/index.ts (4)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/publish/audio/speaking.ts (1)
  • effect (32-108)
js/hang/src/publish/audio/captions-worker.ts (1)
  • speaking (115-123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: claude-review
  • GitHub Check: Check
🔇 Additional comments (11)
js/hang/src/catalog/audio/captions.ts (1)

2-2: Import path realignment LGTM

Switching to ../track matches the catalog reorg and keeps schemas centralized.

js/hang/src/catalog/index.ts (1)

2-3: No duplicate or conflicting re-exports detected

After inspecting the re-export chain:

  • js/hang/src/catalog/index.ts exports:
    • * from "./audio" (which only exports AudioSchema, AudioConfigSchema, and their corresponding types)
    • * from "./audio/captions" (exports CaptionsSchema and its type)
    • * from "./audio/speaking" (exports SpeakingSchema and its type)
  • js/hang/src/catalog/audio/index.ts does not re-export from ./captions or ./speaking, so there’s no overlap in symbol names.
  • The explicit re-exports of captions and speaking each introduce unique identifiers (CaptionsSchema, Captions, SpeakingSchema, Speaking) that do not shadow or duplicate anything from the audio index.

No further action is required here.

js/hang/src/catalog/audio/index.ts (2)

2-6: Schema import path adjustments look correct

Moving u53Schema and TrackSchema to the parent paths keeps consistency with the catalog layout.


38-41: Optional speaking field is a clean extension

Adding speaking?: SpeakingSchema as optional maintains backward compatibility for existing consumers that don’t expect VAD. No issues.

js/hang/src/catalog/audio/speaking.ts (1)

1-9: Ensure speaking frames are detected by inspecting the first byte

The watch-side consumer currently treats any non-empty frame as “speaking,” but the producer encodes speaking/silence as a single byte (1/0). We need to update the check so that only frame.data[0] === 1 is considered active.

• In js/hang/src/watch/audio/speaking.ts (around line 52):

- const active = frame.data.length > 0;
+ const active = frame.data.length > 0 && frame.data[0] === 1;

This ensures that silence frames ([0]) no longer peg the UI to “speaking.”

Likely an incorrect or invalid review comment.

js/hang/src/watch/audio/index.ts (3)

11-12: Speaking track plumbing into watch.Audio props and surface — LGTM

Importing Speaking and exposing it on AudioProps as an optional nested prop is consistent with the Captions pattern and keeps the API coherent.

Also applies to: 23-25


49-50: Correct instantiation and lifecycle wiring for Speaking

Constructing Speaking with the same broadcast/info getters as Captions ensures it reacts to catalog changes. Early returns inside Speaking.#run gate work until enabled/info.speaking are present — good.

Also applies to: 69-70


206-207: Closing Speaking alongside Captions — LGTM

Including speaking.close() in Audio.close() prevents dangling subscriptions.

js/hang/src/publish/audio/index.ts (2)

16-17: Speaking support and config surface — LGTM

  • Importing/wiring Speaking mirrors Captions.
  • Adding a public config Getter via #config improves catalog assembly orderliness.

Also applies to: 52-53, 66-67, 75-77


258-276: Dedicated #runCatalog consolidates catalog assembly — LGTM

Building Catalog.Audio from track+config+captions+speaking in one place avoids partial states and racey consumers.

js/hang/src/publish/audio/captions-worker.ts (1)

131-137: Host-controlled speaking handoff — LGTM

Routing the speaking flag from the host into the worker simplifies roles and lets VAD be independently tuned or replaced.

Comment on lines +41 to +43
#sr = new Tensor("int64", [SAMPLE_RATE], []);
#state = new Tensor("float32", new Float32Array(2 * 1 * 128), [2, 1, 128]);
#speaking = false;

💡 Verification agent

🧩 Analysis chain

Confirm dtype for sr tensor

Transformers.js often expects int64 inputs to be provided as BigInt or int64-compatible arrays depending on the model wrapper. Please confirm that passing a number[] for an int64 Tensor works across browsers for this specific model.


🌐 Web query:

Does Transformers.js accept a plain number[] for an int64 Tensor input (e.g., sr: 16000) when running the ONNX Silero VAD model in the browser?

💡 Result:

Short answer: No — you must pass an int64 TypedArray (BigInt64Array) rather than a plain number[] when running ONNX int64 inputs in the browser. ONNX runtimes used by Transformers.js expect a BigInt64Array for int64 tensors; a plain number[] will not be treated as int64 and will cause a type error or wrong behavior [1][2].

Sources

  • transformers.js (discussion/issue): ONNX/wasm/web runtimes require BigInt64Array for int64 tensors (use BigInt64Array for int64 input) [1].
  • ONNX Runtime Web docs: int64 tensor inputs use BigInt64Array in JavaScript (not plain number[]) [2].

Use a BigInt64Array for the int64 “sr” Tensor
Transformers.js (and the underlying ONNX Web runtimes) require int64 inputs to be passed as a BigInt64Array rather than a plain number[]. A number[] will not be treated as int64 and will lead to errors or incorrect behavior.

Please update the code in js/hang/src/publish/audio/speaking-worker.ts (around lines 41–43) accordingly:

-   #sr = new Tensor("int64", [SAMPLE_RATE], []);
+   // Pass a BigInt64Array for int64 inputs; dims=[] for a scalar
+   #sr = new Tensor("int64", new BigInt64Array([BigInt(SAMPLE_RATE)]), []);
    #state = new Tensor("float32", new Float32Array(2 * 1 * 128), [2, 1, 128]);
    #speaking = false;

Comment on lines +58 to +72
write(samples: Float32Array) {
if (this.#next.byteLength >= this.#next.buffer.byteLength) {
if (!this.flush()) {
// Drop the sample if VAD is still processing.
return;
}
}

this.#next = new Float32Array(this.#next.buffer, 0, this.#next.length + samples.length);
this.#next.set(samples, this.#next.length - samples.length);

if (this.#next.byteLength === this.#next.buffer.byteLength) {
this.flush(); // don't care if it fails
}
}

🛠️ Refactor suggestion

Prevent chunk overflow and dropped samples in write()

Current logic can overshoot the fixed 512-sample buffer if a worklet emits larger frames; it also silently drops data while a flush is in-flight. Boundaries should be respected by chunking the input into the remaining capacity and flushing exactly at VAD_CHUNK_SIZE.

Apply this diff to make writes chunk-safe and flush at exact boundaries:

-	write(samples: Float32Array) {
-		if (this.#next.byteLength >= this.#next.buffer.byteLength) {
-			if (!this.flush()) {
-				// Drop the sample if VAD is still processing.
-				return;
-			}
-		}
-
-		this.#next = new Float32Array(this.#next.buffer, 0, this.#next.length + samples.length);
-		this.#next.set(samples, this.#next.length - samples.length);
-
-		if (this.#next.byteLength === this.#next.buffer.byteLength) {
-			this.flush(); // don't care if it fails
-		}
-	}
+	write(samples: Float32Array) {
+		let offset = 0;
+		while (offset < samples.length) {
+			// If an inference is running, drop the remainder to bound latency.
+			if (this.#processing) return;
+
+			const remaining = VAD_CHUNK_SIZE - this.#next.length;
+			if (remaining === 0) {
+				this.flush();
+				continue;
+			}
+
+			const toCopy = Math.min(remaining, samples.length - offset);
+			const next = new Float32Array(this.#next.buffer, 0, this.#next.length + toCopy);
+			next.set(samples.subarray(offset, offset + toCopy), this.#next.length);
+			this.#next = next;
+			offset += toCopy;
+
+			if (this.#next.length === VAD_CHUNK_SIZE) {
+				this.flush(); // schedule inference
+			}
+		}
+	}

Comment on lines +92 to +120
async #flush() {
const model = await this.#model;

const input = new Tensor("float32", this.#current, [1, this.#current.length]);
const result = await model({ input, sr: this.#sr, state: this.#state });
this.#state = result.stateN;

const isSpeech = result.output.data[0];
if (this.#speaking && isSpeech < 0.3) {
this.#silence++;

if (this.#silence > VAD_SILENCE_PADDING) {
this.#speaking = false;

postResult({
type: "speaking",
speaking: false,
});
}
} else if (!this.#speaking && isSpeech >= 0.1) {
this.#speaking = true;
this.#silence = 0;

postResult({
type: "speaking",
speaking: true,
});
}
}

⚠️ Potential issue

Handle model load/inference errors to avoid unhandled rejections

If model download or inference fails, #flush() will reject and produce unhandled promise rejections. Post an error Result and recover.

Apply this diff to add robust error handling:

-	async #flush() {
-		const model = await this.#model;
-
-		const input = new Tensor("float32", this.#current, [1, this.#current.length]);
-		const result = await model({ input, sr: this.#sr, state: this.#state });
-		this.#state = result.stateN;
-
-		const isSpeech = result.output.data[0];
+	async #flush() {
+		let model: PreTrainedModel;
+		try {
+			model = await this.#model;
+		} catch (err) {
+			postResult({
+				type: "error",
+				message: `VAD model load failed: ${err instanceof Error ? err.message : String(err)}`,
+			});
+			return;
+		}
+		try {
+			const input = new Tensor("float32", this.#current, [1, this.#current.length]);
+			const result = await model({ input, sr: this.#sr, state: this.#state });
+			this.#state = result.stateN;
+
+			const isSpeech = result.output.data[0];
 			if (this.#speaking && isSpeech < 0.3) {
 				this.#silence++;
 
-			if (this.#silence > VAD_SILENCE_PADDING) {
+			if (this.#silence >= VAD_SILENCE_PADDING) {
 				this.#speaking = false;
 
 				postResult({
 					type: "speaking",
 					speaking: false,
 				});
 			}
 		} else if (!this.#speaking && isSpeech >= 0.1) {
 			this.#speaking = true;
 			this.#silence = 0;
 
 			postResult({
 				type: "speaking",
 				speaking: true,
 			});
 		}
+		} catch (err) {
+			postResult({
+				type: "error",
+				message: `VAD inference failed: ${err instanceof Error ? err.message : String(err)}`,
+			});
+		}
 	}

Comment on lines +244 to 272
const leftSpacer = DOM.create("div", {
style: { width: "1.5em" },
});

const captionText = DOM.create("div", {
style: { textAlign: "center" },
});

const speakingIcon = DOM.create("div", {
style: { width: "1.5em" },
});

effect.effect((effect) => {
const text = effect.get(this.broadcast.audio.captions.text);
const speaking = effect.get(this.broadcast.audio.speaking.active);

captionText.textContent = text ?? "";
speakingIcon.textContent = speaking ? "🗣️" : " ";
});

captions.appendChild(leftSpacer);
captions.appendChild(captionText);
captions.appendChild(speakingIcon);

effect.cleanup(() => {
captions.textContent = "";
captions.removeChild(leftSpacer);
captions.removeChild(captionText);
captions.removeChild(speakingIcon);
});

⚠️ Potential issue

Captions row will stack vertically; container needs flex for left spacer + centered text + icon

With three block-level divs, they render vertically. Add display:flex on the captions container and let the middle node flex.

Apply this targeted diff for the changed block to make the middle node flex:

-      const captionText = DOM.create("div", {
-        style: { textAlign: "center" },
-      });
+      const captionText = DOM.create("div", {
+        style: { textAlign: "center", flex: "1" },
+      });

And update the container style (outside this hunk) to enable the horizontal layout:

// At lines 231-235
const captions = DOM.create("div", {
  style: {
    display: "flex",
    justifyContent: "space-around",
    gap: "16px",
    minHeight: "1lh",
    alignContent: "center",
  },
});

Optional: mark the speaking icon decorative to avoid verbosity for screen readers:

-      const speakingIcon = DOM.create("div", {
-        style: { width: "1.5em" },
-      });
+      const speakingIcon = DOM.create("div", {
+        style: { width: "1.5em" },
+        "aria-hidden": "true",
+      });

Use the stream closed state to signal false.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
js/hang/src/container/bool.ts (1)

12-22: Encoding via group open/close is solid and removes prior false/true framing ambiguity.

Using “group open” to mean true (with a single [0x01] frame) and “group closed” to mean false is a clean, backpressure-friendly protocol and avoids the earlier watch-side misread of 0x00 as active. Nice.

🧹 Nitpick comments (3)
js/hang/src/container/bool.ts (2)

3-3: Fix minor grammar in the doc comment.

"write a frame" → "writes a frame".

Apply:

-// Creates a track that write a frame on true, and closes the group on false.
+// Creates a track that writes a frame on true, and closes the group on false.

28-32: Close the open group before closing the track to guarantee a trailing “false” edge.

Closing the track first will implicitly close the group, but closing the group explicitly first guarantees a final false emission before track teardown. Low risk either way; this order is clearer.

-	close() {
-		this.track.close();
-		this.#group?.close();
-		this.#group = undefined;
-	}
+	close() {
+		this.#group?.close();
+		this.#group = undefined;
+		this.track.close();
+	}
js/hang/src/publish/audio/speaking.ts (1)

39-41: Reset Speaking catalog on cleanup to avoid stale metadata in Audio catalog.

When disabled, you remove the track from broadcast but leave this.catalog populated. Audio.#runCatalog reads speaking from this signal, so metadata can remain “present” after disable.

 		this.audio.broadcast.insertTrack(this.#bool.track.consume());
 		effect.cleanup(() => this.audio.broadcast.removeTrack(this.#bool.track.name));

 		const catalog: Catalog.Speaking = {
 			track: {
 				name: this.#bool.track.name,
 				priority: u8(this.#bool.track.priority),
 			},
 		};
-		effect.set(this.catalog, catalog);
+		effect.set(this.catalog, catalog);
+		effect.cleanup(() => this.catalog.set(undefined));

Also applies to: 42-49

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between d5c4e26 and 53716a7.

📒 Files selected for processing (4)
  • js/hang/src/container/bool.ts (1 hunks)
  • js/hang/src/publish/audio/speaking.ts (1 hunks)
  • js/hang/src/watch/audio/speaking.ts (1 hunks)
  • js/moq/src/group.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • js/hang/src/watch/audio/speaking.ts
🧰 Additional context used
📓 Path-based instructions (2)
js/**

📄 CodeRabbit inference engine (CLAUDE.md)

Place JS/TS packages under the js/ directory

Files:

  • js/moq/src/group.ts
  • js/hang/src/container/bool.ts
  • js/hang/src/publish/audio/speaking.ts
{rs,js}/hang/**

📄 CodeRabbit inference engine (CLAUDE.md)

All media logic belongs in the hang layer (rs/hang and js/hang)

Files:

  • js/hang/src/container/bool.ts
  • js/hang/src/publish/audio/speaking.ts
🧬 Code graph analysis (2)
js/hang/src/container/bool.ts (2)
js/moq/src/track.ts (2)
  • TrackProducer (9-118)
  • TrackConsumer (125-279)
js/moq/src/group.ts (2)
  • GroupProducer (8-59)
  • GroupConsumer (66-120)
js/hang/src/publish/audio/speaking.ts (4)
js/hang/src/publish/audio/speaking-worker.ts (3)
  • Speaking (14-17)
  • Result (12-12)
  • Request (4-4)
js/hang/src/publish/audio/index.ts (5)
  • Audio (59-283)
  • effect (106-150)
  • effect (152-165)
  • effect (167-256)
  • effect (258-276)
js/hang/src/container/bool.ts (1)
  • BoolProducer (4-33)
js/hang/src/publish/audio/captions.ts (1)
  • effect (39-125)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Check
🔇 Additional comments (7)
js/moq/src/group.ts (2)

94-101: LGTM: closed() is a useful lifecycle primitive and matches WatchConsumer behavior.

Straightforward pass-through to the underlying frames.closed(); this aligns with the PR’s broader lifecycle cleanup direction.
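
A sketch of one way a consumer can use the new method; the import path and surrounding wiring are assumptions:

```ts
import type { GroupConsumer } from "./group"; // path assumed

// Await the group's closure to observe the falling edge of a boolean track.
async function awaitFalseEdge(group: GroupConsumer): Promise<void> {
	await group.closed(); // added in this PR: resolves once the frames are closed
	console.log("group closed: value is now false");
}
```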


94-101: Follow-up verification: confirm that #frames in group.ts is a WatchConsumer, so that its .closed() rejects on abort.

js/hang/src/container/bool.ts (1)

24-26: Handle BoolProducer.clone() API

The current clone() implementation returns a fresh BoolProducer with its own #group, leading to potential state desynchronization if multiple producers write to the same track. Our grep of the codebase shows that clone() on BoolProducer isn’t called anywhere in js/hang/src except in its own definition, so removing or altering it won’t break internal usage—but it is part of the public API and could impact consumers outside this repo.

Please choose one of the following approaches and verify that downstream code will remain correct:

Remove clone() entirely
– Delete the method from js/hang/src/container/bool.ts.
– Bump the major version and call out the breaking change in the changelog.

Refactor to share backing state
– Replace the private #group?: GroupProducer field with a shared #state: { group?: GroupProducer } object.
– Update the constructor and all usages of #group (in write(), close()) to use #state.group.
– Change clone() to pass the same #state into the new instance:

```diff
 // js/hang/src/container/bool.ts
-#group?: GroupProducer;
+#state: { group?: GroupProducer };

-constructor(track: TrackProducer) {
-  this.track = track;
-}
+constructor(track: TrackProducer, state: { group?: GroupProducer } = {}) {
+  this.track = track;
+  this.#state = state;
+}

 write(value: boolean) {
   if (value) {
-    if (this.#group) return;
-    this.#group = this.track.appendGroup();
-    this.#group.writeFrame(...);
+    if (this.#state.group) return;
+    this.#state.group = this.track.appendGroup();
+    this.#state.group.writeFrame(...);
   } else {
-    if (!this.#group) return;
-    this.#group.close();
-    this.#group = undefined;
+    if (!this.#state.group) return;
+    this.#state.group.close();
+    this.#state.group = undefined;
   }
 }

-clone() {
-  return new BoolProducer(this.track);
-}
+clone() {
+  return new BoolProducer(this.track, this.#state);
+}

 close() {
   this.track.close();
-  this.#group?.close();
-  this.#group = undefined;
+  this.#state.group?.close();
+  this.#state.group = undefined;
 }
```

Deprecate first, then remove
– Mark clone() as deprecated in JSDoc.
– In the next major release, drop the method entirely.

Please confirm which path you’d like to take and ensure any external consumers of BoolProducer.clone() are updated accordingly.

js/hang/src/publish/audio/speaking.ts (4)

50-55: Direct wiring of active → BoolProducer is clean and efficient.

The nested effect avoids track recreation and only emits on state edges. Good use of the container abstraction.


74-107: Audio graph setup looks correct; keep the bundler contract for CaptureWorklet stable.

The addModule(CaptureWorklet) pattern relies on the bundler returning a module URL via ?worker&url. Ensure this remains aligned with your build. No changes requested.


56-69: Hysteresis Already Present in VAD Worker
The speaking-worker.ts logic uses distinct on/off thresholds (≥0.1 to start speaking, <0.3 to stop) and waits for 8 consecutive silent chunks (8 × 32 ms ≈ 256 ms at 16 kHz with 512-sample chunks) before emitting speaking: false, matching the suggested hold time. No additional debounce/hysteresis is needed.
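
For reference, that hysteresis boils down to a small state machine; a standalone sketch using the thresholds quoted above (everything else is illustrative):

```ts
const SPEECH_ON = 0.1; // start speaking at or above this probability
const SPEECH_OFF = 0.3; // while speaking, count silence only below this
const SILENCE_PADDING = 8; // consecutive silent chunks before "stopped"

function makeHysteresis(onEdge: (speaking: boolean) => void) {
	let speaking = false;
	let silence = 0;
	return (probability: number) => {
		if (speaking && probability < SPEECH_OFF) {
			if (++silence >= SILENCE_PADDING) {
				speaking = false;
				onEdge(false);
			}
		} else if (!speaking && probability >= SPEECH_ON) {
			speaking = true;
			silence = 0;
			onEdge(true);
		}
	};
}
```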


25-31: BoolProducer reuse confirmed; no changes needed

I’ve verified that #bool is instantiated once as a class-level field and is never re-created in the #run method. Each enable toggle simply calls:

  • this.audio.broadcast.insertTrack(this.#bool.track.consume())
  • And on cleanup, this.audio.broadcast.removeTrack(this.#bool.track.name)

This ensures you reuse the same producer and track across enable/disable cycles without unnecessary churn.

Comment on lines 43 to 66
async next(): Promise<boolean | undefined> {
for (;;) {
if (!this.#group) {
const group = await this.track.nextGroup();
if (!group) return undefined;

this.#group = group;
return true;
}

const group = await Promise.race([
this.track.nextGroup(),
this.#group.closed(),
])
if (group) {
this.#group = group;
continue;
}

this.#group.close();
this.#group = undefined;
return false;
}
}

🛠️ Refactor suggestion

Disambiguate Promise.race result to avoid treating end-of-track as “false”.

Promise.race between nextGroup() and closed() returns undefined for both “track ended” and “group closed” paths. Today both map to returning false. It’s better to handle the cases explicitly, so end-of-track behavior is intentional and robust to internal changes.

Apply:

-			const group = await Promise.race([
-				this.track.nextGroup(),
-				this.#group.closed(),
-			])
-			if (group) {
-				this.#group = group;
-				continue;
-			}
-
-			this.#group.close();
-			this.#group = undefined;
-			return false;
+			const result:
+				| { kind: "group"; group?: GroupConsumer }
+				| { kind: "closed" } =
+				await Promise.race([
+					this.track.nextGroup().then((g) => ({ kind: "group" as const, group: g })),
+					this.#group.closed().then(() => ({ kind: "closed" as const })),
+				]);
+
+			if (result.kind === "group") {
+				if (result.group) {
+					this.#group = result.group;
+					continue;
+				}
+				// Track ended while current group still open. Emit a final false then let caller
+				// observe undefined on the next next() call.
+				this.#group.close();
+				this.#group = undefined;
+				return false;
+			}
+
+			// Current group closed → false edge.
+			this.#group.close();
+			this.#group = undefined;
+			return false;

Optionally, guard against abort errors by converting closed() rejections into “closed” signals with .catch(() => ({ kind: "closed" as const })).

