Voice activity detection model #625

mlodyjesienin · 2025-09-25T15:49:59Z

Description

This PR adds a voice activity detection (VAD) feature to the React Native ExecuTorch library.

This PR is not ready to merge yet;
however, the C++ code is ready for review.

Things that are missing:

Documentation
exported model on official swm huggingface
benchmarks
maybe example usage in one app? idk

Introduces a breaking change?

Yes
No

Type of change

Bug fix (change which fixes an issue)
New feature (change which adds functionality)
Documentation update (improves or adds clarity to existing documentation)
Other (chores, tests, code style improvements etc.)

Tested on

iOS
Android

Testing instructions

Screenshots

Related issues

Closes #547

Checklist

I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have updated the documentation accordingly
My changes generate no new warnings

Additional notes

...ages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Constants.h

packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp

...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp

packages/react-native-executorch/src/constants/modelUrls.ts

...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp

chmjkb · 2025-09-29T08:12:18Z

...ages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Constants.h

+inline constexpr uint32_t nextPowerOfTwo(uint32_t n) noexcept {
+  if (n <= 1)
+    return 1;
+  n--;
+  n |= n >> 1;
+  n |= n >> 2;
+  n |= n >> 4;
+  n |= n >> 8;
+  n |= n >> 16;
+  return n + 1;
+}


I think this is a util.

In the end I got rid of this function (thanks to @msluszniak's comment about std::bit_ceil). To clarify: I agree that it belongs in Utils.h, and that was my initial approach, but I could not make it work because of circular dependencies. One could solve this by introducing a new file, but given how simple the function is, moving it to Constants.h seemed like the best solution to me. I am writing this comment to ask: are there any better solutions I am not aware of?

packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp

chmjkb · 2025-09-29T08:26:06Z

packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp

+const std::array<float, kWindowSize> generateHammingWindow() noexcept {
+  constexpr size_t size = static_cast<size_t>(constants::kWindowSize);
+  std::array<float, size> window;
+  for (size_t i = 0; i < size; ++i) {
+    window[i] =
+        0.54f -
+        0.46f * std::cos((2.0f * std::numbers::pi_v<float> * i) / (size - 1));
+  }
+  return window;


You can check if this will work with the hannWindow defined in dsp.cpp. If so, then you don't need to define it. Otherwise, move this function to dsp.cpp

I have checked it and it works fine. I deleted this function entirely and use hannWindow instead.
The differences are very subtle: on a 60 s audio clip with 16 speech segments, 14 are exactly the same and two differ by roughly 100 ms, which is negligible.

chmjkb · 2025-09-29T08:28:55Z

packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp

utils.cpp often ends up as a bit of a bag for random functions. Might be worth thinking about whether some of these belong in more focused files.

My approach is simple: in the model files (VoiceActivityDetection.h/.cpp in this case) I want only three functions: preprocess, generate, and postprocess. Basically everything else goes to Utils.cpp.
Utils files are often not very long, so I think this works OK. Introducing other files might be less helpful and more confusing, but that is only my opinion. Also, sometimes it is just hard to find the "common denominator" of more than one function.

...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp

chmjkb · 2025-09-29T08:35:17Z

...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp

+
+std::vector<std::array<float, kPaddedWindowSize>>
+VoiceActivityDetection::preprocess(std::span<float> waveform) const {
+  auto kHammingWindowArray = utils::generateHammingWindow();


Since it depends neither on the input nor on its size, calling this function on every preprocess call is redundant.

As I mentioned in the comment above, in the end I deleted this function, but for clarification:
Yes, this is redundant, but there is no easy way to avoid it. The problem is that std::cos only becomes constexpr in C++26, and we use C++20. At first I tried to simply mark the function constexpr and compute the window at compile time, but that turned out to be impossible. What could be done is one of these:

write our own cosine function and make it constexpr

hardcode the output in some other way.

But I reckon these are just unnecessary complications (in the end it is a 400-element array, so the cost should be negligible).

Since I have decided to use hannWindow instead, it no longer matters (although hannWindow has the same disadvantage of being recomputed on each model run and, even worse, it returns a std::vector rather than a std::array).

packages/react-native-executorch/src/modules/natural_language_processing/VADModule.ts

mlodyjesienin · 2025-09-30T12:29:45Z

I think it is ready to merge, but there is one thing that could be done before (or possibly after) the merge:
I measured the Android inference-time benchmarks only on a OnePlus, because the Samsung Galaxy S24 was unavailable.
Since it is my last day at work, I won't be able to do it, so someone else should complete the benchmarking.

Obviously, the review still needs to be done, but I think we are in the clear.

msluszniak

The C++ code looks good, but I haven't tested this feature.

mlodyjesienin added 4 commits September 17, 2025 11:13

implement VoiceActivityDetection Class

53ae81e

add JSI bridge for vad in cpp

d57b084

fix CPP code + add JSI bridge to TS

160375b

merge main into @mlodyjesienin/voice-activity-detection

b8c1afa

mlodyjesienin requested review from msluszniak, jakmro and chmjkb September 25, 2025 15:50

update model url

21010d4

msluszniak reviewed Sep 27, 2025

chmjkb requested changes Sep 29, 2025

mlodyjesienin added 5 commits September 30, 2025 10:20

fix memory management bug in c++ code

c32106d

add fsmn/FSMN to cspell

3471c5f

docs: add VAD documentation

b724c88

add requested changes

bfaa641

Merge branch 'main' into @mlodyjesienin/voice-activity-detection

3b57686

mlodyjesienin requested review from msluszniak and chmjkb September 30, 2025 12:29

msluszniak approved these changes Oct 1, 2025

msluszniak added the feature PRs that implement a new feature label Oct 6, 2025

msluszniak changed the title ~~feat:voice activity detection model~~ Voice activity detection model Oct 6, 2025
