-
Notifications
You must be signed in to change notification settings - Fork 46
Voice activity detection model #625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
...ages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Constants.h
Outdated
Show resolved
Hide resolved
...ages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Constants.h
Outdated
Show resolved
Hide resolved
packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp
Outdated
Show resolved
Hide resolved
packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp
Outdated
Show resolved
Hide resolved
...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp
Outdated
Show resolved
Hide resolved
...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp
Show resolved
Hide resolved
...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp
Outdated
Show resolved
Hide resolved
...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp
Show resolved
Hide resolved
...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp
Outdated
Show resolved
Hide resolved
inline constexpr uint32_t nextPowerOfTwo(uint32_t n) noexcept { | ||
if (n <= 1) | ||
return 1; | ||
n--; | ||
n |= n >> 1; | ||
n |= n >> 2; | ||
n |= n >> 4; | ||
n |= n >> 8; | ||
n |= n >> 16; | ||
return n + 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an util
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the end I got rid of this function; (thanks to the @msluszniak comment about std::bit_ceil ), but to clarify, I agree that it should be inside Utils.h
, and that was my initial approach, but due to circular dependencies I could not make it work. Obviously, one could solve this by introducing new file etc, but actually moving it to the Constants.h
due to its simplicity seemed like the best solution to me. Writing this comment to ask, are there any better solutions I am not aware of?
packages/react-native-executorch/common/rnexecutorch/models/voice_activity_detection/Utils.cpp
Outdated
Show resolved
Hide resolved
const std::array<float, kWindowSize> generateHammingWindow() noexcept { | ||
constexpr size_t size = static_cast<size_t>(constants::kWindowSize); | ||
std::array<float, size> window; | ||
for (size_t i = 0; i < size; ++i) { | ||
window[i] = | ||
0.54f - | ||
0.46f * std::cos((2.0f * std::numbers::pi_v<float> * i) / (size - 1)); | ||
} | ||
return window; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can check if this will work with the hannWindow defined in dsp.cpp. If so, then you don't need to define it. Otherwise, move this function to dsp.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked it, it works fine. I deleted this function entirely, and I use hannWindow
instead.
The differences are very subtle (e.g. on the 60s audio clip with 16 speech segments, 14 of them are exactly the same, and two of them differ by circa 100ms, making it negligible.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utils.cpp often ends up as a bit of a bag for random functions. Might be worth thinking about whether some of these belong in more focused files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My approach is simple: In the model (VoiceActivityDetection.h/.cpp
in this case ) files, I want only three functions, that is: preprocess, generate, postprocess. Basically everything else goes to Utils.cpp
.
Often utils files are not very long so I think this works ok. Introducing some other files might be
less helpful, and more confusing but it is only my opinion. Also sometimes it is just hard to find the "common denominator" of more than one function.
...ve-executorch/common/rnexecutorch/models/voice_activity_detection/VoiceActivityDetection.cpp
Outdated
Show resolved
Hide resolved
|
||
std::vector<std::array<float, kPaddedWindowSize>> | ||
VoiceActivityDetection::preprocess(std::span<float> waveform) const { | ||
auto kHammingWindowArray = utils::generateHammingWindow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's not actually depending on the size of the input, nor the input - calling this function on each preprocess call is redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in comment above, in the end I deleted this function, but for clarification I will say this:
Yes, this is redundant, but there is no easy way to bypass it. The problem is, std::cos
is constexpr
since C++23
and we use C++20
. At first I tried to simply write it as constexpr and calculate it in compile time, but as it turns out it is impossible. Obviously, what could be done is one of these:
- we could write cosine function independently, and we could make it constexpr
- hardcode the output in some different way.
But I reckon it is just unnecessary complications (in the end it is 400 element array so it should be negligible)
Since I have decided to use the hannWindow
instead, it does not matter (but obviously, the hannWindow
has the same disadvantage of being calculated with each model run, and what is even worse - it is std::vector
and not std::array
.... )
packages/react-native-executorch/src/modules/natural_language_processing/VADModule.ts
Outdated
Show resolved
Hide resolved
I think it is ready to merge, but there is one thing that might be done before (or possibly after) merge: Obviously, review needs to be done, but I think we are in the clear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ code looks good but I haven't tested this feature
Description
This PR introduces voice activity detection (vad) feature into the react native executorch library.
This PR is not ready for merge yet,
however it is ready for the review of C++ code.
Things that are missing:
Introduces a breaking change?
Type of change
Tested on
Testing instructions
Screenshots
Related issues
Closes #547
Checklist
Additional notes