Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add full_get_token_bytes #171

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

add full_get_token_bytes #171

wants to merge 2 commits into from

Conversation

sribich
Copy link

@sribich sribich commented Aug 25, 2024

full_get_segment_bytes exists, but not full_get_token_bytes.

For non English languages, whisper can split tokens on non-valid UTF8 boundaries making finer grained parsing impossible with the current text methods.

Copy link
Owner

@tazz4843 tazz4843 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs cargo fmt as well

if ret.is_null() {
return Err(WhisperError::NullPointer);
}
let c_str = unsafe { CStr::from_ptr(ret) };
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that both of these functions have the same fundamental calling code, can we merge the common code between _bytes and _text and call the common (private) function from both?

Copy link
Author

@sribich sribich Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way to copy how it was done for the existing full_get_segment_text/text_lossy/bytes functions. If the change is made here, it should probably also change for those functions to keep things consistent.

Thoughts?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow I thought I responded to this forever ago, sorry about that. Yeah I think repeating this pattern on the others would be best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants