-
Notifications
You must be signed in to change notification settings - Fork 654
feat(encoding): skip whitespace in base64 input #6771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6771 +/- ##
==========================================
- Coverage 94.14% 94.13% -0.01%
==========================================
Files 588 588
Lines 42496 42537 +41
Branches 6698 6709 +11
==========================================
+ Hits 40006 40044 +38
- Misses 2440 2442 +2
- Partials 50 51 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
export function decodeChunk( | ||
buffer: Uint8Array_, | ||
i: number, | ||
o: number, | ||
alphabet: Uint8Array, | ||
padding: number, | ||
retryWs: boolean, | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function should have an option object instead of so many args for more clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree. Every value is needed and the function isn't exposed to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it makes it hard to understand what they are for if one is not already familiar with the code.
Alternatively adding tsdoc might also be a solution.
encoding/_common64.ts
Outdated
try { | ||
return _decode(buffer, i, o, alphabet, padding, true); | ||
} catch (e) { | ||
if (!(e instanceof RetriableError)) throw e; | ||
return _decode(removeWhiteSpace(buffer), i, o, alphabet, padding, false); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of throwing an error and catching it, I think it might be better to give the user the option in Base64Options to strip whitespace beforehand.
if (options.ignoreWhiteSpace) {
buffer = strip(buffer);
}
decode(buffer);
This way it's opt in and there isn't a penalty for if the whitespace is at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of throwing an error and catching it, I think it might be better to give the user the option in Base64Options to strip whitespace beforehand.
if (options.ignoreWhiteSpace) { buffer = strip(buffer); } decode(buffer);This way it's opt in and there isn't a penalty for if the whitespace is at the end.
My reasoning for the current design is as follows:
- Zero, or essentially zero, performance penalty where no white space encountered (at least in the sync versions, for the stream version it empirically seems to be negligible as it's not a bottleneck)
- Minimal performance penalty for common cases of where whitespace would be encountered (e.g. after 76 chars). Actually I guess whitespace-at-end could be special-cased at negligible perf cost by just checking the last byte (edit: done)
- Consistency with
Uint8Array.fromBase64
with default options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the option is better because I think the majority of people will know if they have whitespace in their encoding or not, and for the very few that don't know, they can take the performance penalty for assurance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than simplicity of implementation, what's the advantage of having it as an option if there's no measurable performance improvement in any plausible case? The tradeoff is complicating the API, as well as inconsistency with Uint8Array.fromBase64
. Once Uint8Array.fromBase64
API reaches wide availability, having it as an option would likely incur an additional performance penalty not to handle whitespace:
function someBase64DecodeFunction(
base64Data: unknown /* string/u8 arr/stream/whatever */,
options: { allowWhiteSpace: boolean },
) {
if (!options.allowWhiteSpace) {
// check for and throw on presence of whitespace chars, incurring a perf cost in the process
}
// some code using `Uint8Array.fromBase64`, which has no option to disallow whitespace
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Uint8Array.fromBase64
reaches runtimes other than Firefox. I'd imagine we'd just depreciate this API in favour of using that. Not replace the underlying logic here with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Uint8Array.fromBase64 reaches runtimes other than Firefox. I'd imagine we'd just depreciate this API in favour of using that. Not replace the underlying logic here with that.
There's no stream version of Uint8Array.fromBase64
though. I was intentionally vague about what someBase64DecodeFunction
is, cuz yeah it would likely only be the stream version that would be worth keeping.
unstable-base64
allows Uint8Array
as input though, so conceivably that'd be worth keeping? (But probably not as it's trivial to just decode as UTF-8 first.)
Closes #6768