Skip to content

feat(encoding): skip whitespace in base64 input #6771

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lionel-rowe
Copy link
Contributor

Closes #6768

@lionel-rowe lionel-rowe requested a review from kt3k as a code owner July 23, 2025 11:10
Copy link

codecov bot commented Jul 23, 2025

Codecov Report

❌ Patch coverage is 93.93939% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.13%. Comparing base (5aa2bf1) to head (8475cb0).

Files with missing lines Patch % Lines
encoding/_common64.ts 96.61% 1 Missing and 1 partial ⚠️
encoding/unstable_base64_stream.ts 71.42% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6771      +/-   ##
==========================================
- Coverage   94.14%   94.13%   -0.01%     
==========================================
  Files         588      588              
  Lines       42496    42537      +41     
  Branches     6698     6709      +11     
==========================================
+ Hits        40006    40044      +38     
- Misses       2440     2442       +2     
- Partials       50       51       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines +132 to +139
export function decodeChunk(
buffer: Uint8Array_,
i: number,
o: number,
alphabet: Uint8Array,
padding: number,
retryWs: boolean,
) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function should have an option object instead of so many args for more clarity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree. Every value is needed and the function isn't exposed to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it makes it hard to understand what they are for if one is not already familiar with the code.
Alternatively adding tsdoc might also be a solution.

Comment on lines 123 to 128
try {
return _decode(buffer, i, o, alphabet, padding, true);
} catch (e) {
if (!(e instanceof RetriableError)) throw e;
return _decode(removeWhiteSpace(buffer), i, o, alphabet, padding, false);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of throwing an error and catching it, I think it might be better to give the user the option in Base64Options to strip whitespace beforehand.

if (options.ignoreWhiteSpace) {
  buffer = strip(buffer);
}
decode(buffer);

This way it's opt in and there isn't a penalty for if the whitespace is at the end.

Copy link
Contributor Author

@lionel-rowe lionel-rowe Jul 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of throwing an error and catching it, I think it might be better to give the user the option in Base64Options to strip whitespace beforehand.

if (options.ignoreWhiteSpace) {
  buffer = strip(buffer);
}
decode(buffer);

This way it's opt in and there isn't a penalty for if the whitespace is at the end.

My reasoning for the current design is as follows:

  • Zero, or essentially zero, performance penalty where no white space encountered (at least in the sync versions, for the stream version it empirically seems to be negligible as it's not a bottleneck)
  • Minimal performance penalty for common cases of where whitespace would be encountered (e.g. after 76 chars). Actually I guess whitespace-at-end could be special-cased at negligible perf cost by just checking the last byte (edit: done)
  • Consistency with Uint8Array.fromBase64 with default options

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the option is better because I think the majority of people will know if they have whitespace in their encoding or not, and for the very few that don't know, they can take the performance penalty for assurance.

Copy link
Contributor Author

@lionel-rowe lionel-rowe Jul 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than simplicity of implementation, what's the advantage of having it as an option if there's no measurable performance improvement in any plausible case? The tradeoff is complicating the API, as well as inconsistency with Uint8Array.fromBase64. Once Uint8Array.fromBase64 API reaches wide availability, having it as an option would likely incur an additional performance penalty not to handle whitespace:

function someBase64DecodeFunction(
  base64Data: unknown /* string/u8 arr/stream/whatever */,
  options: { allowWhiteSpace: boolean },
) {
  if (!options.allowWhiteSpace) {
    // check for and throw on presence of whitespace chars, incurring a perf cost in the process
  }
  // some code using `Uint8Array.fromBase64`, which has no option to disallow whitespace
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Uint8Array.fromBase64 reaches runtimes other than Firefox. I'd imagine we'd just depreciate this API in favour of using that. Not replace the underlying logic here with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When Uint8Array.fromBase64 reaches runtimes other than Firefox. I'd imagine we'd just depreciate this API in favour of using that. Not replace the underlying logic here with that.

There's no stream version of Uint8Array.fromBase64 though. I was intentionally vague about what someBase64DecodeFunction is, cuz yeah it would likely only be the stream version that would be worth keeping.

unstable-base64 allows Uint8Array as input though, so conceivably that'd be worth keeping? (But probably not as it's trivial to just decode as UTF-8 first.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

decodeBase64 should remove line breaks before attempting to decode
3 participants