-
-
Notifications
You must be signed in to change notification settings - Fork 20
speedup ByteString and well-formed USVString #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| throw makeException(TypeError, "is not a valid ByteString", options); | ||
| } | ||
|
|
||
| // eslint-disable-next-line require-unicode-regexp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't disable lints; fix the code to pass them instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? There should be no need for /u for this specific usecase, and it's 5x slower for wide strings.
I understand why /u is preferred in general as a rule, but not for testing ascii or single-byte
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex:
ByteString, len = 100 x 38,977,623 ops/sec @ 25ns/op (0ns..1206μs)
ByteString, len = 10,000 x 39,783,208 ops/sec @ 25ns/op (0ns..55μs)
ByteString, len = 1,000,000 x 39,833,400 ops/sec @ 25ns/op (0ns..64μs)
ByteString, len = 100,000,000 x 40,453,673 ops/sec @ 24ns/op (0ns..351μs)
ByteString, len = 500,000,000 x 40,991,163 ops/sec @ 24ns/op (0ns..4ms)
ByteString, wide, len = 100 x 11,596,562 ops/sec @ 86ns/op (0ns..970μs)
ByteString, wide, len = 10,000 x 198,745 ops/sec @ 5μs/op (4μs..95μs)
ByteString, wide, len = 1,000,000 x 2,007 ops/sec @ 498μs/op (493μs..668μs)
ByteString, wide, len = 100,000,000 x 20 ops/sec @ 50ms/op (49ms..53ms)
ByteString, wide, len = 500,000,000 x 4.0 ops/sec @ 248ms/op (247ms..250ms)
Unicode mode regex:
ByteString, len = 100 x 41,036,785 ops/sec @ 24ns/op (0ns..1010μs)
ByteString, len = 10,000 x 39,797,708 ops/sec @ 25ns/op (0ns..113μs)
ByteString, len = 1,000,000 x 39,954,124 ops/sec @ 25ns/op (0ns..46μs)
ByteString, len = 100,000,000 x 41,130,007 ops/sec @ 24ns/op (0ns..513μs)
ByteString, len = 500,000,000 x 41,435,116 ops/sec @ 24ns/op (0ns..603μs)
ByteString, wide, len = 100 x 2,812,106 ops/sec @ 355ns/op (250ns..57μs)
ByteString, wide, len = 10,000 x 31,405 ops/sec @ 31μs/op (29μs..113μs)
ByteString, wide, len = 1,000,000 x 314 ops/sec @ 3ms/op (2ms..3ms)
ByteString, wide, len = 100,000,000 x 3.1 ops/sec @ 317ms/op (317ms..318ms)
ByteString, wide, len = 500,000,000 x 0.63 ops/sec @ 1599ms/op
That is a huge perf impact without any behavior differences here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unfortunate that programmers have to care about the performance difference here and cannot just enforce lint rules generally. I guess a comment is the best we can do, maybe with a link to a V8 bug.
| throw makeException(TypeError, "is not a valid ByteString", options); | ||
| }; | ||
|
|
||
| exports.USVString = (value, options = {}) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just replace this whole function with toWellFormed()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! (When present)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just replace it wholesale, no need for existence testing. Node.js v20 is the minimum version requirement.
For ByteString, this makes it instant (as in: up to 1e8 times faster)
v8 stores strings in single-byte width when it can, and then this regex is short-circuited
For
USVString, this is a ~50x improvement.ByteStringin benchmarks below is'Hello, \xff\x00\x80'.repeat(rep)ByteString, widein benchmarks below is(str + '\u1000').slice(0, -1)Before:
After:
Note that ~40,000,000 ops/sec is about the limit of what the benchmark can measure
Further improvement for
USVStringis possible for platforms without nativeisWellFormed(e.g. Hermes) and for non-well-formed strings, but that is somewhat more complex and would likely introduce a dep (as it needs to detect the best impl based on what the platfrom provides), so I decided to avoid that hereUsing arrays for strings is also significantly suboptimal as it breaks on large strings