@ChALkeR ChALkeR commented Dec 22, 2025

For ByteString, this makes the check effectively instant (up to 1e8 times faster on the largest input).
V8 stores strings in a single-byte representation when it can, and for such strings this regex is short-circuited.
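
A minimal sketch of the kind of check being described, for illustration only. The exact regex from the diff is not quoted in this thread, so the pattern and the `isByteString` helper name below are assumptions:

```javascript
// Hypothetical pattern and helper name; the PR's actual regex is not shown here.
// The pattern is deliberately written without the /u flag.
const NOT_BYTE = /[^\x00-\xff]/;

// A ByteString is valid when every UTF-16 code unit is <= 0xff.
function isByteString(s) {
  return !NOT_BYTE.test(s);
}

console.log(isByteString('Hello, \xff\x00\x80')); // true
console.log(isByteString('Hello, \u1000'));       // false
```

Whether the engine can skip scanning for a given string depends on how it stores that string internally, which is not observable from JavaScript.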

For USVString, this is a ~50x improvement.

ByteString in the benchmarks below is 'Hello, \xff\x00\x80'.repeat(rep)
ByteString, wide in the benchmarks below is (str + '\u1000').slice(0, -1), i.e. the same content held in a two-byte (wide) representation
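
The two inputs can be reproduced roughly as follows. The base string is 10 code units long, so `rep = 10` gives `len = 100`; the claim that the second string ends up in two-byte storage is an assumption about V8 internals and cannot be checked from script:

```javascript
// rep is an illustrative value; the benchmark varies it to reach each len.
const rep = 10;

// Narrow input: every code unit is <= 0xff.
const str = 'Hello, \xff\x00\x80'.repeat(rep);

// "Wide" input: append a character above 0xff, then slice it off again.
// The content is identical to str; only the engine's internal storage
// is expected to differ (assumption about V8, not observable here).
const wide = (str + '\u1000').slice(0, -1);

console.log(str.length, wide.length, str === wide); // 100 100 true
```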

Before:

ByteString, len = 100 x 2,023,757 ops/sec @ 494ns/op (375ns..3ms)
ByteString, len = 10,000 x 22,227 ops/sec @ 44μs/op (39μs..365μs)
ByteString, len = 1,000,000 x 205 ops/sec @ 4ms/op (4ms..51ms)
ByteString, len = 100,000,000 x 2.3 ops/sec @ 437ms/op (429ms..449ms)
ByteString, len = 500,000,000 x 0.46 ops/sec @ 2165ms/op

ByteString, wide, len = 100 x 2,609,199 ops/sec @ 383ns/op (291ns..87μs)
ByteString, wide, len = 10,000 x 28,362 ops/sec @ 35μs/op (34μs..191μs)
ByteString, wide, len = 1,000,000 x 288 ops/sec @ 3ms/op (3ms..4ms)
ByteString, wide, len = 100,000,000 x 2.8 ops/sec @ 360ms/op (355ms..363ms)
ByteString, wide, len = 500,000,000 x 0.57 ops/sec @ 1757ms/op

USVString, len = 100 x 735,938 ops/sec @ 1358ns/op (1166ns..538μs)
USVString, len = 10,000 x 7,969 ops/sec @ 125μs/op (116μs..588μs)
USVString, len = 100,000 x 533 ops/sec @ 1876μs/op (1613μs..4ms)
USVString, len = 1,000,000 x 34 ops/sec @ 29ms/op (26ms..36ms)

After:

ByteString, len = 100 x 43,621,666 ops/sec @ 22ns/op (0ns..108μs)
ByteString, len = 10,000 x 43,444,193 ops/sec @ 23ns/op (0ns..43μs)
ByteString, len = 1,000,000 x 42,080,222 ops/sec @ 23ns/op (0ns..212μs)
ByteString, len = 100,000,000 x 41,820,993 ops/sec @ 23ns/op (0ns..431μs)
ByteString, len = 500,000,000 x 41,792,439 ops/sec @ 23ns/op (0ns..568μs)

ByteString, wide, len = 100 x 11,521,579 ops/sec @ 86ns/op (0ns..74μs)
ByteString, wide, len = 10,000 x 187,266 ops/sec @ 5μs/op (4μs..328μs)
ByteString, wide, len = 1,000,000 x 1,896 ops/sec @ 527μs/op (493μs..1112μs)
ByteString, wide, len = 100,000,000 x 19 ops/sec @ 52ms/op (51ms..54ms)
ByteString, wide, len = 500,000,000 x 3.8 ops/sec @ 264ms/op (263ms..265ms)

USVString, len = 100 x 16,327,984 ops/sec @ 61ns/op (0ns..187μs)
USVString, len = 10,000 x 248,573 ops/sec @ 4μs/op (3μs..1565μs)
USVString, len = 100,000 x 25,106 ops/sec @ 39μs/op (37μs..236μs)
USVString, len = 1,000,000 x 2,510 ops/sec @ 398μs/op (369μs..823μs)

Note that ~40,000,000 ops/sec is about the limit of what the benchmark can measure

Further improvement for USVString is possible on platforms without native isWellFormed (e.g. Hermes) and for non-well-formed strings. That is somewhat more complex and would likely introduce a dependency (it needs to detect the best implementation based on what the platform provides), so I decided to avoid it here
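
As a rough illustration of the detection mentioned above, a sketch might pick the native method when it exists and fall back to a manual scan otherwise. The function names here are hypothetical and this is not what the PR implements:

```javascript
// Hypothetical fallback: scan UTF-16 code units for unpaired surrogates.
function isWellFormedFallback(s) {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = s.charCodeAt(i + 1); // NaN past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) return false;
      i++; // skip the low surrogate that completes the pair
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate without a preceding high surrogate.
      return false;
    }
  }
  return true;
}

// Use the native method when the platform provides it.
const isWellFormed =
  typeof String.prototype.isWellFormed === 'function'
    ? (s) => s.isWellFormed()
    : isWellFormedFallback;

console.log(isWellFormed('a\ud800b')); // false
```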

Using arrays for strings is also significantly suboptimal as it breaks on large strings

throw makeException(TypeError, "is not a valid ByteString", options);
}

// eslint-disable-next-line require-unicode-regexp

Member

Please don't disable lints; fix the code to pass them instead.

Author

@ChALkeR ChALkeR Dec 23, 2025

Why? There should be no need for /u in this specific use case, and it's 5x slower for wide strings.
I understand why /u is preferred as a general rule, but not for testing ASCII or single-byte ranges

Author

I added a comment

Author

This regex:

ByteString, len = 100 x 38,977,623 ops/sec @ 25ns/op (0ns..1206μs)
ByteString, len = 10,000 x 39,783,208 ops/sec @ 25ns/op (0ns..55μs)
ByteString, len = 1,000,000 x 39,833,400 ops/sec @ 25ns/op (0ns..64μs)
ByteString, len = 100,000,000 x 40,453,673 ops/sec @ 24ns/op (0ns..351μs)
ByteString, len = 500,000,000 x 40,991,163 ops/sec @ 24ns/op (0ns..4ms)

ByteString, wide, len = 100 x 11,596,562 ops/sec @ 86ns/op (0ns..970μs)
ByteString, wide, len = 10,000 x 198,745 ops/sec @ 5μs/op (4μs..95μs)
ByteString, wide, len = 1,000,000 x 2,007 ops/sec @ 498μs/op (493μs..668μs)
ByteString, wide, len = 100,000,000 x 20 ops/sec @ 50ms/op (49ms..53ms)
ByteString, wide, len = 500,000,000 x 4.0 ops/sec @ 248ms/op (247ms..250ms)

Unicode mode regex:

ByteString, len = 100 x 41,036,785 ops/sec @ 24ns/op (0ns..1010μs)
ByteString, len = 10,000 x 39,797,708 ops/sec @ 25ns/op (0ns..113μs)
ByteString, len = 1,000,000 x 39,954,124 ops/sec @ 25ns/op (0ns..46μs)
ByteString, len = 100,000,000 x 41,130,007 ops/sec @ 24ns/op (0ns..513μs)
ByteString, len = 500,000,000 x 41,435,116 ops/sec @ 24ns/op (0ns..603μs)

ByteString, wide, len = 100 x 2,812,106 ops/sec @ 355ns/op (250ns..57μs)
ByteString, wide, len = 10,000 x 31,405 ops/sec @ 31μs/op (29μs..113μs)
ByteString, wide, len = 1,000,000 x 314 ops/sec @ 3ms/op (2ms..3ms)
ByteString, wide, len = 100,000,000 x 3.1 ops/sec @ 317ms/op (317ms..318ms)
ByteString, wide, len = 500,000,000 x 0.63 ops/sec @ 1599ms/op

That is a huge perf impact without any behavior differences here
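
The "no behavior differences" claim can be spot-checked with a small sketch. The pattern below is an assumed stand-in for the regex in the diff, which is not quoted in this thread:

```javascript
// Hypothetical patterns; the actual regex from the PR is not shown here.
const plain = /[^\x00-\xff]/;    // no /u: matches any UTF-16 code unit above 0xff
const unicode = /[^\x00-\xff]/u; // /u: matches any code point above 0xff

const samples = [
  '',
  'Hello, \xff\x00\x80',
  'caf\u00e9',
  '\u0100',
  'a\u1000',
  '\ud83d\ude00', // astral character (surrogate pair)
  '\ud800',       // lone high surrogate
  'x\udc00',      // lone low surrogate
];

// Both modes agree on every sample, because any code point above 0xff
// is made of code units that are themselves above 0xff.
const sameResults = samples.every((s) => plain.test(s) === unicode.test(s));
console.log(sameResults); // true
```

This only checks a handful of inputs; it illustrates the reasoning and does not measure the performance difference.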

Member

It's unfortunate that programmers have to care about the performance difference here and cannot just enforce lint rules generally. I guess a comment is the best we can do, maybe with a link to a V8 bug.

throw makeException(TypeError, "is not a valid ByteString", options);
};

exports.USVString = (value, options = {}) => {

Member

Can't we just replace this whole function with toWellFormed()?

Author

@ChALkeR ChALkeR Dec 23, 2025

Done! (When present)

Member

Let's just replace it wholesale, no need for existence testing. Node.js v20 is the minimum version requirement.
