speedup ByteString and well-formed USVString #54

ChALkeR · 2025-12-22T10:36:51Z

For ByteString, this makes the check effectively instant (up to ~1e8 times faster on the largest inputs below).
V8 stores strings in a single-byte representation when it can, and for such strings this regex is short-circuited.
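
To make the mechanism concrete, here is a minimal sketch of a regex-based ByteString check. The names `NON_BYTE` and `isByteString` are hypothetical and are not the identifiers used in lib/index.js; the fast path for single-byte strings is an engine behavior as described above, not something this code guarantees by itself.

```javascript
// Hypothetical helper for illustration, not the actual code in lib/index.js.
// Matches any UTF-16 code unit above 0xFF. Deliberately written without /u.
const NON_BYTE = /[^\x00-\xff]/;

function isByteString(str) {
  // A ByteString may only contain code units in the range 0x00..0xFF.
  return !NON_BYTE.test(str);
}

console.log(isByteString("Hello, \xff\x00\x80")); // true
console.log(isByteString("Hello, \u1000"));       // false
```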

For USVString, this is a ~50x improvement.

In the benchmarks below, "ByteString" is 'Hello, \xff\x00\x80'.repeat(rep)
and "ByteString, wide" is (str + '\u1000').slice(0, -1), where str is the ByteString input above
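
The two benchmark inputs can be sketched as follows. The helper names are hypothetical, and the claim that the second variant ends up in a two-byte internal representation is an assumption about V8 internals inferred from the "wide" label; the code only shows that both strings have identical contents.

```javascript
// Hypothetical helpers reconstructing the benchmark inputs; `rep` is the repeat count.
function makeByteString(rep) {
  return "Hello, \xff\x00\x80".repeat(rep);
}

function makeWideByteString(rep) {
  const str = makeByteString(rep);
  // Append a code unit above 0xFF, then drop it again. The contents are
  // unchanged, but the engine may now keep a wider internal representation.
  return (str + "\u1000").slice(0, -1);
}

const narrow = makeByteString(10); // the base string is 10 code units long
const wide = makeWideByteString(10);
console.log(narrow.length, wide.length, narrow === wide); // 100 100 true
```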

Before:

ByteString, len = 100 x 2,023,757 ops/sec @ 494ns/op (375ns..3ms)
ByteString, len = 10,000 x 22,227 ops/sec @ 44μs/op (39μs..365μs)
ByteString, len = 1,000,000 x 205 ops/sec @ 4ms/op (4ms..51ms)
ByteString, len = 100,000,000 x 2.3 ops/sec @ 437ms/op (429ms..449ms)
ByteString, len = 500,000,000 x 0.46 ops/sec @ 2165ms/op

ByteString, wide, len = 100 x 2,609,199 ops/sec @ 383ns/op (291ns..87μs)
ByteString, wide, len = 10,000 x 28,362 ops/sec @ 35μs/op (34μs..191μs)
ByteString, wide, len = 1,000,000 x 288 ops/sec @ 3ms/op (3ms..4ms)
ByteString, wide, len = 100,000,000 x 2.8 ops/sec @ 360ms/op (355ms..363ms)
ByteString, wide, len = 500,000,000 x 0.57 ops/sec @ 1757ms/op

USVString, len = 100 x 735,938 ops/sec @ 1358ns/op (1166ns..538μs)
USVString, len = 10,000 x 7,969 ops/sec @ 125μs/op (116μs..588μs)
USVString, len = 100,000 x 533 ops/sec @ 1876μs/op (1613μs..4ms)
USVString, len = 1,000,000 x 34 ops/sec @ 29ms/op (26ms..36ms)

After:

ByteString, len = 100 x 43,621,666 ops/sec @ 22ns/op (0ns..108μs)
ByteString, len = 10,000 x 43,444,193 ops/sec @ 23ns/op (0ns..43μs)
ByteString, len = 1,000,000 x 42,080,222 ops/sec @ 23ns/op (0ns..212μs)
ByteString, len = 100,000,000 x 41,820,993 ops/sec @ 23ns/op (0ns..431μs)
ByteString, len = 500,000,000 x 41,792,439 ops/sec @ 23ns/op (0ns..568μs)

ByteString, wide, len = 100 x 11,521,579 ops/sec @ 86ns/op (0ns..74μs)
ByteString, wide, len = 10,000 x 187,266 ops/sec @ 5μs/op (4μs..328μs)
ByteString, wide, len = 1,000,000 x 1,896 ops/sec @ 527μs/op (493μs..1112μs)
ByteString, wide, len = 100,000,000 x 19 ops/sec @ 52ms/op (51ms..54ms)
ByteString, wide, len = 500,000,000 x 3.8 ops/sec @ 264ms/op (263ms..265ms)

USVString, len = 100 x 16,327,984 ops/sec @ 61ns/op (0ns..187μs)
USVString, len = 10,000 x 248,573 ops/sec @ 4μs/op (3μs..1565μs)
USVString, len = 100,000 x 25,106 ops/sec @ 39μs/op (37μs..236μs)
USVString, len = 1,000,000 x 2,510 ops/sec @ 398μs/op (369μs..823μs)

Note that ~40,000,000 ops/sec is about the limit of what the benchmark can measure

Further improvement for USVString is possible on platforms without native isWellFormed (e.g. Hermes) and for non-well-formed strings, but that is somewhat more complex and would likely introduce a dependency (as it needs to detect the best implementation based on what the platform provides), so I decided to avoid that here
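
For illustration only, feature detection with a plain fallback could look like the sketch below. This is not the more complex approach alluded to above and is not part of this PR; `isWellFormedFallback` and `isWellFormed` are hypothetical names, and the fallback is a straightforward code-unit scan with no claim about its speed on any engine.

```javascript
// Hypothetical fallback: scans UTF-16 code units for lone surrogates.
function isWellFormedFallback(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next < 0xdc00 || next > 0xdfff) return false;
      i++; // skip the low surrogate of the valid pair
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      // Low surrogate without a preceding high surrogate.
      return false;
    }
  }
  return true;
}

// Prefer the native method when the platform provides it.
const isWellFormed =
  typeof String.prototype.isWellFormed === "function"
    ? (str) => str.isWellFormed()
    : isWellFormedFallback;
```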

Using arrays for strings is also significantly suboptimal as it breaks on large strings

domenic · 2025-12-23T04:41:42Z

lib/index.js

-      throw makeException(TypeError, "is not a valid ByteString", options);
-    }
+
+  // eslint-disable-next-line require-unicode-regexp


Please don't disable lints; fix the code to pass them instead.

Why? There should be no need for /u for this specific use case, and it's 5x slower for wide strings.
I understand why /u is preferred as a general rule, but not for testing ASCII or single-byte ranges

I added a comment

This regex:

ByteString, len = 100 x 38,977,623 ops/sec @ 25ns/op (0ns..1206μs)
ByteString, len = 10,000 x 39,783,208 ops/sec @ 25ns/op (0ns..55μs)
ByteString, len = 1,000,000 x 39,833,400 ops/sec @ 25ns/op (0ns..64μs)
ByteString, len = 100,000,000 x 40,453,673 ops/sec @ 24ns/op (0ns..351μs)
ByteString, len = 500,000,000 x 40,991,163 ops/sec @ 24ns/op (0ns..4ms)

ByteString, wide, len = 100 x 11,596,562 ops/sec @ 86ns/op (0ns..970μs)
ByteString, wide, len = 10,000 x 198,745 ops/sec @ 5μs/op (4μs..95μs)
ByteString, wide, len = 1,000,000 x 2,007 ops/sec @ 498μs/op (493μs..668μs)
ByteString, wide, len = 100,000,000 x 20 ops/sec @ 50ms/op (49ms..53ms)
ByteString, wide, len = 500,000,000 x 4.0 ops/sec @ 248ms/op (247ms..250ms)

Unicode mode regex:

ByteString, len = 100 x 41,036,785 ops/sec @ 24ns/op (0ns..1010μs)
ByteString, len = 10,000 x 39,797,708 ops/sec @ 25ns/op (0ns..113μs)
ByteString, len = 1,000,000 x 39,954,124 ops/sec @ 25ns/op (0ns..46μs)
ByteString, len = 100,000,000 x 41,130,007 ops/sec @ 24ns/op (0ns..513μs)
ByteString, len = 500,000,000 x 41,435,116 ops/sec @ 24ns/op (0ns..603μs)

ByteString, wide, len = 100 x 2,812,106 ops/sec @ 355ns/op (250ns..57μs)
ByteString, wide, len = 10,000 x 31,405 ops/sec @ 31μs/op (29μs..113μs)
ByteString, wide, len = 1,000,000 x 314 ops/sec @ 3ms/op (2ms..3ms)
ByteString, wide, len = 100,000,000 x 3.1 ops/sec @ 317ms/op (317ms..318ms)
ByteString, wide, len = 500,000,000 x 0.63 ops/sec @ 1599ms/op

That is a huge perf impact without any behavior differences here
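
The "no behavior differences" claim can be spot-checked with a small sketch. The two patterns below are assumed stand-ins for the regexes being compared (the exact pattern in the PR is not shown in this thread), and the sample list is illustrative rather than exhaustive.

```javascript
// Assumed stand-ins for the two variants under discussion.
const plain = /[^\x00-\xff]/;    // matches per UTF-16 code unit
const unicode = /[^\x00-\xff]/u; // matches per code point

const samples = [
  "",
  "Hello, \xff\x00\x80",
  "\u0100",
  "a\u1000",
  "\ud83d\ude00", // surrogate pair (one astral code point)
  "\ud800",       // lone high surrogate
  "\udc00x",      // lone low surrogate
];

// Anything above 0xFF fails in both modes: a surrogate code unit and the
// code point it belongs to are both outside the 0x00..0xFF range.
const agree = samples.every((s) => plain.test(s) === unicode.test(s));
console.log(agree); // true
```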

It's unfortunate that programmers have to care about the performance difference here and cannot just enforce lint rules generally. I guess a comment is the best we can do, maybe with a link to a V8 bug.

domenic · 2025-12-23T04:42:03Z

lib/index.js

+  throw makeException(TypeError, "is not a valid ByteString", options);
 };

 exports.USVString = (value, options = {}) => {


Can't we just replace this whole function with toWellFormed()?

Done! (When present)

Let's just replace it wholesale, no need for existence testing. Node.js v20 is the minimum version requirement.

ChALkeR force-pushed the patch-1 branch from 2d1f232 to 901d51d Compare December 22, 2025 10:38

domenic reviewed Dec 23, 2025


speedup ByteString and well-formed USVString

b460ca1

ChALkeR force-pushed the patch-1 branch from 901d51d to b460ca1 Compare December 23, 2025 04:58
