Performance

Ideas for interesting benchmarks (or improving existing benchmarks) are welcome on the issue tracker. PRs to add numbers for other CPU/OS combinations are also welcome.

Workload 1 - parallel "tokenization" + interning for a "large" repo

The workload walks the benchmarks/data directory, finds .rs files, splits their contents on whitespace, and interns the resulting substrings.

This is a proxy for an actual compiler frontend, which would do something similar but with a proper lexer, a virtual file system, etc.
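As a rough illustration of the shape of this workload, here is a minimal single-threaded sketch using only the standard library. `ToyInterner`, `collect_rs_files`, and `intern_source` are hypothetical names made up for this example; they are not this crate's API, and the real benchmark runs in parallel across threads.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical toy interner (NOT this crate's API): maps each distinct
/// string to a small integer ID and can map the ID back to the string.
#[derive(Default)]
pub struct ToyInterner {
    ids: HashMap<String, u32>,
    strings: Vec<String>,
}

impl ToyInterner {
    /// Returns the existing ID for `s`, or assigns the next free one.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.ids.insert(s.to_owned(), id);
        self.strings.push(s.to_owned());
        id
    }

    /// Looks up the string for a previously returned ID.
    pub fn resolve(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}

/// Recursively collects every `.rs` file under `dir` into `out`.
pub fn collect_rs_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_rs_files(&path, out)?;
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            out.push(path);
        }
    }
    Ok(())
}

/// "Tokenizes" `source` by splitting on whitespace and interns every piece,
/// returning one ID per token in source order.
pub fn intern_source(interner: &mut ToyInterner, source: &str) -> Vec<u32> {
    source
        .split_whitespace()
        .map(|tok| interner.intern(tok))
        .collect()
}
```

To mimic the benchmark, one would call `collect_rs_files` on `benchmarks/data`, read each file with `fs::read_to_string`, and pass the contents to `intern_source`. Repeated tokens such as `{` map to the same ID, which is why the workload produces many lookups of very short strings.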

We measure wall time with both a cold page cache (clearing the cache requires sudo) and a warm page cache. We also try both AHash and FxHash. The latter is used by the Rust compiler and Firefox.

Running

To run the benchmark:

git clone --quiet --depth=1 https://github.com/rust-analyzer/rust-analyzer.git --branch=2021-09-20 benchmarks/data/rust-analyzer
rm -rf benchmarks/data/rust-analyzer/.git
# Run benchmarks with warm caches only (cold cache numbers aren't too different)
cargo bench --all-features --bench interner-speed | tee output.txt
# More complex configuration (requires sudo to clear the page cache)
(export BENCH_NTHREADS="$(seq 1 3 11)"; export BENCH_COLD_CACHE=1; cargo bench --all-features --bench interner-speed | tee output.txt)

Depending on the settings, it may take anywhere from a few minutes to 30+ minutes. You should not use your computer in the meantime; if you are clearing the page cache, you will likely not be able to use it anyway. 😅

We have a script which parses the Criterion output and prints a nice table.

./dev/criterion-to-table.py output.txt

Results

M1 MacBook Pro (2020): 4P + 4E cores

nthreads	cold/ahash	cold/fxhash	warm/ahash	warm/fxhash
n = 1	128.56 ms	129.95 ms	129.50 ms	127.29 ms
n = 2	76.074 ms	72.834 ms	74.506 ms	73.159 ms
n = 3	51.923 ms	50.856 ms	52.440 ms	51.355 ms
n = 4	42.607 ms	42.377 ms	46.230 ms	42.607 ms
n = 5	38.461 ms	38.640 ms	39.489 ms	42.483 ms
n = 6	38.592 ms	39.008 ms	39.095 ms	38.968 ms
n = 7	42.205 ms	45.715 ms	42.219 ms	41.501 ms
n = 8	50.306 ms	44.320 ms	46.749 ms	47.501 ms

The key point I'd like to highlight is that peak performance is not at n = 8. This is something I've consistently seen over multiple runs; the best performance ends up being between n = 4 and n = 6.
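To put a number on this, the speedup over n = 1 can be computed directly from the table. The sketch below uses the cold/ahash column; `best_speedup` and `COLD_AHASH_MS` are hypothetical names introduced only for this illustration.

```rust
/// cold/ahash column from the M1 table above, n = 1 through n = 8 (in ms).
pub const COLD_AHASH_MS: [f64; 8] = [
    128.56, 76.074, 51.923, 42.607, 38.461, 38.592, 42.205, 50.306,
];

/// Returns (thread count, speedup over n = 1) for the fastest measurement.
/// `times_ms[i]` is the wall time with `i + 1` threads; panics if empty.
pub fn best_speedup(times_ms: &[f64]) -> (usize, f64) {
    let (idx, &best) = times_ms
        .iter()
        .enumerate()
        .min_by(|a, b| a.1.total_cmp(b.1))
        .expect("at least one measurement");
    let baseline = times_ms[0];
    (idx + 1, baseline / best)
}
```

For this column the fastest run is n = 5, at about 128.56 / 38.461 ≈ 3.34x over a single thread, whereas n = 8 gives about 128.56 / 50.306 ≈ 2.56x.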

Since this was done on a MacBook Pro, without much control over thermals, you should not take the AHash vs FxHash comparison too seriously. For example, FxHash usually performs slightly better on very short strings. This workload creates many very short strings because it splits on whitespace, whereas a real lexer would likely not intern strings for individual characters like { and }.

The point is, they're pretty close, and it is easy to swap out one for the other.
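One way such a swap can be made cheap in Rust is to keep the hash builder as a type parameter. The sketch below is an assumption about the general technique, not this crate's actual API: `GenericInterner`, `sip_random`, and `sip_fixed` are hypothetical names, and the two standard-library hash builders stand in for AHash and FxHash so that the example needs no external crates.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault};

/// Hypothetical interner that is generic over the hash builder `S`.
pub struct GenericInterner<S> {
    ids: HashMap<String, u32, S>,
}

impl<S: BuildHasher> GenericInterner<S> {
    /// Builds an empty interner that hashes with the given builder.
    pub fn with_hasher(hasher: S) -> Self {
        Self { ids: HashMap::with_hasher(hasher) }
    }

    /// Returns the existing ID for `s`, or assigns the next free one.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.ids.len() as u32;
        self.ids.insert(s.to_owned(), id);
        id
    }
}

/// Stand-in configuration 1: std's randomly seeded default hasher.
pub fn sip_random() -> GenericInterner<RandomState> {
    GenericInterner::with_hasher(RandomState::new())
}

/// Stand-in configuration 2: the same hasher with fixed default keys.
pub fn sip_fixed() -> GenericInterner<BuildHasherDefault<DefaultHasher>> {
    GenericInterner::with_hasher(BuildHasherDefault::default())
}
```

Only the constructor differs between the two configurations; the interning code is shared. A third-party hash builder that implements `BuildHasher`, such as those provided by the ahash or rustc-hash crates, would presumably slot into the same type parameter.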
