
Vectorize find_first_not_of/find_last_not_of member functions (single character) #5102

Merged: 12 commits merged on Mar 24, 2025


AlexGuteniev
Contributor

Another basic_string vectorization.
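Here is a minimal sketch of the general technique for the single-byte case, assuming x86 SSE2 intrinsics. The character is broadcast to all 16 lanes, and each 16-byte block is compared in one step with `_mm_cmpeq_epi8`. `_mm_movemask_epi8` then turns the result into a bitmask in which any clear bit marks a byte that is *not* the character. The helper names and the `SKETCH_HAVE_SSE2` macro are hypothetical. This is not the code from this PR, which is separately compiled and also covers wider character types.

```cpp
#include <cstddef>
#include <string>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1  // hypothetical macro for this sketch only
#endif

// Hypothetical helper: index of the first byte in s[0, n) that differs from ch,
// or std::string::npos if every byte equals ch.
std::size_t first_not_of_byte(const char* s, std::size_t n, char ch) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE2
    const __m128i needle = _mm_set1_epi8(ch);  // ch broadcast to all 16 lanes
    for (; i + 16 <= n; i += 16) {
        const __m128i block = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
        // Bit k of eq is set when byte k of the block equals ch.
        const unsigned eq = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(block, needle)));
        const unsigned diff = ~eq & 0xFFFFu;  // set bits mark mismatching bytes
        if (diff != 0) {
            std::size_t k = 0;
            while (((diff >> k) & 1u) == 0) {
                ++k;  // lowest set bit = first mismatch in this block
            }
            return i + k;
        }
    }
#endif
    for (; i < n; ++i) {  // scalar tail (or the whole range without SSE2)
        if (s[i] != ch) {
            return i;
        }
    }
    return std::string::npos;
}

// Hypothetical helper: index of the last byte in s[0, n) that differs from ch,
// or std::string::npos. Blocks are scanned from the end toward the start.
std::size_t last_not_of_byte(const char* s, std::size_t n, char ch) {
    std::size_t i = n;  // s[i, n) has already been checked
#ifdef SKETCH_HAVE_SSE2
    const __m128i needle = _mm_set1_epi8(ch);
    while (i >= 16) {
        i -= 16;
        const __m128i block = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
        const unsigned eq = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(block, needle)));
        const unsigned diff = ~eq & 0xFFFFu;
        if (diff != 0) {
            std::size_t k = 15;
            while (((diff >> k) & 1u) == 0) {
                --k;  // highest set bit = last mismatch in this block
            }
            return i + k;
        }
    }
#endif
    while (i > 0) {  // scalar head: the remaining s[0, i)
        --i;
        if (s[i] != ch) {
            return i;
        }
    }
    return std::string::npos;
}
```

Take `s = std::string(20, 'a') + "b"` as an example. `first_not_of_byte(s.data(), s.size(), 'a')` skips one fully matching 16-byte block and finds index 20 in the scalar tail. This is the same answer as `s.find_first_not_of('a')`.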

🎲 Decisions

Some nearly arbitrary decisions, which can be changed if needed.

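  • Return a position from "last" but a pointer from "first". For "first", the pointer has to be converted to a position in the header. The separately compiled code doesn't know the string's original beginning, and that beginning isn't passed because the header passes the range starting at _Start_at. For "last", the searched range always begins at the start of the string, so the conversion can be done in the separately compiled code. This could also enable a future micro-optimization.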
  • Specialize for 32-bit and 64-bit characters too. This isn't very useful, but the general algorithm is already written.
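The two decisions above can be sketched as follows, under stated assumptions. The function names, the `not_found` constant, and the pointer-plus-size wrappers are hypothetical. The "separately compiled" routines are plain scalar loops standing in for vectorized ones. The "first" routine only sees the range starting at the search offset, so it returns a pointer that the header-side wrapper converts to a position. The "last" routine's range always starts at the string's beginning, so it can return the position itself. Because everything is a template on the element type, the same code serves 8-, 16-, 32- and 64-bit elements.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sentinel, equal in value to std::basic_string<...>::npos.
constexpr std::size_t not_found = static_cast<std::size_t>(-1);

// Stand-in for the separately compiled "first" routine (scalar for clarity).
// It only sees [first, last), where first may be data + start_at,
// so it returns a pointer (last means "not found").
template <class T>
const T* find_not_ch_first(const T* first, const T* last, T ch) {
    for (; first != last; ++first) {
        if (*first != ch) {
            return first;
        }
    }
    return last;
}

// Stand-in for the separately compiled "last" routine.
// Its range always begins at the string's start, so it returns a position.
template <class T>
std::size_t find_not_ch_last_pos(const T* first, const T* last, T ch) {
    for (const T* p = last; p != first;) {
        --p;
        if (*p != ch) {
            return static_cast<std::size_t>(p - first);
        }
    }
    return not_found;
}

// Header-side wrapper: only here is the original beginning (data) known,
// so the pointer result is converted to a position here.
template <class T>
std::size_t first_not_of_ch(const T* data, std::size_t size, T ch, std::size_t start_at = 0) {
    if (start_at >= size) {
        return not_found;
    }
    const T* const end = data + size;
    const T* const found = find_not_ch_first(data + start_at, end, ch);
    return found == end ? not_found : static_cast<std::size_t>(found - data);
}

// Header-side wrapper: searches [0, pos], with pos clamped to size - 1.
template <class T>
std::size_t last_not_of_ch(const T* data, std::size_t size, T ch, std::size_t pos = not_found) {
    if (size == 0) {
        return not_found;
    }
    const std::size_t count = (pos < size ? pos : size - 1) + 1;
    return find_not_ch_last_pos(data, data + count, ch);
}
```

For example, with `T = std::uint64_t` the wrappers work on a plain array such as `{7, 7, 9, 7}`, where both searches for a value other than 7 return index 2.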

⏱️ Benchmark results

| Benchmark | main | this PR |
|---|--:|--:|
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/8021/3056` | 738 ns | 49.8 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/63/62` | 17.9 ns | 4.78 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/31/30` | 9.56 ns | 5.71 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/15/14` | 4.76 ns | 5.34 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/7/6` | 2.77 ns | 3.23 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/8021/3056` | 740 ns | 49.8 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/63/62` | 25.7 ns | 4.89 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/31/30` | 8.88 ns | 6.07 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/15/14` | 4.72 ns | 5.31 ns |
| `bm<char, not_highly_aligned_allocator, Op::StringFindNotLastOne>/7/6` | 2.51 ns | 3.68 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/8021/3056` | 749 ns | 82.8 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/63/62` | 17.5 ns | 4.14 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/31/30` | 9.58 ns | 3.64 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/15/14` | 7.36 ns | 4.65 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/7/6` | 2.91 ns | 4.67 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/8021/3056` | 739 ns | 85.1 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/63/62` | 21.5 ns | 4.69 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/31/30` | 8.55 ns | 4.11 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/15/14` | 4.48 ns | 4.38 ns |
| `bm<wchar_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/7/6` | 2.40 ns | 3.63 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/8021/3056` | 737 ns | 157 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/63/62` | 17.4 ns | 5.51 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/31/30` | 9.39 ns | 4.03 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/15/14` | 6.90 ns | 3.39 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotFirstOne>/7/6` | 2.95 ns | 3.17 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/8021/3056` | 737 ns | 159 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/63/62` | 17.5 ns | 6.31 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/31/30` | 8.83 ns | 4.62 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/15/14` | 4.52 ns | 4.08 ns |
| `bm<char32_t, not_highly_aligned_allocator, Op::StringFindNotLastOne>/7/6` | 2.63 ns | 3.41 ns |

AlexGuteniev requested a review from a team as a code owner on November 19, 2024, 21:04
StephanTLavavej added the performance label (Must go faster) on November 19, 2024
StephanTLavavej self-assigned this on November 19, 2024

@StephanTLavavej
Member

Thanks! 😻 I pushed a conflict-free merge with main and very minor nitpicks.

5950X speedups look good:

| Benchmark | Before | After | Speedup |
|---|--:|--:|--:|
| `bm<char...NotFirstOne>/8021/3056` | 646 ns | 50.4 ns | 12.82 |
| `bm<char...NotFirstOne>/63/62` | 16.9 ns | 4.86 ns | 3.48 |
| `bm<char...NotFirstOne>/31/30` | 9.39 ns | 8.89 ns | 1.06 |
| `bm<char...NotFirstOne>/15/14` | 5.23 ns | 8.15 ns | 0.64 |
| `bm<char...NotFirstOne>/7/6` | 3.25 ns | 5.78 ns | 0.56 |
| `bm<char...NotLastOne>/8021/3056` | 645 ns | 55.2 ns | 11.68 |
| `bm<char...NotLastOne>/63/62` | 15.2 ns | 5.74 ns | 2.65 |
| `bm<char...NotLastOne>/31/30` | 8.74 ns | 9.83 ns | 0.89 |
| `bm<char...NotLastOne>/15/14` | 4.72 ns | 9.19 ns | 0.51 |
| `bm<char...NotLastOne>/7/6` | 2.97 ns | 6.34 ns | 0.47 |
| `bm<wchar_t...NotFirstOne>/8021/3056` | 647 ns | 90.8 ns | 7.13 |
| `bm<wchar_t...NotFirstOne>/63/62` | 16.2 ns | 5.31 ns | 3.05 |
| `bm<wchar_t...NotFirstOne>/31/30` | 9.49 ns | 4.87 ns | 1.95 |
| `bm<wchar_t...NotFirstOne>/15/14` | 5.22 ns | 6.31 ns | 0.83 |
| `bm<wchar_t...NotFirstOne>/7/6` | 3.38 ns | 5.98 ns | 0.57 |
| `bm<wchar_t...NotLastOne>/8021/3056` | 645 ns | 95.5 ns | 6.75 |
| `bm<wchar_t...NotLastOne>/63/62` | 15.1 ns | 5.55 ns | 2.72 |
| `bm<wchar_t...NotLastOne>/31/30` | 8.37 ns | 5.14 ns | 1.63 |
| `bm<wchar_t...NotLastOne>/15/14` | 4.72 ns | 6.57 ns | 0.72 |
| `bm<wchar_t...NotLastOne>/7/6` | 2.75 ns | 6.10 ns | 0.45 |
| `bm<char32_t...NotFirstOne>/8021/3056` | 647 ns | 177 ns | 3.66 |
| `bm<char32_t...NotFirstOne>/63/62` | 16.6 ns | 6.55 ns | 2.53 |
| `bm<char32_t...NotFirstOne>/31/30` | 9.85 ns | 4.86 ns | 2.03 |
| `bm<char32_t...NotFirstOne>/15/14` | 5.84 ns | 4.05 ns | 1.44 |
| `bm<char32_t...NotFirstOne>/7/6` | 3.60 ns | 4.67 ns | 0.77 |
| `bm<char32_t...NotLastOne>/8021/3056` | 645 ns | 171 ns | 3.77 |
| `bm<char32_t...NotLastOne>/63/62` | 15.0 ns | 6.19 ns | 2.42 |
| `bm<char32_t...NotLastOne>/31/30` | 8.35 ns | 5.13 ns | 1.63 |
| `bm<char32_t...NotLastOne>/15/14` | 4.46 ns | 4.70 ns | 0.95 |
| `bm<char32_t...NotLastOne>/7/6` | 2.76 ns | 5.32 ns | 0.52 |
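In the 5950X table, Speedup is Before divided by After (for example, 646 / 50.4 ≈ 12.82). Values below 1.0 are therefore slowdowns, and in this run they occur only in rows with the smaller arguments (31/30 and below). Below is a small sketch of that arithmetic, using hypothetical names and the `bm<char...NotFirstOne>` rows copied from the table.

```cpp
#include <cstddef>

// Speedup as used in the table: time before the change divided by time after it.
double speedup(double before_ns, double after_ns) {
    return before_ns / after_ns;
}

// One benchmark row (times in nanoseconds). Hypothetical type for this sketch.
struct Row {
    const char* name;
    double before_ns;
    double after_ns;
};

// The bm<char...NotFirstOne> rows from the 5950X table.
const Row char_first_rows[] = {
    {"8021/3056", 646.0, 50.4},
    {"63/62", 16.9, 4.86},
    {"31/30", 9.39, 8.89},
    {"15/14", 5.23, 8.15},
    {"7/6", 3.25, 5.78},
};

// Counts rows where the new code is slower, i.e. speedup below 1.0.
std::size_t count_regressions(const Row* rows, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (speedup(rows[i].before_ns, rows[i].after_ns) < 1.0) {
            ++count;
        }
    }
    return count;
}
```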

StephanTLavavej removed their assignment on Mar 6, 2025
StephanTLavavej self-assigned this on Mar 21, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request Mar 21, 2025
StephanTLavavej merged commit fb59f1d into microsoft:main on Mar 24, 2025
39 checks passed
@StephanTLavavej
Member

Thanks for optimizing these member functions of the STL's (second?) most widely used data structure! 🚀 🎉 😸

AlexGuteniev deleted the not-one branch on March 24, 2025, 21:54
Labels: performance (Must go faster)
Projects: Archived in project

2 participants