bpf: Add kfuncs for read-only string operations #8709

Closed

kernel-patches-daemon-bpf[bot]

Pull request for series with
subject: bpf: Add kfuncs for read-only string operations
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=946777

@kernel-patches-daemon-bpf

Upstream branch: 9aa8fe2
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=946777
version: 3

String operations are commonly used so this exposes the most common ones
to BPF programs. For now, we limit ourselves to operations which do not
copy memory around.

Unfortunately, most in-kernel implementations assume that strings are
%NUL-terminated, which is not necessarily true, and therefore we cannot
use them directly in BPF context. So, we use distinct approaches for
bounded and unbounded variants of string operations:

- Unbounded variants are open-coded, using __get_kernel_nofault
  instead of a plain dereference to make them safe.

- Bounded variants use params with the __sz suffix so safety is assured
  by the verifier and we can use the in-kernel (potentially optimized)
  functions.
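The two approaches can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the patch: sketch_read_nofault is a hypothetical stand-in for __get_kernel_nofault (it only treats NULL as an unreadable address), SKETCH_MAX_LEN is an assumed scan limit, and memchr stands in for an optimized in-kernel routine.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed cap on how far the unbounded variant scans (not from the patch). */
#define SKETCH_MAX_LEN 4096

/*
 * Hypothetical stand-in for __get_kernel_nofault: a read that reports
 * failure instead of faulting. Here only NULL counts as unreadable.
 */
static int sketch_read_nofault(char *dst, const char *src)
{
	if (src == NULL)
		return -EFAULT;
	*dst = *src;
	return 0;
}

/* Unbounded variant: open-coded loop, every byte goes through the checked read. */
static long sketch_strlen(const char *s)
{
	for (long i = 0; i < SKETCH_MAX_LEN; i++) {
		char c;

		if (sketch_read_nofault(&c, s ? s + i : NULL))
			return -EFAULT;
		if (c == '\0')
			return i;
	}
	return -E2BIG;
}

/*
 * Bounded variant: the caller guarantees s_sz readable bytes (the role the
 * verifier plays for __sz parameters), so a library routine can be used.
 */
static long sketch_strnlen(const char *s, size_t s_sz)
{
	const char *end;

	assert(s != NULL || s_sz == 0);
	end = s_sz ? memchr(s, '\0', s_sz) : NULL;
	return end ? (long)(end - s) : (long)s_sz;
}
```

The unbounded sketch pays for one checked read per byte, while the bounded one delegates the whole scan to a single library call once the size is known to be safe.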

Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Viktor Malik <[email protected]>

The tests use the RUN_TESTS helper, which executes BPF programs with
BPF_PROG_TEST_RUN and checks for the expected return value.
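The "run the program, compare its return value" pattern can be sketched as a table-driven check in userspace C. This is an analogy only: it does not use RUN_TESTS or BPF_PROG_TEST_RUN, and struct sketch_case, sketch_run_tests and the two sample programs are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test case: a "program" to run and the value it must return. */
struct sketch_case {
	const char *name;
	int (*prog)(void);
	int expected_retval;
};

static const char sketch_sample[] = "hello";

/* Two sample "programs"; each returns a value the runner can check. */
static int prog_len(void)
{
	return (int)strlen(sketch_sample);
}

static int prog_find(void)
{
	return (int)(strchr(sketch_sample, 'l') - sketch_sample);
}

static const struct sketch_case sketch_cases[] = {
	{ "len",  prog_len,  5 },
	{ "find", prog_find, 2 },
};

/* Runs every case and returns the number of return-value mismatches. */
static size_t sketch_run_tests(const struct sketch_case *cases, size_t n)
{
	size_t failures = 0;

	for (size_t i = 0; i < n; i++) {
		assert(cases[i].prog != NULL);
		if (cases[i].prog() != cases[i].expected_retval)
			failures++;
	}
	return failures;
}
```

Keeping the expected value next to the program keeps each case self-describing, so a mismatch points directly at one entry in the table.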

Suggested-by: Eduard Zingerman <[email protected]>
Signed-off-by: Viktor Malik <[email protected]>

Add a new benchmark using the existing bench infrastructure which
compares performance of bounded and unbounded string kfuncs added in the
previous commits.

Running on x86_64 and arm64, the most significant difference is in the
strlen/strnlen and strstr/strnstr comparisons on arm64:

    strlen/strnlen
    ==============
    strlen-1             0.453 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    strnlen-1            0.470 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    strlen-8             0.459 ± 0.011M/s (drops 0.000 ± 0.000M/s)
    strnlen-8            0.451 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    strlen-64            0.439 ± 0.007M/s (drops 0.000 ± 0.000M/s)
    strnlen-64           0.455 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    strlen-512           0.359 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    strnlen-512          0.441 ± 0.007M/s (drops 0.000 ± 0.000M/s)
    strlen-2048          0.232 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    strnlen-2048         0.403 ± 0.005M/s (drops 0.000 ± 0.000M/s)
    strlen-4095          0.151 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    strnlen-4095         0.362 ± 0.005M/s (drops 0.000 ± 0.000M/s)

    strstr/strnstr
    ==============
    strstr-8             0.452 ± 0.005M/s (drops 0.000 ± 0.000M/s)
    strnstr-8            0.442 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    strstr-64            0.390 ± 0.004M/s (drops 0.000 ± 0.000M/s)
    strnstr-64           0.400 ± 0.004M/s (drops 0.000 ± 0.000M/s)
    strstr-512           0.228 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    strnstr-512          0.256 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    strstr-2048          0.095 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    strnstr-2048         0.113 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    strstr-4095          0.052 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    strnstr-4095         0.064 ± 0.001M/s (drops 0.000 ± 0.000M/s)

For strings longer than 64B, the bounded variants are notably faster,
having as much as 140% performance gain over the unbounded variants
(strnlen for strings of length 4095). The reason is that arm64 has an
optimized implementation of strnlen in assembly which is also used
inside strnstr.
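The gain figure can be reproduced from the arm64 table with simple arithmetic: divide the faster throughput by the slower one and subtract one. A minimal sketch follows; gain_percent and its parameter names are made up for this illustration.

```c
#include <assert.h>

/* Relative gain, in percent, of throughput `fast` over throughput `slow`. */
static double gain_percent(double fast, double slow)
{
	assert(slow > 0.0);
	return (fast / slow - 1.0) * 100.0;
}
```

With the 4095-byte rows above, gain_percent(0.362, 0.151) for strnlen over strlen comes out at roughly 140%, and gain_percent(0.064, 0.052) for strnstr over strstr at roughly 23%.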

On x86_64, which doesn't have any optimized string operations, there is
still an observable difference in strlen/strnlen and strstr/strnstr,
albeit much smaller than for arm64:

    strlen/strnlen
    ==============
    strlen-1             7.021 ± 0.036M/s (drops 0.000 ± 0.000M/s)
    strnlen-1            7.000 ± 0.038M/s (drops 0.000 ± 0.000M/s)
    strlen-8             6.837 ± 0.011M/s (drops 0.000 ± 0.000M/s)
    strnlen-8            6.832 ± 0.064M/s (drops 0.000 ± 0.000M/s)
    strlen-64            5.638 ± 0.026M/s (drops 0.000 ± 0.000M/s)
    strnlen-64           6.010 ± 0.034M/s (drops 0.000 ± 0.000M/s)
    strlen-512           3.322 ± 0.011M/s (drops 0.000 ± 0.000M/s)
    strnlen-512          3.449 ± 0.014M/s (drops 0.000 ± 0.000M/s)
    strlen-2048          1.390 ± 0.007M/s (drops 0.000 ± 0.000M/s)
    strnlen-2048         1.429 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    strlen-4095          0.786 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    strnlen-4095         0.803 ± 0.002M/s (drops 0.000 ± 0.000M/s)

    strstr/strnstr
    ==============
    strstr-8             6.031 ± 0.012M/s (drops 0.000 ± 0.000M/s)
    strnstr-8            6.322 ± 0.048M/s (drops 0.000 ± 0.000M/s)
    strstr-64            3.221 ± 0.054M/s (drops 0.000 ± 0.000M/s)
    strnstr-64           3.059 ± 0.025M/s (drops 0.000 ± 0.000M/s)
    strstr-512           0.734 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    strnstr-512          0.849 ± 0.004M/s (drops 0.000 ± 0.000M/s)
    strstr-2048          0.220 ± 0.004M/s (drops 0.000 ± 0.000M/s)
    strnstr-2048         0.246 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    strstr-4095          0.104 ± 0.000M/s (drops 0.000 ± 0.000M/s)
    strnstr-4095         0.122 ± 0.000M/s (drops 0.000 ± 0.000M/s)

The performance gain of the bounded variants on strings over 64B is
3%-6% for strlen/strnlen and 12%-18% for strstr/strnstr. The likely
explanation is that the unbounded variants use __get_kernel_nofault
instead of a plain dereference, which introduces some small overhead.
This manifests mainly in the above functions as they iterate multiple
strings (i.e. use __get_kernel_nofault more).
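To make the "more checked reads" argument concrete, the sketch below counts how many bytes go through a checked read in an open-coded strchr versus an open-coded strstr. It is an illustration under assumptions, not the patch code: read_checked is a hypothetical stand-in for __get_kernel_nofault that only counts reads, and both search loops are naive versions written for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes that went through the checked read so far. */
static unsigned long checked_reads;

/* Hypothetical stand-in for __get_kernel_nofault; it only counts reads. */
static char read_checked(const char *p)
{
	checked_reads++;
	return *p;
}

/* Open-coded strchr: one checked read per haystack byte. Returns index or -1. */
static long sketch_strchr(const char *s, char c)
{
	assert(s != NULL);
	for (long i = 0; ; i++) {
		char sc = read_checked(s + i);

		if (sc == c)
			return i;
		if (sc == '\0')
			return -1;
	}
}

/* Open-coded strstr: at every offset, bytes of both strings are read. */
static long sketch_strstr(const char *hay, const char *needle)
{
	assert(hay != NULL && needle != NULL);
	for (long i = 0; ; i++) {
		for (long j = 0; ; j++) {
			char nc = read_checked(needle + j);

			if (nc == '\0')
				return i;	/* whole needle matched */
			char hc = read_checked(hay + i + j);

			if (hc == '\0')
				return -1;	/* haystack exhausted */
			if (hc != nc)
				break;		/* mismatch, try next offset */
		}
	}
}
```

For the 8-byte haystack "aaaaaaab", the strchr sketch looking for 'b' performs 8 checked reads, while the strstr sketch looking for "ab" performs 29, so any per-read overhead is multiplied accordingly.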

For the rest of the functions in the benchmark (strchr/strnchr and
strchrnul/strnchrnul), the performance difference is negligible or
within the bounds of statistical error, with the exception of
strchr/strnchr on arm64:

    strchr/strnchr
    ==============
    strchr-1             0.475 ± 0.010M/s (drops 0.000 ± 0.000M/s)
    strnchr-1            0.469 ± 0.008M/s (drops 0.000 ± 0.000M/s)
    strchr-8             0.448 ± 0.011M/s (drops 0.000 ± 0.000M/s)
    strnchr-8            0.472 ± 0.006M/s (drops 0.000 ± 0.000M/s)
    strchr-64            0.432 ± 0.010M/s (drops 0.000 ± 0.000M/s)
    strnchr-64           0.445 ± 0.008M/s (drops 0.000 ± 0.000M/s)
    strchr-512           0.308 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    strnchr-512          0.330 ± 0.005M/s (drops 0.000 ± 0.000M/s)
    strchr-2048          0.156 ± 0.002M/s (drops 0.000 ± 0.000M/s)
    strnchr-2048         0.186 ± 0.003M/s (drops 0.000 ± 0.000M/s)
    strchr-4095          0.094 ± 0.001M/s (drops 0.000 ± 0.000M/s)
    strnchr-4095         0.115 ± 0.004M/s (drops 0.000 ± 0.000M/s)

Here, I'm not sure what the reason for the performance benefit is,
possibly a combination of compiler optimizations and
__get_kernel_nofault overhead.

Signed-off-by: Viktor Malik <[email protected]>
@kernel-patches-daemon-bpf

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=946777 expired. Closing PR.

kernel-patches-daemon-bpf bot deleted the series/892991=>bpf-next branch on March 31, 2025 05:56