Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-132108: Add Buffer Protocol support to int.from_bytes to improve performance #132109

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cmaloney
Copy link
Contributor

@cmaloney cmaloney commented Apr 5, 2025

Speed up conversion from bytes-like objects like bytearray while keeping conversion from bytes stable.

On a --with-lto --enable-optimizaitons build on my 64 bit Linux box:

new:

from_bytes_flags: Mean +- std dev: 28.6 ns +- 0.5 ns
bench_convert[bytes]: Mean +- std dev: 50.4 ns +- 1.4 ns
bench_convert[bytearray]: Mean +- std dev: 51.3 ns +- 0.7 ns

old:

from_bytes_flags: Mean +- std dev: 28.1 ns +- 1.1 ns
bench_convert[bytes]: Mean +- std dev: 50.3 ns +- 4.3 ns
bench_convert[bytearray]: Mean +- std dev: 64.7 ns +- 0.9 ns

Benchmark code:

import pyperf
import time

def from_bytes_flags(loops):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0

sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]

sample_bytearray = [bytearray(v) for v in sample_bytes]

def bench_convert(loops, values):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0

runner = pyperf.Runner()

runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)

Speed up conversion from `bytes-like` objects like `bytearray` while
keeping conversion from `bytes` stable.

On a `--with-lto --enable-optimizaitons` build on my 64 bit Linux box:

new:
from_bytes_flags: Mean +- std dev: 28.6 ns +- 0.5 ns
bench_convert[bytes]: Mean +- std dev: 50.4 ns +- 1.4 ns
bench_convert[bytearray]: Mean +- std dev: 51.3 ns +- 0.7 ns

old:
from_bytes_flags: Mean +- std dev: 28.1 ns +- 1.1 ns
bench_convert[bytes]: Mean +- std dev: 50.3 ns +- 4.3 ns
bench_convert[bytearray]: Mean +- std dev: 64.7 ns +- 0.9 ns

Benchmark code:
```python
import pyperf
import time

def from_bytes_flags(loops):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        int.from_bytes(b'\x00\x10', byteorder='big')
        int.from_bytes(b'\x00\x10', byteorder='little')
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
        int.from_bytes(b'\xfc\x00', byteorder='big', signed=False)
        int.from_bytes([255, 0, 0], byteorder='big')
    return time.perf_counter() - t0

sample_bytes = [
    b'',
    b'\x00',
    b'\x01',
    b'\x7f',
    b'\x80',
    b'\xff',
    b'\x01\x00',
    b'\x7f\xff',
    b'\x80\x00',
    b'\xff\xff',
    b'\x01\x00\x00',
]

sample_bytearray = [bytearray(v) for v in sample_bytes]

def bench_convert(loops, values):
    range_it = range(loops)

    t0 = time.perf_counter()
    for _ in range_it:
        for val in values:
            int.from_bytes(val)
    return time.perf_counter() - t0

runner = pyperf.Runner()

runner.bench_time_func('from_bytes_flags', from_bytes_flags, inner_loops=10)
runner.bench_time_func('bench_convert[bytes]', bench_convert, sample_bytes, inner_loops=10)
runner.bench_time_func('bench_convert[bytearray]', bench_convert, sample_bytearray, inner_loops=10)
```
picnixz
picnixz previously approved these changes Apr 5, 2025
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have benchmarks for very large bytes? maybe you can also say how much we're gaining in the NEWS entry that way.

@picnixz picnixz changed the title gh-132108: Add Buffer Protocol support to int.from_bytes gh-132108: Add Buffer Protocol support to int.from_bytes to improve performance Apr 5, 2025
@picnixz
Copy link
Member

picnixz commented Apr 5, 2025

Small question but how do we cope with classes that explicitly define .__bytes__() and are buffer-like? like custom bytes objects? (this is an edge-case but still, it can be a breaking change).

Note that PyObject_Bytes first call __bytes__, then call PyBytes_FromObject if there is no __bytes__ and only then are buffer-like objects considered, but not before. So __bytes__ has a higher priority than buffer-like interface.

Instead, we should restrict ourselves to exact buffer objects, namely exact bytes and bytearray objects.

@picnixz picnixz dismissed their stale review April 5, 2025 10:04

I want to check that the edge cases are not an issue.

@cmaloney
Copy link
Contributor Author

cmaloney commented Apr 5, 2025

Cases including classes which implement __bytes__() that return both valid (ex. bytes) and non-valid (ex. str) values are tested in test_long, test_from_bytes so I don't think any critical behavior changes there.

As you point out, if code returns a different set of machine bytes when exporting buffer protocol vs __bytes__(), this will change behavior. __bytes__() will not be run, instead just the buffer export will be called. That same issue will come up in PyObject_Bytes vs. PyBytes_FromObject calls as PyObject_Bytes checks __bytes__() first while PyBytes_FromObject does buffer protocol first and never checks __bytes__(). Code here uses PyObject_Bytes(). I don't think CPython strongly uses one or the other as "more correct".

Could match existing behavior by always checking for a __bytes__ member and !PyBytes_CheckExact() (avoid __bytes__() call for bytes as it changes performance and wasn't present before). To me that isn't as good of an implementation. It is slower (more branches), more complex code, and I prefer encouraging buffer protocol for best performance.

Could restrict to known CPython types (bytes, bytearray, array, memoryview), but that lowers the usefulness to me as systems which implement buffer and __bytes__ for efficiency can't use the newer and potentially more efficient buffer protocol here. It also requires more condition / type checks than PyObject_CheckBuffer.


Walking through common types passed to int.from_bytes() more explicitly:

  1. exact bytes, the new code will get the data using a Py_buffer rather than increment the ref to the bytes (PyBytes_CheckExact case). Perf test shows performance is stable for that.
  2. "bytes-like" objects (subclasses of bytes, bytearray, memoryview, array) used the buffer protocol to copy before, use now. Less calls/branches/checks getting to exporting the buffer. Removes a copy of that buffer into a PyBytes. Perf test shows faster for bytearray, likely is for other cases as well.
  3. list, tuple, iterable (other than str): PyObject_CheckBuffer will fail for so code will call PyObject_Bytes which will call PyBytes_FromObject to handle, same as before.
  4. str: Doesn't export bytes. That fails / raises a TypeError In test_long, test_from_bytes validates that behavior. Behavior is unchanged.
  5. Objects that implement __bytes__() but don't support buffer protocol: Tested in test_long test_from_bytes (ValidBytes, InvalidBytes, RaisingBytes). These behave as before. PyObject_CheckBuffer will fail for so code will call PyObject_Bytes which will call PyBytes_FromObject to handle, same as before.
  6. Objects that implement __bytes__() and support buffer protocol: The __bytes__() function will no longer be called; If it broke the API contract by returning str for instance code will now run using its buffer protocol to get the underlying machine bytes instead of throwing an exception.

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change. Example.

Before:

>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...         
>>> int.from_bytes(X(b'a'))
88

After:

>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
...         
>>> int.from_bytes(X(b'a'))
97

Comment on lines 1 to 2
Speed up :meth:`int.from_bytes` when passed a bytes-like object such as a
:class:`bytearray`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Speed up :meth:`int.from_bytes` when passed a bytes-like object such as a
:class:`bytearray`.
Speed up :meth:`int.from_bytes` when passed a bytes-like object such as
:class:`bytes` and :class:`bytearray`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that it affects bytes.

@skirpichev
Copy link
Member

This is a breaking change. Example.

Docs says:

Called by bytes to compute a byte-string representation of an object. This should return a bytes object. The object class itself does not provide this method.

If b'X' is a byte-string representation of b'a' - you are, probably, correct. Otherwise it's just an example, that you could break something, by overriding dunder methods in subclasses. Say,

>>> class int2(int):
...     def __float__(self):
...         return 3.14
...         
>>> float(int2(123))
3.14

@sobolevn
Copy link
Member

sobolevn commented Apr 6, 2025

This is an example that the method resolution order changes. It now ignores custom __bytes__ method, I don't think that changing __bytes__ method on bytes subclass is an artificial example. I am pretty sure that people use that in the wild.

The reverse logic is true: PR's author must prove that it does not break things.

@skirpichev
Copy link
Member

It now ignores custom __bytes__ method

I would say that if you expose something different via buffer protocol and __bytes__ dunder - it's your fault, isn't? (Though, this constraint isn't documented explicitly.) Just as we, probably, could assume that float(int2(123)) = float(int(int2(123))).

@python-cla-bot
Copy link

python-cla-bot bot commented Apr 6, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@cmaloney
Copy link
Contributor Author

cmaloney commented Apr 6, 2025

I'll see if I can make a largely performance neutral version that checks __bytes__ before using buffer protocol. The potential disconnect between __bytes__() and __buffer__() concerns me, feels like a source of easy to code hard to detect until they show up somewhere that's a problem bugs... Wondering if there's an efficient way to say something like "If __bytes__() is set, __buffer__() should be cleared (or defaulted to memoryview(__bytes__())?

>>> class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
... 
...     def __buffer__(self, flags):
...         return memoryview(b'c')
... 
... 
... class same_bytes_buffer(bytes):
...     def __bytes__(self):
...         return b'b'
... 
...     def __buffer__(self, flags):
...         return memoryview(b'b')
... 
>>> int.from_bytes(distinct_bytes_buffer(b'a'))
... 
99
>>> int.from_bytes(same_bytes_buffer(b'a'))
... 
98
>>> int.from_bytes(b'a')
... 
97
>>> int.from_bytes(b'b')
... 
98
>>> int.from_bytes(b'c')
... 
99

@cmaloney
Copy link
Contributor Author

cmaloney commented Apr 6, 2025

Some back pieces for reference: __bytes__ was added to bytes() in 3.11 #68422 while type hints were being worked on. The buffer protocol was before that in 3.0 (pep-3118).

@cmaloney
Copy link
Contributor Author

cmaloney commented Apr 6, 2025

Another edge case around these, __bytes__() is only used once and must return only a object that inherits from bytes, on which PyBytes_AsString is used that returns the ob_sval inline storage value. If bytes internal storage ob_sval, __buffer__(), and __bytes__() vary then all three do sometimes get returned. I think it would be interesting to normalize to a specific behavior (straw man: always __buffer__() first), but that definitely isn't the case today (And suspect would take a larger proposal / PEP to change?).

>>> class my_bytes(bytes):
...     def __bytes__(self):
...        return b"bytes"
... 
...     def __buffer__(self, flags):
...         return memoryview(b"buffer")
... 
... class distinct_bytes_buffer(bytes):
...     def __bytes__(self):
...         return my_bytes(b"ob_sval")
... 
...     def __buffer__(self, flags):
...         return memoryview(b"distinct_buffer")
... 
... a = distinct_bytes_buffer(b"distinct_ob_sval")
... bytes(a)
... 
b'ob_sval'

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Comment on lines 1 to 2
Speed up :meth:`int.from_bytes` when passed a bytes-like object such as a
:class:`bytearray`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that it affects bytes.

Co-authored-by: Sergey B Kirpichev <[email protected]>
@cmaloney
Copy link
Contributor Author

cmaloney commented Apr 8, 2025

Created a branch which matches resolution order of PyObject_Bytes which gives a small performance improvement (~2%, avoids touching reference count) in the common from exact bytes case, keeps most the improvement for bytearray.

branch matching PyObject_Bytes order:

from_bytes_flags: Mean +- std dev: 27.3 ns +- 0.7 ns
bench_convert[bytes]: Mean +- std dev: 47.7 ns +- 0.4 ns
bench_convert[bytearray]: Mean +- std dev: 54.1 ns +- 0.9 ns

So bytearray goes from 64.7 ns +- 0.9 ns (main) to 54.1 ns +- 0.9 ns with change.

@sobolevn's example now returns the same value both before and after:

>>> class X(bytes):
...     def __bytes__(self):
...         return b'X'
... 
... int.from_bytes(X(b'a'))
... 
88

Should I incorporate here? (cc: @serhiy-storchaka, @sobolevn, @skirpichev)


full diff from main: https://github.com/python/cpython/compare/main...cmaloney:cpython:exp/bytes_first?collapse=1

diff from PR: cmaloney@189f219

@skirpichev
Copy link
Member

Also docs says: "The argument bytes must either be a bytes-like object or an iterable producing bytes." Something is wrong: either implementation (in the main) or docs.

@serhiy-storchaka
Copy link
Member

It may be an iterable producing bytes (not the bytes objects, but integers in the range 0 to 255).

@skirpichev
Copy link
Member

It may be an iterable producing bytes (not the bytes objects, but integers in the range 0 to 255).

Yes, this part of the sentence might be at least not clear. But I meant the first part, which has a reference to the glossary term.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants