-
Notifications
You must be signed in to change notification settings - Fork 33
More efficient conversion of fmpz to float #323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This changes the returned value: $ cat float.py
import random
from flint import fmpz
for _ in range(100):
x = random.getrandbits(1020)
assert float(fmpz(x)) == float(x), x
$ spin run python float.py
$ meson compile -C build
$ meson install --only-changed -C build --destdir ../build-install
Traceback (most recent call last):
File "/stuff/current/active/python-flint/float.py", line 6, in <module>
assert float(fmpz(x)) == float(x), x
^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 7032762630615997075984776923333935086304805039016677690569260912478663353622423395407083820822744199783856237891740932258878466924940393236854755908411876363285950823905383993344868600629929050010379573658259044905557178310577704148548523391522191512473418439519460923094265091876624476180133575388198464166 I think this is because I don't see that clearly documented though. Here it only says:
The CPython code for it is here with a comment saying that it is round-half-even: /* Get a C double from an int object. Rounds to the nearest double,
using the round-half-to-even rule in the case of a tie. */
double
PyLong_AsDouble(PyObject *v)
{
int64_t exponent;
double x;
if (v == NULL) {
PyErr_BadInternalCall();
return -1.0;
}
if (!PyLong_Check(v)) {
PyErr_SetString(PyExc_TypeError, "an integer is required");
return -1.0;
}
if (_PyLong_IsCompact((PyLongObject *)v)) {
/* Fast path; single digit long (31 bits) will cast safely
to double. This improves performance of FP/long operations
by 20%.
*/
return (double)medium_value((PyLongObject *)v);
}
x = _PyLong_Frexp((PyLongObject *)v, &exponent);
assert(exponent >= 0);
assert(!PyErr_Occurred());
if (exponent > DBL_MAX_EXP) {
PyErr_SetString(PyExc_OverflowError,
"int too large to convert to float");
return -1.0;
}
return ldexp(x, (int)exponent);
} The function |
Oh, this is sad. Round to nearest with half-even convention is definitely a must, I will check what can be done. |
It looks like gmpy2 does not use round-half-even: $ cat float.py
import random
from gmpy2 import mpz
for _ in range(100):
x = random.getrandbits(54)
assert float(mpz(x)) == float(x), x
$ python float.py
Traceback (most recent call last):
File "/stuff/current/active/python-flint/float.py", line 6, in <module>
assert float(mpz(x)) == float(x), x
^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 10741429198758067 |
d79ff6e
to
9382a42
Compare
Commit 9382a42 implements correct rounding (round to nearest + round-half-to-even) It is enough to test an interval of consecutive numbers to check the convention for x in range(2**64, 2**64 + 1000000):
assert float(x) == float(flint.fmpz(x))
assert float(-x) == float(flint.fmpz(-x)) |
There was a test failure under PyPy:
The change here does not look like it could cause that so maybe there is something undefined here. |
Could you add this to the tests (with fewer iterations)? It would be good to ensure that the tests cover every possible branch of the |
Timing with main I get: In [2]: x = 2**64 + 1
In [3]: n = flint.fmpz(x)
In [4]: %timeit float(x)
88.4 ns ± 0.241 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [5]: %timeit float(n)
938 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [6]: x = 123
In [7]: n = flint.fmpz(x)
In [8]: %timeit float(x)
55.9 ns ± 0.38 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [9]: %timeit float(n)
66.5 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each) So for larger integers converting to int is definitely slow. However for smaller integers it is fast and we benefit from CPython's fast In [6]: x = 2**64 + 1
In [7]: n = flint.fmpz(x)
In [8]: %timeit float(x)
88.7 ns ± 0.505 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [9]: %timeit float(n)
94.5 ns ± 0.383 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [10]: x = 123
In [11]: n = flint.fmpz(x)
In [12]: %timeit float(x)
55.8 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
In [13]: %timeit float(n)
71.8 ns ± 0.527 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each) So for larger integers this is a lot faster i.e. 10x for an integer slightly larger than For the smaller integer though it is seems possibly slower on this measurement although the difference seen is close to margin of error. I wonder if it is somehow possible to have an even faster path using |
Also this should have a release note. |
In the case of small integers I believe the largest overhead comes from the Python I refactored slightly, also to avoid the performance hit for negative numbers. Python int (baseline) In [3]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = e
...: %timeit float(x)
...: x = -e
...: %timeit float(x)
29.6 ns ± 0.209 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.5 ns ± 0.259 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.539 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.7 ns ± 0.332 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.414 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.6 ns ± 0.186 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.299 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each) python-flint 0.8.0 In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = flint.fmpz(e)
...: %timeit float(x)
...: x = flint.fmpz(-e)
...: %timeit float(x)
42.8 ns ± 0.596 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
42.9 ns ± 0.44 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
202 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
220 ns ± 3.72 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
172 ns ± 0.348 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
188 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
175 ns ± 0.843 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
190 ns ± 0.878 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each) commit 91dc079 In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = flint.fmpz(e)
...: %timeit float(x)
...: x = flint.fmpz(-e)
...: %timeit float(x)
...: assert float(x) == float(-e)
23.6 ns ± 0.264 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
25.7 ns ± 0.254 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.7 ns ± 0.617 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.3 ns ± 0.477 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.8 ns ± 0.215 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
47.1 ns ± 0.441 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.9 ns ± 0.357 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.9 ns ± 0.165 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each) commit 6a5370c (new) In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
...: x = flint.fmpz(e)
...: %timeit float(x)
...: x = flint.fmpz(-e)
...: %timeit float(x)
...: assert float(x) == float(-e)
21.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
21.2 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.3 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.1 ns ± 0.591 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
27.9 ns ± 0.403 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29 ns ± 0.288 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.324 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.6 ns ± 0.435 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each) Performance in latest version of the PR is more consistently faster than Python ints. |
Very nice. |
def __float__(self): | ||
if not COEFF_IS_MPZ(self.val[0]): | ||
return <double>(<slong>self.val[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the C23 standard text:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
In 6.3.1.4 it says:
When a value of integer type is converted to a standard floating type, if the value being converted
can be represented exactly in the new type, it is unchanged. If the value being converted is in the
range of values that can be represented but cannot be represented exactly, the result is either the
nearest higher or nearest lower representable value, chosen in an implementation-defined manner.
If the value being converted is outside the range of values that can be represented, the behavior is
undefined.
In our case here I think that in a 32 bit build this is fine since every 32 bit integer can go directly to a double. For a 64 bit slong values greater than 2^53
here cannot be represented exactly and then the result would be implementation-defined so it's kind of unsafe even if it seems to work under testing right now.
I think it is best to do here what FLINT's fmpz_get_d
does which is to check if the result is in the range [-2^53, 2^53]
before casting to double
and otherwise fall back on the other path.
That check is unnecessary in the 32-bit case though. We probably should define a function for this using preprocessor defines somewhere near here:
python-flint/src/flint/flintlib/types/flint.pxd
Lines 95 to 108 in 0a59f53
cdef extern from *: | |
""" | |
/* FLINT_BITS is not known until C compile time. We need to check if long | |
* or long long matches FLINT_BITS to know which CPython function to call. | |
*/ | |
#if FLINT_BITS == 32 && LONG_MAX == 2147483647 | |
#define pylong_as_slong PyLong_AsLongAndOverflow | |
#elif FLINT_BITS == 64 && LLONG_MAX == 9223372036854775807 | |
#define pylong_as_slong PyLong_AsLongLongAndOverflow | |
#else | |
#error FLINT_BITS does not match width of long or long long. | |
#endif | |
""" | |
slong pylong_as_slong(PyObject *pylong, int *overflow) |
I think we want a macro something like
#if sizeof(slong) == 4
#define _fmpz_can_cast_double(x) MPZ_IS_COEFF(x)
#elif sizeof(slong) == 8
/* x in [-2^53, 2^53] */
#define _fmpz_can_cast_double(x) MPZ_IS_COEFF(x) \
&& -9007199254740992 <= (slong)(x) \
&& (slong)(x) <= 9007199254740992
#endif
I guess checking sizeof(slong)
is what should really be used for pylong_as_slong
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somewhere near here:
Actually it probably makes more sense to put the macro in fmpz.pyx where it is used. If other code wants to use it in future then we can consider putting it somewhere more central.
I propose to skip the costly integer conversion to use FLINT existing conversion functions.
Currently for large inputs it raises OverflowError (rather than returning infinity), this behaviour is preserved by using
math.ldexp
After the patch there is a slight behaviour change: conversion of
2**1024-1
now succeeds and returnssys.float_info.max
instead of raisingOverflowError
. Previously the largest convertible integer was2**1024-2**(1024-54)-1
Update: if the number is known to be small, using
fmpz_get_d
directly is even faster (even faster than conversion of Python int)Performance