More efficient conversion of fmpz to float #323

remyoudompheng · 2025-09-07T08:20:50Z

I propose to skip the costly integer conversion to use FLINT existing conversion functions.
Currently for large inputs it raises OverflowError (rather than returning infinity), this behaviour is preserved by using math.ldexp

After the patch there is a slight behaviour change: conversion of 2**1024-1 now succeeds and returns sys.float_info.max instead of raising OverflowError. Previously the largest convertible integer was 2**1024-2**(1024-54)-1

Update: if the number is known to be small, using fmpz_get_d directly is even faster (even faster than conversion of Python int)

Performance

In [5]: x = 2**1024-2**(1024-53)
In [6]: n = flint.fmpz(x)
In [7]: %timeit float(x)
39.6 ns ± 0.0977 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# python-flint 0.8.0
In [8]: %timeit float(n)
557 ns ± 2.71 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# this PR
In [22]: %timeit float(n)
81.3 ns ± 0.563 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [12]: x = random.getrandbits(1020)
In [13]: n = flint.fmpz(x)
In [14]: %timeit float(x)
31.4 ns ± 0.137 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# This PR
In [15]: %timeit float(n)
25.7 ns ± 0.0268 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

oscarbenjamin · 2025-09-07T13:31:02Z

This changes the returned value:

$ cat float.py 
import random
from flint import fmpz

for _ in range(100):
    x = random.getrandbits(1020)
    assert float(fmpz(x)) == float(x), x

$ spin run python float.py 
 $ meson compile -C build
 $ meson install --only-changed -C build --destdir ../build-install
Traceback (most recent call last):
  File "/stuff/current/active/python-flint/float.py", line 6, in <module>
    assert float(fmpz(x)) == float(x), x
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 7032762630615997075984776923333935086304805039016677690569260912478663353622423395407083820822744199783856237891740932258878466924940393236854755908411876363285950823905383993344868600629929050010379573658259044905557178310577704148548523391522191512473418439519460923094265091876624476180133575388198464166

I think this is because fmpz_get_d rounds downwards whereas float(int) rounds to nearest.

I don't see that clearly documented though. Here it only says:

Otherwise, if the argument is an integer or a floating-point number, a floating-point number with the same value (within Python’s floating-point precision) is returned. If the argument is outside the range of a Python float, an OverflowError will be raised.

The CPython code for it is here with a comment saying that it is round-half-even:

/* Get a C double from an int object.  Rounds to the nearest double,
   using the round-half-to-even rule in the case of a tie. */

double
PyLong_AsDouble(PyObject *v)
{
    int64_t exponent;
    double x;

    if (v == NULL) {
        PyErr_BadInternalCall();
        return -1.0;
    }
    if (!PyLong_Check(v)) {
        PyErr_SetString(PyExc_TypeError, "an integer is required");
        return -1.0;
    }
    if (_PyLong_IsCompact((PyLongObject *)v)) {
        /* Fast path; single digit long (31 bits) will cast safely
           to double.  This improves performance of FP/long operations
           by 20%.
        */
        return (double)medium_value((PyLongObject *)v);
    }
    x = _PyLong_Frexp((PyLongObject *)v, &exponent);
    assert(exponent >= 0);
    assert(!PyErr_Occurred());
    if (exponent > DBL_MAX_EXP) {
        PyErr_SetString(PyExc_OverflowError,
                        "int too large to convert to float");
        return -1.0;
    }
    return ldexp(x, (int)exponent);
}

The function _PyLong_Frexp above looks quite complicated. A simple option could be to figure out for what range of values fmpz_get_d gives the correct result and delegate to float(int(x)) otherwise.

remyoudompheng · 2025-09-07T13:38:06Z

Oh, this is sad. Round to nearest with half-even convention is definitely a must, I will check what can be done.

oscarbenjamin · 2025-09-07T13:59:38Z

It looks like gmpy2 does not use round-half-even:

$ cat float.py 
import random
from gmpy2 import mpz

for _ in range(100):
    x = random.getrandbits(54)
    assert float(mpz(x)) == float(x), x
$ python float.py 
Traceback (most recent call last):
  File "/stuff/current/active/python-flint/float.py", line 6, in <module>
    assert float(mpz(x)) == float(x), x
           ^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 10741429198758067

remyoudompheng · 2025-09-07T14:22:28Z

Commit 9382a42 implements correct rounding (round to nearest + round-half-to-even)

It is enough to test an interval of consecutive numbers to check the convention

for x in range(2**64, 2**64 + 1000000):
    assert float(x) == float(flint.fmpz(x))
    assert float(-x) == float(flint.fmpz(-x))

oscarbenjamin · 2025-09-07T15:04:35Z

There was a test failure under PyPy:

File "/Users/runner/hostedtoolcache/PyPy/3.11.13/x64/lib/pypy3.11/site-packages/flint/types/fq_default.pypy311-pp73-darwin.so", line 270, in flint.types.fq_default.fq_default_ctx.gen
Failed example:
    gf.gen()
Expected:
    w
Got:
    <BLANKLINE>

The change here does not look like it could cause that so maybe there is something undefined here.

oscarbenjamin · 2025-09-07T15:17:17Z

It is enough to test an interval of consecutive numbers to check the convention
for x in range(2**64, 2**64 + 1000000):
    assert float(x) == float(flint.fmpz(x))
    assert float(-x) == float(flint.fmpz(-x))

Could you add this to the tests (with fewer iterations)?

It would be good to ensure that the tests cover every possible branch of the __float__ method.

oscarbenjamin · 2025-09-07T17:21:38Z

Timing with main I get:

In [2]: x = 2**64 + 1

In [3]: n = flint.fmpz(x)

In [4]: %timeit float(x)
88.4 ns ± 0.241 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: %timeit float(n)
938 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [6]: x = 123

In [7]: n = flint.fmpz(x)

In [8]: %timeit float(x)
55.9 ns ± 0.38 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: %timeit float(n)
66.5 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

So for larger integers converting to int is definitely slow. However for smaller integers it is fast and we benefit from CPython's fast int -> float conversion. Repeating the same with the PR I get:

In [6]: x = 2**64 + 1

In [7]: n = flint.fmpz(x)

In [8]: %timeit float(x)
88.7 ns ± 0.505 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: %timeit float(n)
94.5 ns ± 0.383 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [10]: x = 123

In [11]: n = flint.fmpz(x)

In [12]: %timeit float(x)
55.8 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [13]: %timeit float(n)
71.8 ns ± 0.527 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

So for larger integers this is a lot faster i.e. 10x for an integer slightly larger than 2**64. I imagine that most of the cost is due to the fmpz to int conversion itself being slow (I think this uses the hex-string path).

For the smaller integer though it is seems possibly slower on this measurement although the difference seen is close to margin of error. I wonder if it is somehow possible to have an even faster path using COEFF_IS_MPZ and then just directly casting from slong to double. It looks like that is what fmpz_get_d already does but maybe the other things like fmpz_val2 etc are adding overhead in that case.

oscarbenjamin · 2025-09-07T17:36:13Z

Also this should have a release note.

remyoudompheng · 2025-09-07T20:32:21Z

In the case of small integers I believe the largest overhead comes from the Python int allocated in fmpz_get_intlong. Casting directly to slong and double avoids this extra Python object.

I refactored slightly, also to avoid the performance hit for negative numbers.

Python int (baseline)

In [3]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = e
   ...:     %timeit float(x)
   ...:     x = -e
   ...:     %timeit float(x)
29.6 ns ± 0.209 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.5 ns ± 0.259 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.539 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.7 ns ± 0.332 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.414 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.6 ns ± 0.186 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.299 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

python-flint 0.8.0

In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = flint.fmpz(e)
   ...:     %timeit float(x)
   ...:     x = flint.fmpz(-e)
   ...:     %timeit float(x)
42.8 ns ± 0.596 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
42.9 ns ± 0.44 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
202 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
220 ns ± 3.72 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
172 ns ± 0.348 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
188 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
175 ns ± 0.843 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
190 ns ± 0.878 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

commit 91dc079

In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = flint.fmpz(e)
   ...:     %timeit float(x)
   ...:     x = flint.fmpz(-e)
   ...:     %timeit float(x)
   ...:     assert float(x) == float(-e)
23.6 ns ± 0.264 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
25.7 ns ± 0.254 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.7 ns ± 0.617 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.3 ns ± 0.477 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.8 ns ± 0.215 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
47.1 ns ± 0.441 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.9 ns ± 0.357 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.9 ns ± 0.165 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

commit 6a5370c (new)

In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = flint.fmpz(e)
   ...:     %timeit float(x)
   ...:     x = flint.fmpz(-e)
   ...:     %timeit float(x)
   ...:     assert float(x) == float(-e)
21.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
21.2 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.3 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.1 ns ± 0.591 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
27.9 ns ± 0.403 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29 ns ± 0.288 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.324 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.6 ns ± 0.435 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Performance in latest version of the PR is more consistently faster than Python ints.

oscarbenjamin · 2025-09-07T21:25:12Z

Performance in latest version of the PR is more consistently faster than Python ints.

Very nice.

oscarbenjamin · 2025-09-07T22:17:10Z

src/flint/types/fmpz.pyx

    def __float__(self):
+        if not COEFF_IS_MPZ(self.val[0]):
+            return <double>(<slong>self.val[0])


Looking at the C23 standard text:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

In 6.3.1.4 it says:

When a value of integer type is converted to a standard floating type, if the value being converted
can be represented exactly in the new type, it is unchanged. If the value being converted is in the
range of values that can be represented but cannot be represented exactly, the result is either the
nearest higher or nearest lower representable value, chosen in an implementation-defined manner.
If the value being converted is outside the range of values that can be represented, the behavior is
undefined.

In our case here I think that in a 32 bit build this is fine since every 32 bit integer can go directly to a double. For a 64 bit slong values greater than 2^53 here cannot be represented exactly and then the result would be implementation-defined so it's kind of unsafe even if it seems to work under testing right now.

I think it is best to do here what FLINT's fmpz_get_d does which is to check if the result is in the range [-2^53, 2^53] before casting to double and otherwise fall back on the other path.

That check is unnecessary in the 32-bit case though. We probably should define a function for this using preprocessor defines somewhere near here:

python-flint/src/flint/flintlib/types/flint.pxd

Lines 95 to 108 in 0a59f53

cdef extern from *:

"""

/* FLINT_BITS is not known until C compile time. We need to check if long

* or long long matches FLINT_BITS to know which CPython function to call.

*/

#if FLINT_BITS == 32 && LONG_MAX == 2147483647

#define pylong_as_slong PyLong_AsLongAndOverflow

#elif FLINT_BITS == 64 && LLONG_MAX == 9223372036854775807

#define pylong_as_slong PyLong_AsLongLongAndOverflow

#else

#error FLINT_BITS does not match width of long or long long.

#endif

"""

slong pylong_as_slong(PyObject *pylong, int *overflow)

I think we want a macro something like

#if sizeof(slong) == 4 #define _fmpz_can_cast_double(x) MPZ_IS_COEFF(x) #elif sizeof(slong) == 8 /* x in [-2^53, 2^53] */ #define _fmpz_can_cast_double(x) MPZ_IS_COEFF(x) \ && -9007199254740992 <= (slong)(x) \ && (slong)(x) <= 9007199254740992 #endif

I guess checking sizeof(slong) is what should really be used for pylong_as_slong as well.

somewhere near here:

Actually it probably makes more sense to put the macro in fmpz.pyx where it is used. If other code wants to use it in future then we can consider putting it somewhere more central.

remyoudompheng added 2 commits September 7, 2025 15:40

More efficient conversion of fmpz to float

4da6b11

Add fast path for float conversion of numbers below 1023 bits

b3656ca

Implement round-to-even compliant rounding

9382a42

remyoudompheng force-pushed the doubleconv branch from d79ff6e to 9382a42 Compare September 7, 2025 14:20

oscarbenjamin mentioned this pull request Sep 7, 2025

There was a test failure under PyPy: #325

Closed

Additional fmpz to float conversion tests

91dc079

remyoudompheng added 2 commits September 7, 2025 22:30

Improve conversion of small or negative fmpz to float

5896ada

Update release notes

6a5370c

oscarbenjamin reviewed Sep 7, 2025

View reviewed changes

	cdef extern from *:
	"""
	/* FLINT_BITS is not known until C compile time. We need to check if long
	* or long long matches FLINT_BITS to know which CPython function to call.
	*/
	#if FLINT_BITS == 32 && LONG_MAX == 2147483647
	#define pylong_as_slong PyLong_AsLongAndOverflow
	#elif FLINT_BITS == 64 && LLONG_MAX == 9223372036854775807
	#define pylong_as_slong PyLong_AsLongLongAndOverflow
	#else
	#error FLINT_BITS does not match width of long or long long.
	#endif
	"""
	slong pylong_as_slong(PyObject pylong, int overflow)

More efficient conversion of fmpz to float #323

Are you sure you want to change the base?

More efficient conversion of fmpz to float #323

Uh oh!

Conversation

remyoudompheng commented Sep 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oscarbenjamin commented Sep 7, 2025

Uh oh!

remyoudompheng commented Sep 7, 2025

Uh oh!

oscarbenjamin commented Sep 7, 2025

Uh oh!

remyoudompheng commented Sep 7, 2025

Uh oh!

oscarbenjamin commented Sep 7, 2025

Uh oh!

oscarbenjamin commented Sep 7, 2025

Uh oh!

oscarbenjamin commented Sep 7, 2025

Uh oh!

oscarbenjamin commented Sep 7, 2025

Uh oh!

remyoudompheng commented Sep 7, 2025

Uh oh!

oscarbenjamin commented Sep 7, 2025

Uh oh!

oscarbenjamin Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

oscarbenjamin Sep 7, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

remyoudompheng commented Sep 7, 2025 •

edited

Loading