
Conversation

@remyoudompheng
Contributor

commented Sep 7, 2025

I propose to skip the costly integer conversion and use FLINT's existing conversion functions instead.
Currently, for large inputs float() raises OverflowError (rather than returning infinity); this behaviour is preserved by using math.ldexp.

After the patch there is a slight behaviour change: conversion of 2**1024-1 now succeeds and returns sys.float_info.max instead of raising OverflowError. Previously the largest convertible integer was 2**1024-2**(1024-54)-1.

Update: if the number is known to be small, using fmpz_get_d directly is even faster (faster than converting a Python int).
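A stdlib-only sketch of the boundary described above (no python-flint needed): CPython's int-to-float conversion rounds to nearest, and math.ldexp raises the same OverflowError once the scaled result leaves the double range.

```python
import math
import sys

# Largest integer CPython converts without OverflowError: everything up
# to 2**1024 - 2**(1024-54) - 1 rounds down to sys.float_info.max; the
# next integer rounds up to 2**1024, which no double can hold.
largest = 2**1024 - 2**(1024 - 54) - 1
assert float(largest) == sys.float_info.max

overflowed = False
try:
    float(largest + 1)
except OverflowError:
    overflowed = True
assert overflowed

# math.ldexp overflows in the same way, which is what lets the patch
# preserve the OverflowError behaviour instead of returning infinity.
assert math.ldexp(0.5, 1024) == 2.0**1023
overflowed_ldexp = False
try:
    math.ldexp(1.0, 1024)  # would be 2**1024
except OverflowError:
    overflowed_ldexp = True
assert overflowed_ldexp
```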

Performance

In [5]: x = 2**1024-2**(1024-53)
In [6]: n = flint.fmpz(x)
In [7]: %timeit float(x)
39.6 ns ± 0.0977 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# python-flint 0.8.0
In [8]: %timeit float(n)
557 ns ± 2.71 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# this PR
In [22]: %timeit float(n)
81.3 ns ± 0.563 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [12]: x = random.getrandbits(1020)
In [13]: n = flint.fmpz(x)
In [14]: %timeit float(x)
31.4 ns ± 0.137 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
# This PR
In [15]: %timeit float(n)
25.7 ns ± 0.0268 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

@oscarbenjamin
Collaborator

This changes the returned value:

$ cat float.py 
import random
from flint import fmpz

for _ in range(100):
    x = random.getrandbits(1020)
    assert float(fmpz(x)) == float(x), x

$ spin run python float.py 
 $ meson compile -C build
 $ meson install --only-changed -C build --destdir ../build-install
Traceback (most recent call last):
  File "/stuff/current/active/python-flint/float.py", line 6, in <module>
    assert float(fmpz(x)) == float(x), x
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 7032762630615997075984776923333935086304805039016677690569260912478663353622423395407083820822744199783856237891740932258878466924940393236854755908411876363285950823905383993344868600629929050010379573658259044905557178310577704148548523391522191512473418439519460923094265091876624476180133575388198464166

I think this is because fmpz_get_d rounds downwards whereas float(int) rounds to nearest.
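A pure-Python illustration of the difference: the helper below emulates a conversion that keeps the top 53 bits and discards the rest (rounding towards zero), which is roughly the behaviour observed; the helper name is made up for the example.

```python
import math

def truncate_to_double(x):
    """Keep the top 53 bits of a positive int and drop the rest,
    i.e. round towards zero -- emulating the observed behaviour."""
    n = x.bit_length()
    if n <= 53:
        return float(x)  # exactly representable
    return math.ldexp(x >> (n - 53), n - 53)

# Near 2**60, consecutive doubles are 2**(61-53) = 256 apart.
x = 2**60 + 129  # just above the midpoint 2**60 + 128
assert truncate_to_double(x) == float(2**60)  # rounded down (towards zero)
assert float(x) == float(2**60 + 256)         # rounded to nearest
```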

I don't see that clearly documented though. Here it only says:

Otherwise, if the argument is an integer or a floating-point number, a floating-point number with the same value (within Python’s floating-point precision) is returned. If the argument is outside the range of a Python float, an OverflowError will be raised.
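The convention is nevertheless easy to observe at the first ties above 2**53:

```python
# 2**53 + 1 lies exactly halfway between the doubles 2**53 and 2**53 + 2;
# round-half-to-even picks the candidate with the even significand.
assert float(2**53 + 1) == float(2**53)      # tie rounds down: 2**53 is even
assert float(2**53 + 3) == float(2**53 + 4)  # tie rounds up: 2**53 + 4 is even
```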

The CPython code for it is here with a comment saying that it is round-half-even:

/* Get a C double from an int object.  Rounds to the nearest double,
   using the round-half-to-even rule in the case of a tie. */

double
PyLong_AsDouble(PyObject *v)
{
    int64_t exponent;
    double x;

    if (v == NULL) {
        PyErr_BadInternalCall();
        return -1.0;
    }
    if (!PyLong_Check(v)) {
        PyErr_SetString(PyExc_TypeError, "an integer is required");
        return -1.0;
    }
    if (_PyLong_IsCompact((PyLongObject *)v)) {
        /* Fast path; single digit long (31 bits) will cast safely
           to double.  This improves performance of FP/long operations
           by 20%.
        */
        return (double)medium_value((PyLongObject *)v);
    }
    x = _PyLong_Frexp((PyLongObject *)v, &exponent);
    assert(exponent >= 0);
    assert(!PyErr_Occurred());
    if (exponent > DBL_MAX_EXP) {
        PyErr_SetString(PyExc_OverflowError,
                        "int too large to convert to float");
        return -1.0;
    }
    return ldexp(x, (int)exponent);
}

The function _PyLong_Frexp above looks quite complicated. A simple option could be to figure out for what range of values fmpz_get_d gives the correct result and delegate to float(int(x)) otherwise.
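The delegation idea could look like this (pure-Python stand-in, names hypothetical): below 2**53 in absolute value every integer is exactly representable as a double, so a fast truncating conversion cannot disagree with round-to-nearest there.

```python
def fmpz_to_float(x, fast=float):
    """Hypothetical sketch: `fast` stands in for the fmpz_get_d path."""
    if -2**53 <= x <= 2**53:
        return fast(x)    # exactly representable: no rounding occurs
    return float(int(x))  # CPython's correctly rounded conversion

assert fmpz_to_float(10**15) == 1e15
assert fmpz_to_float(2**60 + 129) == float(2**60 + 129)
```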

@remyoudompheng
Contributor Author

Oh, this is sad. Round to nearest with the half-even convention is definitely a must; I will check what can be done.

@oscarbenjamin
Collaborator

It looks like gmpy2 does not use round-half-even:

$ cat float.py 
import random
from gmpy2 import mpz

for _ in range(100):
    x = random.getrandbits(54)
    assert float(mpz(x)) == float(x), x
$ python float.py 
Traceback (most recent call last):
  File "/stuff/current/active/python-flint/float.py", line 6, in <module>
    assert float(mpz(x)) == float(x), x
           ^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 10741429198758067

@remyoudompheng
Contributor Author

Commit 9382a42 implements correct rounding (round to nearest, ties to even).

It is enough to test an interval of consecutive numbers to check the convention:

for x in range(2**64, 2**64 + 1000000):
    assert float(x) == float(flint.fmpz(x))
    assert float(-x) == float(flint.fmpz(-x))
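Why a consecutive interval suffices: near 2**64 the representable doubles are 2**12 apart, so a run of 10**6 consecutive integers crosses a few hundred rounding boundaries and exact ties, exercising both rounding directions and the tie-breaking rule.

```python
import math

# Spacing of doubles at 2**64 (bit length 65): 2**(65-53) = 4096.
spacing = math.ulp(float(2**64))
assert spacing == 2.0**12

# A 10**6-long interval therefore contains roughly 244 exact ties.
assert 10**6 // 4096 == 244
```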

@oscarbenjamin
Collaborator

There was a test failure under PyPy:

File "/Users/runner/hostedtoolcache/PyPy/3.11.13/x64/lib/pypy3.11/site-packages/flint/types/fq_default.pypy311-pp73-darwin.so", line 270, in flint.types.fq_default.fq_default_ctx.gen
Failed example:
    gf.gen()
Expected:
    w
Got:
    <BLANKLINE>

The change here does not look like it could cause that so maybe there is something undefined here.

@oscarbenjamin
Collaborator

It is enough to test an interval of consecutive numbers to check the convention

for x in range(2**64, 2**64 + 1000000):
    assert float(x) == float(flint.fmpz(x))
    assert float(-x) == float(flint.fmpz(-x))

Could you add this to the tests (with fewer iterations)?

It would be good to ensure that the tests cover every possible branch of the __float__ method.
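A suite-sized version could be parametrised over the conversion under test (names hypothetical); with python-flint the callable would be `lambda v: float(flint.fmpz(v))`:

```python
def check_rounding_convention(convert, start=2**64, count=20_000):
    # Crosses several rounding boundaries and exact ties around 2**64.
    for v in range(start, start + count):
        assert convert(v) == float(v)
        assert convert(-v) == float(-v)

# Sanity-check the harness itself against the reference conversion:
check_rounding_convention(float)
```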

@oscarbenjamin
Collaborator

Timing with main I get:

In [2]: x = 2**64 + 1

In [3]: n = flint.fmpz(x)

In [4]: %timeit float(x)
88.4 ns ± 0.241 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: %timeit float(n)
938 ns ± 12 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [6]: x = 123

In [7]: n = flint.fmpz(x)

In [8]: %timeit float(x)
55.9 ns ± 0.38 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: %timeit float(n)
66.5 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

So for larger integers converting to int is definitely slow. However for smaller integers it is fast and we benefit from CPython's fast int -> float conversion. Repeating the same with the PR I get:

In [6]: x = 2**64 + 1

In [7]: n = flint.fmpz(x)

In [8]: %timeit float(x)
88.7 ns ± 0.505 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [9]: %timeit float(n)
94.5 ns ± 0.383 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [10]: x = 123

In [11]: n = flint.fmpz(x)

In [12]: %timeit float(x)
55.8 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [13]: %timeit float(n)
71.8 ns ± 0.527 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

So for larger integers this is a lot faster, i.e. ~10x for an integer slightly larger than 2**64. I imagine that most of the cost is due to the fmpz to int conversion itself being slow (I think this uses the hex-string path).

For the smaller integer, though, it seems possibly slower on this measurement, although the difference is close to the margin of error. I wonder if it is somehow possible to have an even faster path using COEFF_IS_MPZ and then just directly casting from slong to double. It looks like that is what fmpz_get_d already does, but maybe the other things like fmpz_val2 etc. are adding overhead in that case.

@oscarbenjamin
Collaborator

Also this should have a release note.

@remyoudompheng
Contributor Author

In the case of small integers I believe the largest overhead comes from the Python int allocated in fmpz_get_intlong. Casting directly to slong and double avoids this extra Python object.

I refactored slightly, also to avoid the performance hit for negative numbers.

Python int (baseline)

In [3]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = e
   ...:     %timeit float(x)
   ...:     x = -e
   ...:     %timeit float(x)
29.6 ns ± 0.209 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.5 ns ± 0.259 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.539 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.7 ns ± 0.332 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.9 ns ± 0.414 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
31.6 ns ± 0.186 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
32.2 ns ± 0.299 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

python-flint 0.8.0

In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = flint.fmpz(e)
   ...:     %timeit float(x)
   ...:     x = flint.fmpz(-e)
   ...:     %timeit float(x)
42.8 ns ± 0.596 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
42.9 ns ± 0.44 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
202 ns ± 1.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
220 ns ± 3.72 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
172 ns ± 0.348 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
188 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
175 ns ± 0.843 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
190 ns ± 0.878 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

commit 91dc079

In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = flint.fmpz(e)
   ...:     %timeit float(x)
   ...:     x = flint.fmpz(-e)
   ...:     %timeit float(x)
   ...:     assert float(x) == float(-e)
23.6 ns ± 0.264 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
25.7 ns ± 0.254 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.7 ns ± 0.617 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.3 ns ± 0.477 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.8 ns ± 0.215 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
47.1 ns ± 0.441 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.9 ns ± 0.357 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
48.9 ns ± 0.165 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

commit 6a5370c (new)

In [2]: for e in (3**20, 3**100, 2**63+1, 2**64+1):
   ...:     x = flint.fmpz(e)
   ...:     %timeit float(x)
   ...:     x = flint.fmpz(-e)
   ...:     %timeit float(x)
   ...:     assert float(x) == float(-e)
21.6 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
21.2 ns ± 0.338 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.3 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29.1 ns ± 0.591 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
27.9 ns ± 0.403 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
29 ns ± 0.288 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28 ns ± 0.324 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
28.6 ns ± 0.435 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

Performance in the latest version of the PR is now consistently faster than Python ints.

@oscarbenjamin
Collaborator

Performance in latest version of the PR is more consistently faster than Python ints.

Very nice.

Comment on lines 108 to +110

    def __float__(self):
        if not COEFF_IS_MPZ(self.val[0]):
            return <double>(<slong>self.val[0])
Collaborator

Looking at the C23 standard text:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf

In 6.3.1.4 it says:

When a value of integer type is converted to a standard floating type, if the value being converted
can be represented exactly in the new type, it is unchanged. If the value being converted is in the
range of values that can be represented but cannot be represented exactly, the result is either the
nearest higher or nearest lower representable value, chosen in an implementation-defined manner.
If the value being converted is outside the range of values that can be represented, the behavior is
undefined.

In our case here I think that in a 32 bit build this is fine since every 32 bit integer can go directly to a double. For a 64 bit slong values greater than 2^53 here cannot be represented exactly and then the result would be implementation-defined so it's kind of unsafe even if it seems to work under testing right now.

I think it is best to do here what FLINT's fmpz_get_d does which is to check if the result is in the range [-2^53, 2^53] before casting to double and otherwise fall back on the other path.

That check is unnecessary in the 32-bit case though. We probably should define a function for this using preprocessor defines somewhere near here:

cdef extern from *:
"""
/* FLINT_BITS is not known until C compile time. We need to check if long
* or long long matches FLINT_BITS to know which CPython function to call.
*/
#if FLINT_BITS == 32 && LONG_MAX == 2147483647
#define pylong_as_slong PyLong_AsLongAndOverflow
#elif FLINT_BITS == 64 && LLONG_MAX == 9223372036854775807
#define pylong_as_slong PyLong_AsLongLongAndOverflow
#else
#error FLINT_BITS does not match width of long or long long.
#endif
"""
slong pylong_as_slong(PyObject *pylong, int *overflow)

I think we want a macro something like

#if FLINT_BITS == 32
    /* every slong fits exactly in a double */
    #define _fmpz_can_cast_double(x) (!COEFF_IS_MPZ(x))
#elif FLINT_BITS == 64
    /* x in [-2^53, 2^53] */
    #define _fmpz_can_cast_double(x) (!COEFF_IS_MPZ(x) \
        && -9007199254740992 <= (slong)(x) \
        && (slong)(x) <= 9007199254740992)
#endif

(The preprocessor cannot evaluate sizeof, so FLINT_BITS has to stand in for sizeof(slong) here, as in the pylong_as_slong snippet above.)

Collaborator

somewhere near here:

Actually it probably makes more sense to put the macro in fmpz.pyx where it is used. If other code wants to use it in future then we can consider putting it somewhere more central.
