Failing pipelines pass CI, find_exponent and mantissa return wrong values for long double

# Issues

There are 2 issues:

1. Failing tests don't actually cause the pipeline to fail. The [incomplete_beta](https://github.com/MitchellThompkins/gcem/actions/runs/24120200019/job/70372432208#step:5:1139) tests actually fail, but the pipeline doesn't check the status.
2. There is a subtle bug in the `gcem::find_exponent` and `gcem::mantissa` functions that results in the incorrect branch being taken when passed long double values of `0.1L` and `0.001L`. This is actually why the tests from (1) fail.

# Root Cause

`gcem::find_exponent` extracts the base-10 exponent of a floating point number. For example given 123.45, the exponent is 2, b/c in scientific notation 123.45 = 1.2345 * 10^2. The exponent of 2 is the power of 10 that puts the manitssa in [1, 10).

Given `0.1L` you would expect the exponent to be -1, because 0.1 = 1 * 10^-1. However the implementation today returns -2 for types of `long double`. The same bug exists for `0.001L`.

`gcem::find_exponent` is implemented such that it recursively seeks the decade boundary in order to scale the decimal point. It does this by comparing the value of interest `x` to a value that lies on a base-10 decade, casts that decade, and recursively branches to find the correct exponent.

Because values > 1 (e.g. 10, 100, 1000, 10000) are integers and exactly representable in any floating-point type casting them has no effect. Decade boundaries < 1 (i.e. 0.1, 0.001, 0.0001) are fractions of the form 1/10^n and thus are not exactly representable in binary floating point; they require rounding.

Boundaries < 1 are then edge cases when casting to wider types. Consider `0.1L` (a `long double`). When `0.1L` is passed into the function we end up using the `long double` templated function. The implenetation then comparse `0.1L` (an intrinsic 80-bit long double on most platforms) to a casted `long double(0.1)` (64-bit value casted to an 80-bit value). These do not result in the same value; the casted version is actually _bigger_.

When we write `0.1L` the compiler rounds the mathematical value 0.1 once to the nearest 80-bit `long double`. When we cast a `double` the compiler first rounds 0.1 to the nearest 64-bit `double` and _then_ widens the already rounded `double` to the nearest 80-bit `long double` to represent _that_ already rounded 64-bit value. The values look like this:

```sh
0.1L = 1.0000000000000000000135525E-01
static_cast<long double>(0.1) = 1.0000000000000000555111512E-01
```

The result is that any function that calls `gcem::find_exponent` with a value that lies on a decade boundary and with a type that forces it to cast to a rounded value will return the wrong exponent. This is because the casted value _barely_ falls on the wrong side of that boundary.

# Examples

The below code demonstrates the bug:

```cpp
// a.cpp
#include "include/gcem.hpp"
#include "math.h"
#include "stdio.h"

void casting_behavior()
{
    printf("\n################ Casting Behavior ################\n");

    printf("0.1L = %.25LE\n", 0.1L);
    printf("static_cast<long double>(0.1L) = %.25LE\n", static_cast<long double>((0.1)));
    printf("\n");

    printf("0.001L = %.25LE\n", 0.001L);
    printf("static_cast<long double>(0.001) = %.25LE\n", static_cast<long double>((0.001)));
}

void find_exponent_bug()
{
    printf("\n################ Finding Exponent Bug ################\n");

    printf("find_exponent(123.45L) = %lld\n", gcem::internal::find_exponent(123.45L, 0LL));

    printf("find_exponent(0.1L)  = %lld\n", gcem::internal::find_exponent(0.1L, 0LL));
    printf("find_exponent(0.001L) = %lld\n", gcem::internal::find_exponent(0.001L, 0LL));

    printf("find_exponent(0.1)  = %lld\n", gcem::internal::find_exponent(0.1, 0));
    printf("find_exponent(0.001) = %lld\n", gcem::internal::find_exponent(0.001, 0));
}

void log_bug()
{
    printf("\n################ Log bug ################\n");

    printf("std::log(0.1) = %.25E\n", std::log(0.1));
    printf("gcem::log(0.1) = %.25E\n", gcem::log(0.1));
    printf("\n");

    printf("std::log(0.1L) = %.25LE\n", std::log(0.1L));
    printf("gcem::log(0.1L) = %.25LE\n", gcem::log(0.1L));
    printf("\n");

    printf("std::log(0.001) = %.25E\n", std::log(0.001));
    printf("gcem::log(0.001) = %.25E\n", gcem::log(0.001));
    printf("\n");

    printf("std::log(0.001L) = %.25LE\n", std::log(0.001L));
    printf("gcem::log(0.001L) = %.25LE\n", gcem::log(0.001L));
    printf("\n");
}

int main()
{
    casting_behavior();
    find_exponent_bug();
    log_bug();
}

```

```sh
g++ a.cpp
./a.out
```
This results in:

```sh
################ Casting Behavior ################

0.1L = 1.0000000000000000000135525E-01
static_cast<long double>(0.1) = 1.0000000000000000555111512E-01

0.001L = 9.9999999999999999995849539E-04
static_cast<long double>(0.001L) = 1.0000000000000000208166817E-03

################ Finding Exponent Bug ################

find_exponent(123.45L) = 2
find_exponent(0.1L)  = -2
find_exponent(0.001L) = -4
find_exponent(0.1)  = -1
find_exponent(0.001) = -3

################ Log bug ################

std::log(0.1) = -2.3025850929940454570044039E+00
gcem::log(0.1) = -2.3025850929940459010936138E+00

std::log(0.1L) = -2.3025850929940456840363389E+00
gcem::log(0.1L) = -4.6051701859880913680726777E+00

std::log(0.001) = -6.9077552789821368151024217E+00
gcem::log(0.001) = -6.9077552789821377032808414E+00

std::log(0.001L) = -6.9077552789821370523258570E+00
gcem::log(0.001L) = -6.9077552789821370523258570E+00
```

These two produce wildly different and wrong answers when using `long double`.

```
find_exponent(0.1L)  = -2 <-- wrong
find_exponent(0.001L) = -4 <-- wrong
gcem::log(0.1L) = -4.6051701859880913680726777E+00 <-- wrong
```

# Other issues

The last curious point is why doesn't `gcem::log(0.001L)` suffer from the same erroneous behavior if it also calls `gcem::find_exponent`?

Well it does but it's a bit hidden (and it took me a while to realize this). `log_breakup` calls `gcem::mantissa` which _also_ suffers from the same type of rounding bug. In order to put a decimal into scientific notation, the implementation recursively iterates to drive the decimal place into the right location. If the number is less than 1, that means multiplying by 10 until it's in the right place. However, this function also casts `double` types to `long double`, and the error accumulates through different casts. For `0.001L` this means we get a mantissa that moved the decimal one place too far, resulting in a number that's 10x the expected value.

```
0.001L = 9.9999999999999999995849539E-04

step 1: 9.9999999999999999995849539E-04 * 10 = 9.9999999999999999997967121E-03  < 1.0, keep going
step 2: 9.9999999999999999997967121E-03 * 10 = 9.9999999999999999994578989E-02  < 1.0, keep going
step 3: 9.9999999999999999994578989E-02 * 10 = 9.9999999999999999994578989E-01  < 1.0, keep going
step 4: 9.9999999999999999994578989E-01 * 10 = 9.9999999999999999991326383E+00  >= 1.0, STOP  <-- one step too many
```

Then `log_breakup` in the implementation of `gcem::log` computes `log_mantissa(mantissa(x)) + log(10) * find_exponent(x, 0)`. This results in:

```
find_exponent(0.001L) = -4 (should be -3)
mantissa(0.001L)      = 9.9999999999999999991326383E+00 (should be 1.0)

log(mantissa(0.001L)) + log(10) * (find_exponent(0.001L))
log(9.999...) + log(10) * (-4)
= 2.3025850929940456840363389E+00 + -9.2103403719761827361453554E+00
= -6.9077552789821370518921761E+00
```

`gcem::mantissa` produces a value that is too large (by exactly 1 iteration), but it is compensated by the off-by-one exponent. So the two wrongs make a right when calling `gcem::log` for the boundary value `0.001L`.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing pipelines pass CI, find_exponent and mantissa return wrong values for long double #59

Issues

Root Cause

Examples

Other issues

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Failing pipelines pass CI, find_exponent and mantissa return wrong values for long double #59

Description

Issues

Root Cause

Examples

Other issues

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions