Currently we're not doing constant and float propagation in the tier 2 optimizer on main, because we don't want to create new objects or hold owned references in the trace.
I propose a way to circumvent this and do constant and float propagation. I will outline the design here. The idea is to pack the raw values into the operand itself.
We then introduce two new instructions:

```
LOAD_INT
LOAD_FLOAT
```

These construct a value from the immediate stored in the operand by calling `PyLong_FromLong` or `PyFloat_FromDouble`, respectively.
To keep the optimizer peepholes simple, we then introduce variants of the above instructions:

```
POP_TWO_LOAD_INT
POP_TWO_LOAD_FLOAT
```

Binary operations on known constants thus get propagated into these instructions, which a peephole pass then cleans up as follows:
Before:

```
LOAD_FAST x
LOAD_FAST y
POP_TWO_LOAD_INT
LOAD_FAST z
POP_TWO_LOAD_INT
```

After:

```
LOAD_INT (x + y + z)
```
Here's what must happen for this to be a win:
For floats:
The cost of loading two values from locals/registers, writing them to the stack, adding the dereferenced doubles, and then doing the specialized decref must be high. Because of the `Py_REFCNT(lhs) == 1` trick that we use to reuse float objects, I am not super optimistic about this being a big win for small runs of constants, so for now it should not be prioritised. My own experiments with the tier 2 interpreter from the lazy block versioning work suggest that floats are not that expensive, thanks to float reuse. In fact, we might even slow things down, because `PyFloat_FromDouble` always has to allocate.
For ints:
This is almost always guaranteed to be faster whenever we can constant propagate. There are no refcounting tricks for ints, and a boxed int is actually very expensive compared to a float: ints always allocate unless they come from the small ints pool, and even then we still save going through the `_PyLong_Add` subroutine, which is expensive because it operates on an array of digits. Note that in the specialized interpreter, float addition is a machine `double` addition, while int addition is an expensive subroutine. We should prioritize ints.
*Fidget-Spinner changed the title on Mar 30, 2024, from "Tier 2 optimizer: Constant and Float propagation, without the reference problems" to "Tier 2 optimizer: Constant and Float propagation, without the refcount problems", and again the same day to "Tier 2 optimizer: Constant Int and Float propagation, without the refcount problems".*
Why? Do you have evidence that this will make a large enough difference to prioritize over other work?
I'm not saying we shouldn't do this. I'm just a bit skeptical that it will make much of a difference.
> Why? Do you have evidence that this will make a large enough difference to prioritize over other work?
> I'm not saying we shouldn't do this. I'm just a bit skeptical that it will make much of a difference.
I don't. I meant that we should initially prioritize ints over floats, because my past experiments with the tier 2 interpreter suggest float operations are quite cheap. I don't think any of this will necessarily show up on a benchmark.