Tier 2 optimizer: Constant Int and Float propagation, without the refcount problems #670

Open
Fidget-Spinner opened this issue Mar 30, 2024 · 3 comments

Comments

@Fidget-Spinner
Collaborator

Fidget-Spinner commented Mar 30, 2024

Currently we're not doing constant int and float propagation in the tier 2 optimizer on main, because we don't want to create new objects or hold owned references in the trace.

I propose a way to circumvent this and still do constant int and float propagation; I'll outline the design here. The idea is to pack the raw values into the uop operand itself.

We then introduce two new instructions:

LOAD_INT
LOAD_FLOAT

These construct a value from the immediate packed into the operand, by calling PyLong_FromLong and PyFloat_FromDouble respectively.
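
For illustration, here's a minimal standalone sketch of the pack/unbox idea using only the public C API. The pack_double/unpack_double helpers and the layout of the 64-bit operand are assumptions for this example, not existing CPython code:

```c
/* Sketch only: pack raw int/float payloads into a 64-bit operand and
 * rebuild boxed objects from it, the way LOAD_INT / LOAD_FLOAT would.
 * pack_double/unpack_double are illustrative helpers, not CPython API. */
#include <Python.h>
#include <stdint.h>
#include <string.h>

static uint64_t pack_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret the bit pattern */
    return bits;
}

static double unpack_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

int main(void) {
    Py_Initialize();

    /* What a LOAD_INT uop would do: rebuild the object from its operand. */
    uint64_t int_operand = (uint64_t)12345;            /* folded constant */
    PyObject *boxed_int = PyLong_FromLong((long)int_operand);

    /* What a LOAD_FLOAT uop would do, via the double's bit pattern. */
    uint64_t float_operand = pack_double(2.5 + 0.25);  /* folded constant */
    PyObject *boxed_float = PyFloat_FromDouble(unpack_double(float_operand));

    PyObject_Print(boxed_int, stdout, 0);
    putchar('\n');
    PyObject_Print(boxed_float, stdout, 0);
    putchar('\n');

    Py_DECREF(boxed_int);
    Py_DECREF(boxed_float);
    Py_FinalizeEx();
    return 0;
}
```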

To keep the optimizer peepholes simple, we then introduce variants of the above instructions:

POP_TWO_LOAD_INT
POP_TWO_LOAD_FLOAT

Thus the binary operations get constant propagated to the following instructions:

BINARY_OP_ADD/SUB/MUL_INT -> POP_TWO_LOAD_INT
BINARY_OP_ADD/SUB/MUL_FLOAT -> POP_TWO_LOAD_FLOAT
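
As a rough sketch of the fold step (the uop struct, opcode values, and sym type below are stand-ins, not the real tier 2 trace representation):

```c
/* Sketch only: if both inputs of a specialized int add are known constants,
 * rewrite it to POP_TWO_LOAD_INT with the folded result in its operand. */
#include <stdint.h>
#include <stdio.h>

enum { LOAD_FAST, BINARY_OP_ADD_INT, POP_TWO_LOAD_INT, LOAD_INT, NOP };

typedef struct {
    int opcode;
    uint64_t operand;        /* immediate payload, as proposed */
} uop;

typedef struct {
    int is_const;            /* does the optimizer know this stack value? */
    int64_t value;
} sym;

static void fold_add_int(uop *inst, sym lhs, sym rhs) {
    if (inst->opcode == BINARY_OP_ADD_INT && lhs.is_const && rhs.is_const) {
        /* A real pass would also need to guard against results that fall
         * outside the range representable in the operand. */
        inst->opcode = POP_TWO_LOAD_INT;                   /* keep the pops */
        inst->operand = (uint64_t)(lhs.value + rhs.value); /* bake in result */
    }
}

int main(void) {
    uop add = { BINARY_OP_ADD_INT, 0 };
    sym x = { 1, 2 }, y = { 1, 3 };
    fold_add_int(&add, x, y);
    printf("opcode=%d operand=%llu\n",
           add.opcode, (unsigned long long)add.operand);   /* 2 and 5 */
    return 0;
}
```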

The resulting POP_TWO_LOAD_* instructions, together with the now-dead loads that feed them, are then cleaned up in a peephole pass as follows:
Before:

LOAD_FAST x
LOAD_FAST y
POP_TWO_LOAD_INT
LOAD_FAST z
POP_TWO_LOAD_INT

After:

LOAD_INT (x + y + z)
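
A rough sketch of that cleanup over a hypothetical uop buffer (again, the struct and opcode names are stand-ins, and I'm assuming the consumed pushes are side-effect-free in the trace and safe to drop):

```c
/* Sketch only: once a POP_TWO_LOAD_INT has its result baked into the
 * operand, the two pushes feeding it are dead and can become NOPs, and the
 * POP_TWO_LOAD_INT itself can be demoted to a plain LOAD_INT. */
#include <stdint.h>
#include <stdio.h>

enum { NOP, LOAD_FAST, LOAD_INT, POP_TWO_LOAD_INT };

typedef struct {
    int opcode;
    uint64_t operand;   /* immediate payload for LOAD_INT/POP_TWO_LOAD_INT */
} uop;

/* Single-value pushes assumed safe to delete once their consumer has been
 * folded into an immediate. */
static int is_dead_push(int opcode) {
    return opcode == LOAD_FAST || opcode == LOAD_INT;
}

static void peephole(uop *trace, int n) {
    for (int i = 2; i < n; i++) {
        if (trace[i].opcode == POP_TWO_LOAD_INT &&
            is_dead_push(trace[i - 1].opcode) &&
            is_dead_push(trace[i - 2].opcode)) {
            trace[i - 2].opcode = NOP;   /* the two inputs are never used */
            trace[i - 1].opcode = NOP;
            trace[i].opcode = LOAD_INT;  /* nothing left to pop */
        }
    }
}

int main(void) {
    /* x + y + z with x=1, y=2, z=4; the adds were already folded into the
       POP_TWO_LOAD_INT operands by the constant-propagation pass. */
    uop trace[] = {
        { LOAD_FAST, 0 }, { LOAD_FAST, 0 }, { POP_TWO_LOAD_INT, 3 },
        { LOAD_FAST, 0 }, { POP_TWO_LOAD_INT, 7 },
    };
    peephole(trace, 5);
    for (int i = 0; i < 5; i++)
        printf("%d:%llu ", trace[i].opcode,
               (unsigned long long)trace[i].operand);
    printf("\n");   /* only the final LOAD_INT carrying 7 remains live */
    return 0;
}
```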

Here's what must happen for this to be a win:
For floats:

  • The cost of loading two values from locals/registers, writing them to the stack, adding the dereferenced doubles, and then doing the specialized decref must be high. Note that because of the Py_REFCNT(lhs) == 1 trick that we use to reuse floats (see the sketch after this list), I am not super optimistic about this being a big win for small runs of constants, so for now it should not be prioritised. My own experiments with the tier 2 interpreter back when I did the lazy block versioning work suggest that floats are not that expensive, thanks to the float reuse. In fact, we might even slow things down, because PyFloat_FromDouble always has to allocate.

For ints:

  • This is almost always guaranteed to be faster if we can constant propagate. There are no refcounting tricks for ints, and the cost of a boxed int is actually very high compared to floats. Ints always allocate unless they come from the small-ints pool, and even in that case we still save going through the _PyLong_Add subroutine, which is extremely expensive because it operates on an array of digits. Note that in the specialized interpreter, float addition is a machine double addition, while int addition is an expensive subroutine. We should prioritize this.
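
For reference, here's the refcount-1 float-reuse idea from the floats bullet as a hedged sketch. It mirrors the shape of the specialized float add, but add_float_stealing is a made-up helper and the real interpreter code differs in the details:

```c
/* Sketch only: both argument references are consumed ("stolen"), matching
 * how the interpreter pops its operands off the stack. */
#include <Python.h>

static PyObject *
add_float_stealing(PyObject *lhs, PyObject *rhs)
{
    double result = PyFloat_AS_DOUBLE(lhs) + PyFloat_AS_DOUBLE(rhs);
    Py_DECREF(rhs);
    if (Py_REFCNT(lhs) == 1) {
        /* Sole owner: overwrite the existing float, no allocation. */
        ((PyFloatObject *)lhs)->ob_fval = result;
        return lhs;
    }
    Py_DECREF(lhs);
    return PyFloat_FromDouble(result);   /* shared object: must allocate */
}

int main(void)
{
    Py_Initialize();
    PyObject *a = PyFloat_FromDouble(1.5);   /* refcount 1 */
    PyObject *b = PyFloat_FromDouble(2.0);
    PyObject *r = add_float_stealing(a, b);  /* reuses a's storage here */
    PyObject_Print(r, stdout, 0);            /* prints 3.5 */
    putchar('\n');
    Py_DECREF(r);
    Py_FinalizeEx();
    return 0;
}
```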
@Fidget-Spinner Fidget-Spinner changed the title Tier 2 optimizer: Constant and Float propagation, without the reference problems Tier 2 optimizer: Constant and Float propagation, without the refcount problems Mar 30, 2024
@Fidget-Spinner Fidget-Spinner changed the title Tier 2 optimizer: Constant and Float propagation, without the refcount problems Tier 2 optimizer: Constant Int and Float propagation, without the refcount problems Mar 30, 2024
@Fidget-Spinner
Collaborator Author

PR open at python/cpython#117396

@markshannon
Member

> We should prioritize this.

Why? Do you have evidence that this will make a large enough difference to prioritize it over other work?
I'm not saying we shouldn't do this. I'm just a bit skeptical that it will make much of a difference.

@Fidget-Spinner
Collaborator Author

>> We should prioritize this.
>
> Why? Do you have evidence that this will make a large enough difference to prioritize it over other work?
>
> I'm not saying we shouldn't do this. I'm just a bit skeptical that it will make much of a difference.

I don't. I meant that we should prioritize ints over floats initially, because my past experiments with the tier 2 interpreter suggest float operations are quite cheap. I don't think any of this will necessarily show up on a benchmark.
