Get rid of conditional inputs and outputs for instructions in bytecodes.c #128914
Comments
(Tagging it as a |
…and the code generators (pythonGH-128918)
…odes.c` and the code generators (pythonGH-128918)" The commit introduced a large performance regression in the free threading build. This reverts commit ab61d3f.
…odes.c` and the code generators (pythonGH-128918)" The commit introduced a ~2.5-3% regression in the free threading build. This reverts commit ab61d3f.
It seems that #128918 caused a 2-3% slowdown on the free-threading build.
FWIW, I couldn't reproduce this on our infrastructure. I'm not claiming one is better or more reliable, only that the effect being noticed here might be more complicated...
* Remove all 'if (0)' and 'if (1)' conditional stack effects
* Use array instead of conditional for BUILD_SLICE args
* Refactor LOAD_GLOBAL to use a common conditional uop
* Remove conditional stack effects from LOAD_ATTR specializations
* Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array.
* Remove conditional stack effects from CALL_FUNCTION_EX
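To illustrate the "0 or 1 sized array" replacement named in the commit message above, here is a minimal Python sketch of a toy value stack. It is not the code generator's implementation: the helper names (`push_conditional`, `push_array`, `stack_effect_array`) are hypothetical, and the DSL spellings quoted in the comments are assumptions used for illustration only. The point is that an array whose length is `oparg & 1` pushes exactly what the conditional form pushes, while its size is a plain expression rather than a branch.

```python
from typing import List

NULL = None  # stand-in for the NULL stack entry (illustrative only)


def push_conditional(stack: List[object], value: object, oparg: int) -> None:
    """Conditional stack effect, roughly 'null if (oparg & 1)' (assumed spelling)."""
    stack.append(value)
    if oparg & 1:  # the push itself is guarded by a branch
        stack.append(NULL)


def push_array(stack: List[object], value: object, oparg: int) -> None:
    """Array-sized stack effect, roughly 'null[oparg & 1]' (assumed spelling)."""
    stack.append(value)
    # Push an array of length 0 or 1: no conditional output, only a size.
    stack.extend([NULL] * (oparg & 1))


def stack_effect_array(oparg: int) -> int:
    """Net number of items pushed, written as arithmetic instead of a branch."""
    return 1 + (oparg & 1)


if __name__ == "__main__":
    for oparg in range(4):
        a: List[object] = []
        b: List[object] = []
        push_conditional(a, "value", oparg)
        push_array(b, "value", oparg)
        print(oparg, a == b, len(b) == stack_effect_array(oparg))
```

Under these assumptions both forms leave the stack in the same state for every `oparg`; only the way the effect is described changes, which is what a code generator would need to reason about.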
Latest implementation has no adverse impact on performance.
We should remove the conditional stack effects in instruction definitions in bytecodes.c.
Conditional stack effects already complicate code generation and that is only going to get worse with top-of-stack caching and other interpreter/JIT optimizations.
There were two reasons for having conditional stack effects:
Reason 1 no longer applies. Instructions are much more regular now and it isn't that much work to remove the remaining conditional stack effects.
That leaves performance. I experimentally removed the conditional stack effects for LOAD_GLOBAL and LOAD_ATTR, which is the worst possible case for performance, as it makes no attempt to mitigate the extra dispatch costs and possibly worse specialization.
The results are here
Overall we see a 0.8% slowdown. It seems that specialization is not significantly worse, but there is a large increase in PUSH_NULL following LOAD_GLOBAL that appears to be responsible for the slowdown. An extra specialization should fix that.
Prior discussion
Linked PRs
…`bytecodes.c` and the code generators #128918
…`bytecodes.c` and the code generators (GH-128918)" #129202