-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-128914: Remove conditional stack effects from bytecodes.c
and the code generators
#128918
GH-128914: Remove conditional stack effects from bytecodes.c
and the code generators
#128918
Conversation
This will conflict with #128722, so that PR should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice to see so much red.
Co-authored-by: Irit Katriel <[email protected]>
…to no-conditional-stack-effects
…and the code generators (pythonGH-128918)
Hi @markshannon, this PR introduced performance regressions in the free threading build:
I think that the multithreading scaling issue is because previously |
@colesbury IIRC I was the one that merged these two instructions together. Based on my memory at the time LOAD_ATTR does cover LOAD_METHOD in some cases, so I'm not surprised theres a perf regression. I'm surprised by your benchmark results though. If you look into them, it says things like nbody, spectralnorm slowed down. However, these dont use LOAD_ATTR at all? |
…odes.c` and the code generators (pythonGH-128918)" The commit introduced a large performance regression in the free threading build. This reverts commit ab61d3f.
…odes.c` and the code generators (pythonGH-128918)" The commit introduced a ~2.5-3% regression in the free threading build. This reverts commit ab61d3f.
Not 100% sure, but this PR makes the test
|
OK, this fixes it!
|
This has been reverted for now: #129202 No need to push the fix. |
@colesbury wrote:
FWIW, I tried to reproduce this, and got the same 0.8% slowdown we got on a non-free-threaded build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250120-3.14.0a4+-d5e47ea-NOGIL To be clear, I'm not advocating one result over the other, but there's a high likelihood that something is different between these runners. It might be meaningful, it might not... |
This PR:
LOAD_ATTR
intoLOAD_ATTR
andLOAD_METHOD
. The specializations split neatly between the two, so no new specializations are needed.LOAD_SUPER_ATTR
intoLOAD_SUPER_ATTR
andLOAD_SUPER_METHOD
. This is a bit wasteful asLOAD_SUPER_ATTR
is quite rare and it needs an additional instrumented instruction as well. It might be worth trying to merge them somehow in another PR later on but doing so now would complicate this PR unnecessarily.Performance is 0.4% slower which is, I think, acceptable given the potential speedups from top of stack caching.
The slowdown appears to be mostly a result of the large number of extra
PUSH_NULL
instructions required. There are ways to mitigate this, but not in this PR.