Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-128914: Remove conditional stack effects from bytecodes.c and the code generators #128918

Merged
merged 21 commits into from
Jan 20, 2025

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Jan 16, 2025

This PR:

  • Removes support for conditional stack effects. Variable stack effects are still supported
  • Splits LOAD_ATTR into LOAD_ATTR and LOAD_METHOD. The specializations split neatly between the two, so no new specializations are needed.
  • Splits LOAD_SUPER_ATTR into LOAD_SUPER_ATTR and LOAD_SUPER_METHOD. This is a bit wasteful as LOAD_SUPER_ATTR is quite rare and it needs an additional instrumented instruction as well. It might be worth trying to merge them somehow in another PR later on but doing so now would complicate this PR unnecessarily.

Performance is 0.4% slower which is, I think, acceptable given the potential speedups from top of stack caching.

The slowdown appears to be mostly a result of the large number of extra PUSH_NULL instructions required. There are ways to mitigate this, but not in this PR.

@markshannon
Copy link
Member Author

This will conflict with #128722, so that PR should be merged first.

Copy link
Member

@iritkatriel iritkatriel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice to see so much red.

@markshannon markshannon merged commit ab61d3f into python:main Jan 20, 2025
65 checks passed
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 21, 2025
@markshannon markshannon deleted the no-conditional-stack-effects branch January 21, 2025 10:30
@colesbury
Copy link
Contributor

colesbury commented Jan 21, 2025

Hi @markshannon, this PR introduced performance regressions in the free threading build:

I think that the multithreading scaling issue is because previously module.foo() used to specialize to _LOAD_ATTR_MODULE_FROM_KEYS, but with this PR it now uses the unspecialized LOAD_METHOD. _LOAD_ATTR_MODULE_FROM_KEYS supports deferred reference counting, but LOAD_METHOD does not. That may be related to the single-threaded perf regression too, but I'm not sure.

@Fidget-Spinner
Copy link
Member

@colesbury IIRC I was the one that merged these two instructions together. Based on my memory at the time LOAD_ATTR does cover LOAD_METHOD in some cases, so I'm not surprised theres a perf regression.

I'm surprised by your benchmark results though. If you look into them, it says things like nbody, spectralnorm slowed down. However, these dont use LOAD_ATTR at all?

colesbury added a commit to colesbury/cpython that referenced this pull request Jan 22, 2025
…odes.c` and the code generators (pythonGH-128918)"

The commit introduced a large performance regression in the free threading build.

This reverts commit ab61d3f.
@markshannon markshannon restored the no-conditional-stack-effects branch January 22, 2025 17:40
colesbury added a commit to colesbury/cpython that referenced this pull request Jan 22, 2025
…odes.c` and the code generators (pythonGH-128918)"

The commit introduced a ~2.5-3% regression in the free threading build.

This reverts commit ab61d3f.
@diegorusso
Copy link
Contributor

diegorusso commented Jan 23, 2025

Not 100% sure, but this PR makes the test test.test__opcode failing if CPython is compiled with --enable-pystats

$ ./python -mtest test.test__opcode
Using random seed: 2376878257
0:00:00 load avg: 0.20 Run 1 test sequentially in a single process
0:00:00 load avg: 0.20 [1/1] test.test__opcode
test test.test__opcode failed -- Traceback (most recent call last):
  File "/home/dierus01/work/ce-sw/repos/cpython/Lib/test/test__opcode.py", line 131, in test_specialization_stats
    self.assertCountEqual(stats.keys(), specialized_opcodes)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Element counts were not equal:
First has 0, Second has 1:  'load_super_method'
First has 0, Second has 1:  'load_method'

test.test__opcode failed (1 failure)

== Tests result: FAILURE ==

1 test failed:
    test.test__opcode

Total duration: 24 ms
Total tests: run=7 failures=1
Total test files: run=1/1 failed=1
Result: FAILURE

@diegorusso
Copy link
Contributor

diegorusso commented Jan 23, 2025

OK, this fixes it!

$ git diff
diff --git a/Python/specialize.c b/Python/specialize.c
index eb599028cef..bc44b776026 100644
--- a/Python/specialize.c
+++ b/Python/specialize.c
@@ -111,6 +111,8 @@ _Py_GetSpecializationStats(void) {
     int err = 0;
     err += add_stat_dict(stats, CONTAINS_OP, "contains_op");
     err += add_stat_dict(stats, LOAD_SUPER_ATTR, "load_super_attr");
+    err += add_stat_dict(stats, LOAD_SUPER_METHOD, "load_super_method");
+    err += add_stat_dict(stats, LOAD_METHOD, "load_method");
     err += add_stat_dict(stats, LOAD_ATTR, "load_attr");
     err += add_stat_dict(stats, LOAD_GLOBAL, "load_global");
     err += add_stat_dict(stats, BINARY_SUBSCR, "binary_subscr");
$ ./python -mtest test.test__opcode
Using random seed: 911391223
0:00:00 load avg: 0.31 Run 1 test sequentially in a single process
0:00:00 load avg: 0.31 [1/1] test.test__opcode

== Tests result: SUCCESS ==

1 test OK.

Total duration: 19 ms
Total tests: run=7
Total test files: run=1/1
Result: SUCCESS

markshannon pushed a commit that referenced this pull request Jan 23, 2025
…` and the code generators (GH-128918)" (GH-129202)

The commit introduced a ~2.5-3% regression in the free threading build.

This reverts commit ab61d3f.
@diegorusso
Copy link
Contributor

diegorusso commented Jan 23, 2025

This has been reverted for now: #129202 No need to push the fix.

@mdboom
Copy link
Contributor

mdboom commented Jan 23, 2025

@colesbury wrote:

Hi @markshannon, this PR introduced performance regressions in the free threading build:

FWIW, I tried to reproduce this, and got the same 0.8% slowdown we got on a non-free-threaded build: https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250120-3.14.0a4+-d5e47ea-NOGIL

To be clear, I'm not advocating one result over the other, but there's a high likelihood that something is different between these runners. It might be meaningful, it might not...

@markshannon markshannon deleted the no-conditional-stack-effects branch January 31, 2025 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants