Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change MSBuild to use the Clang backend #690

Open
Fidget-Spinner opened this issue Jul 2, 2024 · 9 comments
Open

Change MSBuild to use the Clang backend #690

Fidget-Spinner opened this issue Jul 2, 2024 · 9 comments

Comments

@Fidget-Spinner
Copy link
Collaborator

MSBuild has supported using clang backend for awhile now. https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170

MSVC has been somewhat unpredictable in comparison to the other compilers. We've had reports that it does not benefit from Python 3.11 1, of random slowdowns 2, of ceval.c being too big for MSVC to inline/optimize properly 3, and now, slowdowns that only hit it and no other platform 4.

We should seriously consider switching to another compiler, because it's starting to hinder our own productivity and performance goals on Windows.

@brandtbucher
Copy link
Member

@zooba

@zooba
Copy link

zooba commented Jul 2, 2024

Last time I ran benchmarks, clang wasn't any better than MSVC on Windows (the clang-cl support works because I updated it to run this benchmark).

It's also a really significant compatibility change, and there will certainly be users who want to keep building with MSVC. So I don't think we get to drop support entirely - it may only impact the python.org releases (possibly excluding the Store package, I'm not sure if clang support extends that far).

Personally, I doubt it's worth the effort. But if someone wants to make the effort to show that it's both faster and remains binary compatible1, then go for it.

Footnotes

  1. For stable ABI purposes only, of course.

@erozenfeld
Copy link

@Fidget-Spinner

I work on msvc compiler and will take a look at the issues you reported.

We've had reports that it does not benefit from Python 3.11 1, of random slowdowns 2

You have two identical links above, I suspect that's not what you intended?

@Fidget-Spinner
Copy link
Collaborator Author

@erozenfeld thank you for your attention. Sorry if I seem like I'm dunking on the MSVC compiler in the initial issue. If it seems overly negative, I'm sorry about that.

Yes the link is #321 instead. But out of all the links, the only actionable one I think is python/cpython#121263. We've had a problem where ceval.c's _PyEval_EvalFrameDefault is so big that MSVC refuses to inline small functions anymore, making us use macros all over.

Thank you for your time!

@erozenfeld
Copy link

erozenfeld commented Jul 17, 2024

@Fidget-Spinner Thank you for the clarification. I spent some time looking at python/cpython#121263.

I found one compiler issue you are running into. Normally we compile functions in a bottom-up call graph order, i.e. callees before callers. That way we have more information for optimizations, including inlining. However, with multi-threaded compilation we schedule huge functions for earlier compilation in order to improve the overall multithreaded compilation time. The downside is that these huge functions don't always have information about callees, which results in worse code generation (and worse inlining decisions) for these huge functions. We are considering disabling that scheduling tweak for PGO (or perhaps for huge functions that are hot according to PGO info). In the meantime we have a switch that disables that: /d2:-noadditionalscheduling. It should be passed to link.exe. I reverted the macro-ization in python/cpython@722229e and checked what happens with /d2:-noadditionalscheduling . Of the functions that you turned into macros PyStackRef_Is is now inlined in all 17 call sites. Note that PyStackRef_AsPyObjectBorrow, PyStackRef_TYPE, and PyStackRef_FromPyObjectImmortal are inlined even without /d2:-noadditionalscheduling.
Unfortunately, PyStackRef_AsPyObjectSteal, PyStackRef_AsPyObjectNew, PyStackRef_FromPyObjectSteal, PyStackRef_FromPyObjectNew, PyStackRef_CLOSE and PyStackRef_DUP are over the limit when the caller is huge and are not inlined. I'll keep looking at whether inlining heuristics can be tweaked but I suggest you should add /d2:-noadditionalscheduling since it enables a bunch of inlines outside of the set that was turned into macros in python/cpython@722229e . For example, calls to the following functions are now inlined: _Py_EnsureFuncTstateNotNULL, Py_IS_TYPE, _Py_DECREF_SPECIALIZED, _Py_DECREF_NO_DEALLOC, _PyLong_IsNegative, _PyLong_IsCompact, _PyLong_IsNonNegativeCompact, _PyLong_IsZero, PyFloat_FromDouble, PyType_HasFeature, PyType_Check, _PyFunction_SetVersion, and _PyObject_GC_IS_TRACKED. I'm curious if that will improve your benchmark results.

@erozenfeld
Copy link

One more note: the compilation order scheduling tweak mentioned above affects not only _PyEval_EvalFrameDefault but 158 other functions considered "big" so their codegen may be negatively affected. So adding /d2:-noadditionalscheduling should help with codegen of those functions too.

@mdboom
Copy link
Contributor

mdboom commented Jul 18, 2024

I'll run a build with /d2:-noadditionalscheduling over our benchmark suite and report back. (May take a couple of days, I just got back from vacation to our Windows benchmarking machine behaving strangely, so I'll need to fix that first).

@mdboom
Copy link
Contributor

mdboom commented Jul 19, 2024

The /d2:-noadditionalscheduling flag helps, but not as much as the macro-ifying changes in python/cpython#121270.

I took the parent of python/cpython#121270, 93156880efd14ad7adc7d3512552b434f5543890, which still used static inline, and applied the /d2:-noadditionalscheduling flag. This resulted in a 2% speedup on 64-bit Windows, vs. a 5% speedup by converting all of those new functions to macros.

To confirm this, I am going to run 2 more experiments over the weekend:

  • The effect of applying the flag to CPython main
  • The effect of reverting #121790 against current CPython main and applying the flag

@mdboom
Copy link
Contributor

mdboom commented Jul 19, 2024

The results are in:

(All of this is with PGO, as is all of our benchmarking).

So, it's clear that the flag helps (2% wins are hard to come by), and perhaps we should always use it for PGO builds (where the slower build times are already expected). But it's also clear that the macros (forcible inlining) are still helpful.

Thanks again for all your help on this @erozenfeld. I'd be happy to help benchmark any ways to tweak the inlining heuristics that you think might be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants