Change MSBuild to use the Clang backend #690

Fidget-Spinner · 2024-07-02T14:03:23Z

MSBuild has supported using clang backend for awhile now. https://learn.microsoft.com/en-us/cpp/build/clang-support-msbuild?view=msvc-170

MSVC has been somewhat unpredictable in comparison to the other compilers. We've had reports that it does not benefit from Python 3.11 1, of random slowdowns 2, of ceval.c being too big for MSVC to inline/optimize properly 3, and now, slowdowns that only hit it and no other platform 4.

We should seriously consider switching to another compiler, because it's starting to hinder our own productivity and performance goals on Windows.

brandtbucher · 2024-07-02T14:42:23Z

@zooba

zooba · 2024-07-02T16:41:29Z

Last time I ran benchmarks, clang wasn't any better than MSVC on Windows (the clang-cl support works because I updated it to run this benchmark).

It's also a really significant compatibility change, and there will certainly be users who want to keep building with MSVC. So I don't think we get to drop support entirely - it may only impact the python.org releases (possibly excluding the Store package, I'm not sure if clang support extends that far).

Personally, I doubt it's worth the effort. But if someone wants to make the effort to show that it's both faster and remains binary compatible¹, then go for it.

For stable ABI purposes only, of course. ↩

erozenfeld · 2024-07-08T23:16:59Z

@Fidget-Spinner

I work on msvc compiler and will take a look at the issues you reported.

We've had reports that it does not benefit from Python 3.11 1, of random slowdowns 2

You have two identical links above, I suspect that's not what you intended?

Fidget-Spinner · 2024-07-09T02:39:47Z

@erozenfeld thank you for your attention. Sorry if I seem like I'm dunking on the MSVC compiler in the initial issue. If it seems overly negative, I'm sorry about that.

Yes the link is #321 instead. But out of all the links, the only actionable one I think is python/cpython#121263. We've had a problem where ceval.c's _PyEval_EvalFrameDefault is so big that MSVC refuses to inline small functions anymore, making us use macros all over.

Thank you for your time!

erozenfeld · 2024-07-17T03:10:19Z

@Fidget-Spinner Thank you for the clarification. I spent some time looking at python/cpython#121263.

I found one compiler issue you are running into. Normally we compile functions in a bottom-up call graph order, i.e. callees before callers. That way we have more information for optimizations, including inlining. However, with multi-threaded compilation we schedule huge functions for earlier compilation in order to improve the overall multithreaded compilation time. The downside is that these huge functions don't always have information about callees, which results in worse code generation (and worse inlining decisions) for these huge functions. We are considering disabling that scheduling tweak for PGO (or perhaps for huge functions that are hot according to PGO info). In the meantime we have a switch that disables that: /d2:-noadditionalscheduling. It should be passed to link.exe. I reverted the macro-ization in python/cpython@722229e and checked what happens with /d2:-noadditionalscheduling . Of the functions that you turned into macros PyStackRef_Is is now inlined in all 17 call sites. Note that PyStackRef_AsPyObjectBorrow, PyStackRef_TYPE, and PyStackRef_FromPyObjectImmortal are inlined even without /d2:-noadditionalscheduling.
Unfortunately, PyStackRef_AsPyObjectSteal, PyStackRef_AsPyObjectNew, PyStackRef_FromPyObjectSteal, PyStackRef_FromPyObjectNew, PyStackRef_CLOSE and PyStackRef_DUP are over the limit when the caller is huge and are not inlined. I'll keep looking at whether inlining heuristics can be tweaked but I suggest you should add /d2:-noadditionalscheduling since it enables a bunch of inlines outside of the set that was turned into macros in python/cpython@722229e . For example, calls to the following functions are now inlined: _Py_EnsureFuncTstateNotNULL, Py_IS_TYPE, _Py_DECREF_SPECIALIZED, _Py_DECREF_NO_DEALLOC, _PyLong_IsNegative, _PyLong_IsCompact, _PyLong_IsNonNegativeCompact, _PyLong_IsZero, PyFloat_FromDouble, PyType_HasFeature, PyType_Check, _PyFunction_SetVersion, and _PyObject_GC_IS_TRACKED. I'm curious if that will improve your benchmark results.

erozenfeld · 2024-07-17T23:01:04Z

One more note: the compilation order scheduling tweak mentioned above affects not only _PyEval_EvalFrameDefault but 158 other functions considered "big" so their codegen may be negatively affected. So adding /d2:-noadditionalscheduling should help with codegen of those functions too.

mdboom · 2024-07-18T19:35:34Z

I'll run a build with /d2:-noadditionalscheduling over our benchmark suite and report back. (May take a couple of days, I just got back from vacation to our Windows benchmarking machine behaving strangely, so I'll need to fix that first).

mdboom · 2024-07-19T15:55:09Z

The /d2:-noadditionalscheduling flag helps, but not as much as the macro-ifying changes in python/cpython#121270.

I took the parent of python/cpython#121270, 93156880efd14ad7adc7d3512552b434f5543890, which still used static inline, and applied the /d2:-noadditionalscheduling flag. This resulted in a 2% speedup on 64-bit Windows, vs. a 5% speedup by converting all of those new functions to macros.

To confirm this, I am going to run 2 more experiments over the weekend:

The effect of applying the flag to CPython main
The effect of reverting #121790 against current CPython main and applying the flag

mdboom · 2024-07-19T20:12:53Z

The results are in:

Applying /d2:-noadditionalscheduling directly to main (python/cpython@d66b06) results in a 2% speedup overall
Reverting gh-121263: Macro-ify most stackref functions for MSVC python/cpython#121270 (the macro-ization of some inline functions by @Fidget-Spinner) and then applying the flag results in a 3% slowdown overall

(All of this is with PGO, as is all of our benchmarking).

So, it's clear that the flag helps (2% wins are hard to come by), and perhaps we should always use it for PGO builds (where the slower build times are already expected). But it's also clear that the macros (forcible inlining) are still helpful.

Thanks again for all your help on this @erozenfeld. I'd be happy to help benchmark any ways to tweak the inlining heuristics that you think might be helpful.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change MSBuild to use the Clang backend #690

Change MSBuild to use the Clang backend #690

Fidget-Spinner commented Jul 2, 2024

brandtbucher commented Jul 2, 2024

zooba commented Jul 2, 2024

erozenfeld commented Jul 8, 2024

Fidget-Spinner commented Jul 9, 2024

erozenfeld commented Jul 17, 2024 •

edited

Loading

erozenfeld commented Jul 17, 2024

mdboom commented Jul 18, 2024

mdboom commented Jul 19, 2024 •

edited

Loading

mdboom commented Jul 19, 2024

Change MSBuild to use the Clang backend #690

Change MSBuild to use the Clang backend #690

Comments

Fidget-Spinner commented Jul 2, 2024

brandtbucher commented Jul 2, 2024

zooba commented Jul 2, 2024

Footnotes

erozenfeld commented Jul 8, 2024

Fidget-Spinner commented Jul 9, 2024

erozenfeld commented Jul 17, 2024 • edited Loading

erozenfeld commented Jul 17, 2024

mdboom commented Jul 18, 2024

mdboom commented Jul 19, 2024 • edited Loading

mdboom commented Jul 19, 2024

erozenfeld commented Jul 17, 2024 •

edited

Loading

mdboom commented Jul 19, 2024 •

edited

Loading