-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Implement binary version of make_index_sequence #5751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Sorry for taking so long to look here. This looks great to me, and I think it's one of the few cases where it's fine not to add a new test. It'd be ideal to understand what made this necessary. Could you please try if your real-world use case
Note that PR #5486 introduced this: +// Use a different name based on whether the parameter is used as input or output
+template <size_t N1, size_t N2>
+constexpr descr<N1 + N2 + 1> io_name(char const (&text1)[N1], char const (&text2)[N2]) {
+ return const_name("@") + const_name(text1) + const_name("@") + const_name(text2)
+ + const_name("@");
+}
+ Therefore it seems very likely to me that that's it, but it'd be great to have a solid confirmation, so we're sure we don't have to look elsewhere. Only if that's not it:
If that's still not it, my final guess:
? @timohl @InvincibleRMC @henryiii for visibility |
ChatGPT and claude.ai are very happy with this PR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a TL;DR to the PR description (generated by ChatGPT).
Thanks Ralf. I used git bisect to find the first commit that fails for my original use case, and it wasn't one that you suggested. It's 2415242. Looks like something to do with the numpy array treatment, so I guess the py::array_t name was affected here. If you're curious, the function signature that caused the most problems is
As I said, lots of numpy arrays. (BinType is an enum, so basically I originally thought about refactoring this function to make e.g. the 8 zeta arrays a single numpy array in python and then split them into separate views in C++. But I ended up deciding that it was simpler to just fix the underlying problem in pybind11. Also, the ChatGPT and Claude reviews were quite amusing. Thanks for sharing them. They each got something wrong though. ChatGPT said that "further improvements (like binary search for next_power_of_2) ..." In fact Claude's worked example for N=5 is wrong. It's obviously wrong in the last step, but the root error is in the first step. The first step doubles an empty set and adds one to get |
PR #5486 introduces the
PR #5212 is the follow up that applies Great to see the new more efficient implementation of |
Thanks a lot, it's great to have certainty!
Yeah, these tools are often amazing in both directions, good or bad. Amusing is a good way to put it! |
Description
TL;DR: This replaces pybind11’s hand-rolled
make_index_sequence
with a more efficientO(log N)
version. The old version hadO(N)
template recursion depth, which could trigger compiler limits or slow down builds—especially in code likedescr
,io_name
, and other parts of pybind11 that usemake_index_sequence
to build long signature strings at compile time.I started getting problems with template-recursion-depth starting with pybind version 3.0. It seems that something caused the N values in
pybind11::detail::descr<N>
to get significantly larger. doubles went from 7 to 30. And pybind11:array_t went from 30 to 81. For one of my initializers with lots of parameters (including a bunch of numpy arrays), this led to a call ofpybind11::detail::make_index_sequence_impl<2251>
, which required a very high-ftemplate-recursion-depth
value when compiling. (At least 2260, due to a few extra levels beyond the 2251 for this class.)Even once I made that high enough, I still got warnings of
stack nearly exhausted; compilation time may suffer, and crashes due to stack overflow are likely
, which is presumably why one of my CI runners crashed trying to compile my code, when it used to work just fine with pybind11 version 2.13.The underlying problem is that the implementation of
make_index_sequence
is linear in N, so it requires making N template from 0 to N. With N=2251, this exhausted the default allowed template depth and apparently used up most of the stack.This PR switches the implementation to a binary version whose complexity is logarithmic, so it only requires a depth of
log(N)
. For my use case mentioned above, it worked fine with a template depth of only 60, which is much lower than the default.Suggested changelog entry:
📚 Documentation preview 📚: https://pybind11--5751.org.readthedocs.build/