
Implement binary version of make_index_sequence #5751


Merged: 3 commits merged into pybind:master on Jul 24, 2025
Conversation

@rmjarvis (Contributor) commented on Jul 12, 2025

Description

TL;DR: This replaces pybind11’s hand-rolled make_index_sequence with a more efficient O(log N) version. The old version had O(N) template recursion depth, which could trigger compiler limits or slow down builds—especially in code like descr, io_name, and other parts of pybind11 that use make_index_sequence to build long signature strings at compile time.


I started getting template-recursion-depth problems with pybind11 version 3.0. Something caused the N values in pybind11::detail::descr<N> to get significantly larger: for double, N went from 7 to 30, and for pybind11::array_t it went from 30 to 81. For one of my initializers with lots of parameters (including a bunch of numpy arrays), this led to a call of pybind11::detail::make_index_sequence_impl<2251>, which required a very high -ftemplate-recursion-depth value when compiling. (At least 2260, due to a few extra levels beyond the 2251 for this class.)

Even once I made that high enough, I still got warnings of "stack nearly exhausted; compilation time may suffer, and crashes due to stack overflow are likely", which is presumably why one of my CI runners crashed trying to compile my code when it used to work just fine with pybind11 version 2.13.

The underlying problem is that the implementation of make_index_sequence is linear in N: it recursively instantiates N templates, one for each value from 0 to N. With N=2251, this exhausted the default allowed template depth and apparently used up most of the stack.

This PR switches the implementation to a binary version whose complexity is logarithmic, so it only requires a depth of log(N). For my use case mentioned above, it worked fine with a template depth of only 60, which is much lower than the default.
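The idea behind the binary construction can be sketched as follows (a self-contained illustration of the technique, not necessarily the exact code merged here): each level either doubles the sequence built so far, or doubles it and appends one more index, so the instantiation depth is about log2(N) rather than N.

```cpp
#include <cstddef>
#include <type_traits>

template <std::size_t... Is>
struct index_sequence {};

// Double a sequence of length K into one of length 2K, appending one
// extra index when Odd is true: <0..K-1> -> <0..2K-1> or <0..2K>.
template <bool Odd, typename S>
struct double_seq;

template <std::size_t... Is>
struct double_seq<false, index_sequence<Is...>> {
    using type = index_sequence<Is..., (sizeof...(Is) + Is)...>;
};

template <std::size_t... Is>
struct double_seq<true, index_sequence<Is...>> {
    using type = index_sequence<Is..., (sizeof...(Is) + Is)..., 2 * sizeof...(Is)>;
};

// Recurse on N/2 instead of N-1: the nesting depth is log2(N), not N,
// so even N in the thousands stays far below compiler limits.
template <std::size_t N>
struct make_index_sequence_impl {
    using type =
        typename double_seq<N % 2 != 0,
                            typename make_index_sequence_impl<N / 2>::type>::type;
};

template <>
struct make_index_sequence_impl<0> {
    using type = index_sequence<>;
};

template <std::size_t N>
using make_index_sequence = typename make_index_sequence_impl<N>::type;
```

For N=2251 this chain only nests about 12 instantiations deep (2251 → 1125 → 562 → … → 0), which is why a small -ftemplate-depth suffices.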

Suggested changelog entry:

  • Implement binary version of make_index_sequence to reduce template depth requirements for functions with many parameters.

📚 Documentation preview 📚: https://pybind11--5751.org.readthedocs.build/

@rwgk (Collaborator) commented on Jul 24, 2025

Sorry for taking so long to look here.

This looks great to me, and I think it's one of the few cases where it's fine not to add a new test.

It'd be ideal to understand what made this necessary. Could you please check, with your real-world use case, which change triggered it?

Note that PR #5486 introduced this:

+// Use a different name based on whether the parameter is used as input or output
+template <size_t N1, size_t N2>
+constexpr descr<N1 + N2 + 1> io_name(char const (&text1)[N1], char const (&text2)[N2]) {
+    return const_name("@") + const_name(text1) + const_name("@") + const_name(text2)
+           + const_name("@");
+}
+

Therefore it seems very likely to me that that's it, but it'd be great to have a solid confirmation, so we're sure we don't have to look elsewhere.

Only if that's not it, I have a couple of further guesses.

@timohl @InvincibleRMC @henryiii for visibility


@rwgk (Collaborator) left a comment on Jul 24, 2025

I added a TL;DR to the PR description (generated by ChatGPT).

@rwgk rwgk merged commit 49d19fe into pybind:master Jul 24, 2025
82 checks passed
@github-actions bot added the "needs changelog" label (Possibly needs a changelog entry) on Jul 24, 2025
@rmjarvis (Contributor, Author) commented on Jul 24, 2025

Thanks Ralf.

I used git bisect to find the first commit that fails for my original use case, and it wasn't one that you suggested. It's 2415242. Looks like something to do with the numpy array treatment, so I guess the py::array_t name was affected here.

If you're curious, the function signature that caused the most problems is

template <int D1, int D2, int D3>
Corr3<D1,D2,D3>* BuildCorr3(
    BinType bin_type, double minsep, double maxsep, int nbins, double binsize, double b, double a,
    double minu, double maxu, int nubins, double ubinsize, double bu,
    double minv, double maxv, int nvbins, double vbinsize, double bv,
    double minrpar, double maxrpar, double xp, double yp, double zp,
    py::array_t<double>& zeta0p, py::array_t<double>& zeta1p,
    py::array_t<double>& zeta2p, py::array_t<double>& zeta3p,
    py::array_t<double>& zeta4p, py::array_t<double>& zeta5p,
    py::array_t<double>& zeta6p, py::array_t<double>& zeta7p,
    py::array_t<double>& meand1p, py::array_t<double>& meanlogd1p,
    py::array_t<double>& meand2p, py::array_t<double>& meanlogd2p,
    py::array_t<double>& meand3p, py::array_t<double>& meanlogd3p,
    py::array_t<double>& meanup, py::array_t<double>& meanvp,
    py::array_t<double>& weightp, py::array_t<double>& weightip,
    py::array_t<double>& ntrip)

As I said, lots of numpy arrays. (BinType is an enum, so basically int.)

I originally thought about refactoring this function to make e.g. the 8 zeta arrays a single numpy array in python and then split them into separate views in C++. But I ended up deciding that it was simpler to just fix the underlying problem in pybind11.

Also, the ChatGPT and Claude reviews were quite amusing. Thanks for sharing them. They each got something wrong though.

ChatGPT said that "further improvements (like binary search for next_power_of_2) ..." In fact next_power_of_2 is already a binary search. It's already log(N) complexity.
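The log(N) behavior of such a helper is easy to see from a doubling formulation; this is a hypothetical sketch written for illustration, not the pybind11 source:

```cpp
#include <cstddef>

// Hypothetical illustration (not the actual pybind11 helper): compute
// the smallest power of two >= n by doubling a candidate. The candidate
// doubles on every step, so the call depth is O(log n).
constexpr std::size_t next_power_of_2(std::size_t n, std::size_t p = 1) {
    return p >= n ? p : next_power_of_2(n, p * 2);
}
```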

Claude's worked example for N=5 is wrong. It's obviously wrong in the last step, but the root error is in the first step. The first step doubles an empty set and adds one to get 0, not 0,1 as Claude claimed. Then, double to get 0,1, and finally double and add one to get 0,1,2,3,4. It does the normal LLM thing of knowing the right answer, so it shoehorns its logic to get there rather than double check itself to find the earlier mistake.

@timohl (Contributor) commented on Jul 24, 2025

Note that PR #5486 introduced this:

+// Use a different name based on whether the parameter is used as input or output
+template <size_t N1, size_t N2>
+constexpr descr<N1 + N2 + 1> io_name(char const (&text1)[N1], char const (&text2)[N2]) {
+    return const_name("@") + const_name(text1) + const_name("@") + const_name(text2)
+           + const_name("@");
+}
+

PR #5486 introduces the io_name feature, but only applies it to the STL filesystem support.

I used git bisect to find the first commit that fails for my original use case, and it wasn't one that you suggested. It's 2415242.

PR #5212 is the follow up that applies io_name to numpy types, so it makes sense that git bisect found 2415242 if io_name is the problem.

Great to see the new more efficient implementation of make_index_sequence. :)

@rwgk (Collaborator) commented on Jul 24, 2025

I used git bisect to find the first commit that fails for my original use case, and it wasn't one that you suggested. It's 2415242. Looks like something to do with the numpy array treatment, so I guess the py::array_t name was affected here.

Thanks a lot, it's great to have certainty!

Also, the ChatGPT and Claude reviews were quite amusing. Thanks for sharing them. They each got something wrong though.

Yeah, these tools are often amazing in both directions, good or bad. Amusing is a good way to put it!
