-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add fast_type_map, use it authoritatively for local types and as a hint for global types (ABI breaking) #5842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
cc0b053
1cd2b55
abf237f
b87e762
c333e85
8651580
3ca29e8
48f831b
08c09ae
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -221,10 +221,12 @@ extern "C" inline void pybind11_meta_dealloc(PyObject *obj) { | |
| auto tindex = std::type_index(*tinfo->cpptype); | ||
| internals.direct_conversions.erase(tindex); | ||
|
|
||
| auto &local_internals = get_local_internals(); | ||
| if (tinfo->module_local) { | ||
| get_local_internals().registered_types_cpp.erase(tindex); | ||
| local_internals.registered_types_cpp.erase(tinfo->cpptype); | ||
| } else { | ||
| internals.registered_types_cpp.erase(tindex); | ||
| local_internals.global_registered_types_cpp_fast.erase(tinfo->cpptype); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will only erase the fast map entry for the DSO where the type was bound. Other DSOs will preserve their cached entries in the fast map, which will now dangle. Solving that with an ABI break is pretty easy - make a linked list of all the fast maps and have modifications to the slow map invalidate the corresponding entries in the fast maps. (Or since you're taking an ABI break, just put the fast map in global internals.) Solving it without an ABI break will be difficult. The best way I can think of is to accompany each entry you add to the fast map with a keep_alive that will remove it when the targeted type object is being destroyed. That's pretty heavyweight though.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for finding the hole in this strategy! I wonder if there's a way to accommodate developing things in anticipation of a future ABI break without just doubling CI costs.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We used to have a few extra jobs that set Try We could do something surgical, like picking out one or a couple jobs each for Linux, macOS, Windows to override the internals version. |
||
| } | ||
| internals.registered_types_py.erase(tinfo->type); | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -39,7 +39,9 @@ | |
| /// further ABI-incompatible changes may be made before the ABI is officially | ||
| /// changed to the new version. | ||
| #ifndef PYBIND11_INTERNALS_VERSION | ||
| // REMINDER for next version bump: remove loader_life_support_tls | ||
| // REMINDER for next version bump: remove loader_life_support_tls, | ||
| // move local_internals::global_registered_types_cpp_fast to | ||
| // internals::registered_types_cpp_fast | ||
| # define PYBIND11_INTERNALS_VERSION 11 | ||
|
rwgk marked this conversation as resolved.
|
||
| #endif | ||
|
|
||
|
|
@@ -177,6 +179,12 @@ struct type_equal_to { | |
| }; | ||
| #endif | ||
|
|
||
| template <typename value_type> | ||
| // REVIEW: do we need to add a fancy hash for pointers or is the | ||
| // possible identity hash function from the standard library (e.g., | ||
| // libstdc++) sufficient? | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. note that the Linux box I used for benchmarking has libstdc++.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to ack that I've seen this, but I don't know. It'd be good to rewrite the comment before merging if nobody knows. |
||
| using fast_type_map = std::unordered_map<const std::type_info *, value_type>; | ||
|
|
||
| template <typename value_type> | ||
| using type_map = std::unordered_map<std::type_index, value_type, type_hash, type_equal_to>; | ||
|
|
||
|
|
@@ -302,7 +310,15 @@ struct internals { | |
| // impact any other modules, because the only things accessing the local internals is the | ||
| // module that contains them. | ||
| struct local_internals { | ||
| type_map<type_info *> registered_types_cpp; | ||
| // It should be safe to use fast_type_map here because this entire | ||
| // data structure is scoped to our single module, and thus a single | ||
| // DSO and single instance of type_info for any particular type. | ||
| fast_type_map<type_info *> registered_types_cpp; | ||
|
|
||
| // fast hint for the *global* internals registered_types_cpp | ||
| // map. If we lookup successfully, that's the right answer; | ||
| // otherwise we go to the global map and then backfill this one. | ||
| fast_type_map<type_info *> global_registered_types_cpp_fast; | ||
|
swolchok marked this conversation as resolved.
Outdated
|
||
| std::forward_list<ExceptionTranslator> registered_exception_translators; | ||
| PyTypeObject *function_record_py_type = nullptr; | ||
| }; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -205,35 +205,57 @@ PYBIND11_NOINLINE detail::type_info *get_type_info(PyTypeObject *type) { | |
| return bases.front(); | ||
| } | ||
|
|
||
| inline detail::type_info *get_local_type_info(const std::type_index &tp) { | ||
| auto &locals = get_local_internals().registered_types_cpp; | ||
| auto it = locals.find(tp); | ||
| inline detail::type_info *get_local_type_info(const std::type_info &tp, | ||
| const local_internals &local_internals) { | ||
| const auto &locals = local_internals.registered_types_cpp; | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not changed in this PR but I think this function needs a with_internals lock too. Multiple threads could run bindings from the same DSO simultaneously. local_internals doesn't have its own lock and is generally protected by the global internals lock. You could optimize by making the versions with a local_internals parameter assume internals is already locked. Then get_type_info() could query both internals under a single lock instead of acquiring, releasing, and re-acquiring.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems clearly out of scope for this PR.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (Also, doesn't the GIL preclude this sort of thing when present?)
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I figured since you were reworking these functions anyway, I would point out an easy-looking opportunity to improve their thread-safety while you're here. I agree the PR doesn't make current behavior worse and I wasn't trying enforce a wider scope.
Yes, but pybind11 supports free-threading mode.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
function_record_ptr_from_PyObject also accesses local_internals and isn't called under the global internals lock.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @swolchok Could you please help with general code health by creating an issue pointing to the two missing
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| auto it = locals.find(&tp); | ||
| if (it != locals.end()) { | ||
| return it->second; | ||
| } | ||
| return nullptr; | ||
| } | ||
|
|
||
| inline detail::type_info *get_global_type_info(const std::type_index &tp) { | ||
| inline detail::type_info *get_local_type_info(const std::type_info &tp) { | ||
| return get_local_type_info(tp, get_local_internals()); | ||
| } | ||
|
|
||
| inline detail::type_info *get_global_type_info(const std::type_info &tp, | ||
| local_internals &local_internals) { | ||
| return with_internals([&](internals &internals) { | ||
| detail::type_info *type_info = nullptr; | ||
| auto &fast_types = local_internals.global_registered_types_cpp_fast; | ||
| auto &types = internals.registered_types_cpp; | ||
| auto it = types.find(tp); | ||
| auto fast_it = fast_types.find(&tp); | ||
| if (fast_it != fast_types.end()) { | ||
| #ifndef NDEBUG | ||
| auto types_it = types.find(std::type_index(tp)); | ||
| assert(types_it != types.end()); | ||
| assert(types_it->second == fast_it->second); | ||
| #endif | ||
| return fast_it->second; | ||
| } | ||
| auto it = types.find(std::type_index(tp)); | ||
| if (it != types.end()) { | ||
| fast_types.emplace(&tp, it->second); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How are you going to cause this entry to be cleaned up when the type it points to is destroyed? Currently I think it will just dangle, which won't go well when you try to use the result. nanobind solves this problem by having its equivalent of
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure I follow. Why does the fast map have different cleanup timing requirements than the slow map? We clean up both at the same time.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The slow map has one entry per C++ type. The fast map potentially has multiple. You’re cleaning up the fast map entry that was added by the module that defined the binding, but not the ones that may have been added by other DSOs (with different typeinfo pointers for that type) when they looked it up. I left this comment on the location in the code where those secondary entries are being added; I don’t see anywhere where they can be removed.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Naive solution: when a type is deallocated, can't we just loop over the whole map and remove all entries with the same value as the one we want to remove? The nanobind solution seems fine, but the first thing that needs to be done is writing a test that will expose this problem, since we clearly don't have one (or it's not a problem for some reason).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not 100% clear on how to write a test that will result in classes being destroyed, but it looks like test_class does it by registering into temporary modules so I'll try that. I think we want something like:
theory is that currently we will see a crash; before this PR it should've worked?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Your summary looks good to me, except that Python can't really unload a whole DSO. Instead, you can You can steal this function from the tests for #5800: After the type has been destroyed, you can bind it again with a new Be aware that extension types are immortal on non-CPython VMs (PyPy, GraalPy) and all types are immortal on free-threading CPython 3.13 (3.14+ have proper GC for them), so you'll want to disable the test for those platforms.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. I will try to get the tests + fix up today.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sent #5867 |
||
| type_info = it->second; | ||
| } | ||
| return type_info; | ||
| }); | ||
| } | ||
|
|
||
| inline detail::type_info *get_global_type_info(const std::type_info &tp) { | ||
| return get_global_type_info(tp, get_local_internals()); | ||
| } | ||
|
|
||
| /// Return the type info for a given C++ type; on lookup failure can either throw or return | ||
| /// nullptr. | ||
| PYBIND11_NOINLINE detail::type_info *get_type_info(const std::type_index &tp, | ||
| PYBIND11_NOINLINE detail::type_info *get_type_info(const std::type_info &tp, | ||
| bool throw_if_missing = false) { | ||
| if (auto *ltype = get_local_type_info(tp)) { | ||
| auto &local_internals = get_local_internals(); | ||
| if (auto *ltype = get_local_type_info(tp, local_internals)) { | ||
| return ltype; | ||
| } | ||
| if (auto *gtype = get_global_type_info(tp)) { | ||
| if (auto *gtype = get_global_type_info(tp, local_internals)) { | ||
| return gtype; | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -6,11 +6,32 @@ namespace py = pybind11; | |
| * modules aren't preserved over a finalize/initialize. | ||
| */ | ||
|
|
||
| namespace { | ||
| // Compare unsafe_reset_internals_for_single_interpreter in | ||
| // test_subinterpreter.cpp. | ||
| void unsafe_reset_local_internals() { | ||
| // NOTE: This code is NOT SAFE unless the caller guarantees no other threads are alive | ||
| // NOTE: This code is tied to the precise implementation of the internals holder | ||
|
|
||
| // first, unref the thread local internals | ||
| py::detail::get_local_internals_pp_manager().unref(); | ||
|
|
||
| // now we unref the static global singleton internals | ||
| py::detail::get_local_internals_pp_manager().unref(); | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this intended to unref the non-local internals? looks redundant as written
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. whoops, butchered my copy/paste/modify job |
||
|
|
||
| // finally, we reload the static global singleton | ||
| py::detail::get_local_internals(); | ||
| } | ||
| } // namespace | ||
|
|
||
| PYBIND11_MODULE(external_module, | ||
| m, | ||
| py::mod_gil_not_used(), | ||
| py::multiple_interpreters::per_interpreter_gil()) { | ||
|
|
||
| // At least one test ("Single Subinterpreter") wants to reset | ||
| // internals. We have separate local internals because we are a | ||
| // separate DSO, so ours need to be reset too! | ||
| unsafe_reset_local_internals(); | ||
| class A { | ||
| public: | ||
| explicit A(int value) : v{value} {}; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in the similar logic for registration, you're fetching the local internals even if you don't need them. Possibly a regression due to the history where the fast type map was in local internals even for non-local types?