Skip to content

Conversation

@alexanderbilk
Copy link

Add capability to add new plug-ins types

Add capability to add new plug-ins types

Signed-off-by: Aleksandr Bilkovskii <[email protected]>
@alexanderbilk alexanderbilk requested a review from a team as a code owner December 1, 2025 19:23
@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link

github-actions bot commented Dec 1, 2025

👋 Hi alexanderbilk! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@ovidiusm
Copy link
Contributor

ovidiusm commented Dec 1, 2025

/build

* @param plugin_name Name of the plugin to load
* @return Typed plugin handle or nullptr if not found or cast fails
*/
template<typename handleType = basePluginHandle>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to template all these methods? Is there no other way?

* Derived classes override to provide specific version checking logic
*/
virtual bool
checkApiVersion(void *plugin_interface) const = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use void * instead of proper types?

struct pluginConfig {
std::string initFuncName; // e.g., "nixl_plugin_init"
std::string finiFuncName; // e.g., "nixl_plugin_fini"
std::string filenamePrefix; // e.g., "libplugin_"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this configurable?

std::string initFuncName; // e.g., "nixl_plugin_init"
std::string finiFuncName; // e.g., "nixl_plugin_fini"
std::string filenamePrefix; // e.g., "libplugin_"
std::string filenameSuffix; // e.g., ".so"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is ".so" configurable?

std::string finiFuncName; // e.g., "nixl_plugin_fini"
std::string filenamePrefix; // e.g., "libplugin_"
std::string filenameSuffix; // e.g., ".so"
int expectedApiVersion; // Expected API version for validation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from the API of the backend plugins, I don't think it should belong here

int expectedApiVersion; // Expected API version for validation
};

template<typename pluginInterface> struct success_result {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall code comment.

This implementation has fundamental architectural issues.

It tries to use three distinct and conflicting C++ paradigms simultaneously:

  • C-style code: void* + static casts
  • Classic OOP: Inheritance (base vs derived), virtual functions.
  • Template metaprogramming

This is a red flag, since a subset of these (possibly even just one) would be sufficient to implement the functionality.

The typing is strange. There appears to be type-safety by offering templated methods like loadPlugin, but implementation-wise, it is completely unsafe. The internal logic immediately type-erases the pointer to void*, passes it around effectively untyped, and then performs a hard static_cast or reinterpret_cast later.

This is much worse than the code we currently have, which is using proper typing for the object returned by init().

The ownership model is confusing.

The base class thinks it owns the plugin handles (loadedPlugins_ map).
The templates think the caller defines the plugin type ().
The derived class thinks it defines the validation logic (checkApiVersion) but it is shared with the base class.

I find this code very hard to read and follow.

I think the amount of work needed to fix this is much higher than implementing a dispatcher:

Extend the existing nixlPluginManager to hold two maps (loaded_backend_p_ and loaded_telemetry_p_) and use symbol probing (dlsym) to identify which map a loaded library belongs to.

void *plugin_interface) override;

bool
canUnloadPlugin(const std::string &plugin_name) const override;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider using std::string_view when API is just read-only, without storing param

bool
checkApiVersion(void *plugin_interface) const override;

std::shared_ptr<basePluginHandle>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why shared pointer?
Typically unique_ptr is sufficient for factory method

info.name = name;
info.createFunc = creator;
static_plugins_.push_back(info);
staticPlugins_.push_back(info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. can use emplace_back
  2. keeping const char * inside info will not work for dynamic strings. Maybe keep as std::string?

virtual const char *
getVersion() const = 0;

const void *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opaque type maybe not be the best option in C++


protected:
mutable std::mutex lock_;
std::map<std::string, std::shared_ptr<basePluginHandle>, std::less<>> loadedPlugins_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe unique_ptr?
std::less is redundant? std::map is already an ordered collection

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std::less<> is not redundant, the default is std::less< std::string > which does not allow searching the map with a different, but compatible, type, e.g. with a std::string_view.

const void *plugin_interface)
: dlHandle_(std::move(handle)),
pluginInterface_(plugin_interface) {
assert(dlHandle_ && "DlHandleDeleter must not be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIXL_ASSERT maybe?


basePluginManager::basePluginManager(pluginConfig config) : config_(std::move(config)) {}

plugin_load_result<void>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using failure result - just throw an exception, this is not pref critical path
This will greatly simplify the design, not need to use variant

* limitations under the License.
*/

#ifndef __BASE_PLUGIN_MANAGER_H
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest NIXL_SRC_CORE_BASE_PLUGIN_MANAGER_H.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants