Skip to content

Conversation

rafiw
Copy link
Contributor

@rafiw rafiw commented Aug 4, 2025

What?

fix memory issues

Why?

since it's needed

@rafiw rafiw requested a review from a team as a code owner August 4, 2025 08:04
Copy link

copy-pr-bot bot commented Aug 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link

github-actions bot commented Aug 4, 2025

👋 Hi rafiw! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@@ -91,6 +91,8 @@ nixlAgentData::nixlAgentData(const std::string &name,
useEtcd = false;
NIXL_DEBUG << "NIXL ETCD is disabled";
}
#else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we do

#if HAVE_ETCD
    if (getenv("NIXL_ETCD_ENDPOINTS")) {
        useEtcd = true;
        NIXL_DEBUG << "NIXL ETCD is enabled";
    } else
#endif
     {
        useEtcd = false;
        NIXL_DEBUG << "NIXL ETCD is disabled";
    }

@@ -404,6 +405,8 @@ nixlDocaEngine::nixlDocaInitNotif (const std::string &remote_agent,
delete notif->recv_mmap;
delete notif->recv_barr;

delete notif;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems aldo need to free notif->send_addr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also recv_addr

@rafiw rafiw force-pushed the valgrind_issues branch from 327ec55 to 5416d92 Compare August 4, 2025 11:21
@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 4, 2025
@rafiw rafiw force-pushed the valgrind_issues branch 2 times, most recently from 5762f9f to 0e56dd8 Compare August 4, 2025 11:53
@mkhazraee mkhazraee requested a review from e-ago August 5, 2025 00:42
@@ -159,7 +159,6 @@ nixl_status_t
nixlObjEngine::deregisterMem(nixlBackendMD *meta) {
nixlObjMetadata *obj_md = static_cast<nixlObjMetadata *>(meta);
if (obj_md) {
std::unique_ptr<nixlObjMetadata> obj_md_ptr = std::unique_ptr<nixlObjMetadata>(obj_md);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this line? Who's deleting the metadata instance then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your right delete is needed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, is there a problem with this code? Why are you substituting RAII-style memory management with inferior manual object deletion?

@@ -89,6 +90,7 @@ RUN echo "INFO: Starting custom UCX build..." && \
./autogen.sh && \
echo "INFO: Building UCX..." && \
./contrib/configure-release --with-cuda=/usr/local/cuda \
--enable-debug \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why you need it but this is not the right way to do it. We want nixlbench container builds to happen in release mode by default, so that we can do performance testing. If we need debug mode, it should be a parameter.

I would rather add the changes in this file to #614 Would that be OK with you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was by mistake

@rafiw rafiw force-pushed the valgrind_issues branch from 159b8d1 to d27ba0d Compare August 5, 2025 13:08
Signed-off-by: Rafi Wiener <[email protected]>
@rafiw rafiw force-pushed the valgrind_issues branch 2 times, most recently from 8b222b3 to 90d9b33 Compare August 5, 2025 13:12
Signed-off-by: Rafi Wiener <[email protected]>
@rafiw rafiw force-pushed the valgrind_issues branch from 90d9b33 to 2d39c07 Compare August 5, 2025 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants