-
Notifications
You must be signed in to change notification settings - Fork 40
Add the feature of automatically generating candidates #274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @cq-eng, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the flexibility and extensibility of the tuning mechanism by introducing an automated candidate generation feature. Instead of relying on hardcoded configurations, the system can now dynamically create a comprehensive list of tuning candidates based on a set of predefined environment variables and their potential values. This change streamlines the process of exploring different performance configurations, particularly for NCCL-related parameters, and lays the groundwork for more sophisticated auto-tuning capabilities. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request introduces a feature for automatically generating candidate configurations for the flagcx tuner, specifically for NVIDIA adaptors. This involves creating a new EnvVar struct to represent environment variables with configurable choices and default values, and a generateCandidate function to generate combinations of these variables. The changes also replace the hardcoded configurations with dynamically generated ones during tuner initialization. I have identified a potential issue with the truncation of environment variables and suggest a modification to improve the handling of string copies.
flagcx/adaptor/tuner/tuner_util.cc
Outdated
| // If the number of variables exceeds the structure capacity, truncate | ||
| if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) { | ||
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| return flagcxInvalidArgument; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code checks if the number of variables exceeds FLAGCX_ENV_LIST_MAX_LENGTH and returns flagcxInvalidArgument. However, the variables are not actually truncated. It would be more appropriate to truncate the vars vector to FLAGCX_ENV_LIST_MAX_LENGTH to prevent exceeding the bounds of the envs array in the flagcxEnvConfig struct. This would also align the behavior with the comment's intention.
if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector
return flagcxInvalidArgument; // Or continue after truncation, depending on desired behavior
} if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector
return flagcxInvalidArgument; // Or continue after truncation, depending on desired behavior
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just truncate vars to FLAGCX_ENV_LIST_MAX_LENGTH and return flagcxSuccess?
| static void safeStrCopy(char *dst, size_t dstSize, const std::string &src) { | ||
| if (dstSize == 0) return; | ||
| size_t copyLen = std::min(dstSize - 1, src.size()); | ||
| if (copyLen > 0) memcpy(dst, src.data(), copyLen); | ||
| dst[copyLen] = '\0'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The safeStrCopy function truncates the source string if it's larger than the destination buffer. However, it doesn't provide any feedback to the user about this truncation. Consider adding a log message to inform the user when truncation occurs, as this could lead to unexpected behavior if the truncated part is significant.
static void safeStrCopy(char *dst, size_t dstSize, const std::string &src) {
if (dstSize == 0) return;
size_t copyLen = std::min(dstSize - 1, src.size());
if (copyLen > 0) memcpy(dst, src.data(), copyLen);
dst[copyLen] = '\0';
if (copyLen < src.size()) {
INFO(FLAGCX_INIT, "String truncated during safeStrCopy");
}
}static void safeStrCopy(char *dst, size_t dstSize, const std::string &src) {
if (dstSize == 0) return;
size_t copyLen = std::min(dstSize - 1, src.size());
if (copyLen > 0) memcpy(dst, src.data(), copyLen);
dst[copyLen] = '\0';
if (copyLen < src.size()) {
INFO(FLAGCX_INIT, "String truncated during safeStrCopy");
}
}| DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event), | ||
| cudaEventDisableTiming)); | ||
| cudaEventDefault)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using cudaEventDefault might not be the most appropriate flag, as it allows timing. If timing is not required, it's better to use cudaEventDisableTiming for potentially better performance. If timing is required, then this is fine.
DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event),
cudaEventDisableTiming)); DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event),
cudaEventDisableTiming));|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature for automatically generating candidate configurations for the flagcx tuner, primarily targeting NVIDIA GPUs. It includes changes to the CUDA adaptor, NCCL parameter handling, and tuner utility functions. The changes involve creating a new generateCandidate function to produce configuration combinations based on predefined environment variables, and integrating this function into the flagcx tuner initialization process. Additionally, the code includes safety measures for string copying and variable list truncation.
| if (vars.empty()){ | ||
| INFO(FLAGCX_INIT, "Invalid number of environment variables: 0"); | ||
| return flagcxInvalidArgument; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message indicates an invalid number of environment variables, but the function returns flagcxInvalidArgument. It would be more appropriate to return flagcxSuccess since the function is explicitly handling the case where no environment variables are present, and the framework should continue without erroring out. Returning flagcxInvalidArgument might halt the execution of the framework, which is not the desired behavior.
INFO(FLAGCX_INIT, "Invalid number of environment variables: 0");
return flagcxSuccess;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to return flagcxSuccess and do nothing.
flagcx/adaptor/tuner/tuner_util.cc
Outdated
| if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) { | ||
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| return flagcxInvalidArgument; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous comment, returning flagcxInvalidArgument when the number of variables exceeds the maximum length might be too strict. Consider truncating the variable list and continuing with a warning, or returning flagcxSuccess to allow the framework to proceed with the truncated list. Halting execution due to exceeding the maximum length might not be the desired behavior.
INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
return flagcxSuccess;| void **context) { | ||
| struct flagcxTunerContext *ctx = new struct flagcxTunerContext; | ||
| FLAGCXCHECK(loadConfigList(ctx->configList)); | ||
| FLAGCXCHECK(generateCandidate(ctx->configList)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for the generateCandidate function call. If generateCandidate fails, the tuner initialization should also fail gracefully, preventing potential issues later on. Add a check to ensure that generateCandidate returns flagcxSuccess before proceeding.
| FLAGCXCHECK(generateCandidate(ctx->configList)); | |
| flagcxResult_t result = generateCandidate(ctx->configList); | |
| if (result != flagcxSuccess) { | |
| INFO(FLAGCX_TUNING, "Candidate generation failed."); | |
| delete ctx; | |
| return result; | |
| } |
| flagcxCalloc(event, 1); | ||
| DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event), | ||
| cudaEventDisableTiming)); | ||
| cudaEventDefault)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from cudaEventDisableTiming to cudaEventDefault might affect the accuracy of timing measurements. Verify that this change doesn't negatively impact the tuner's ability to select optimal configurations. If timing is still needed, consider using cudaEventBlockingSync or cudaEventDefault with appropriate flags.
If timing is not needed, then this change is fine.
| while (!done) { | ||
| // Construct a flagcxEnvConfig and zero-initialize | ||
| flagcxEnvConfig cfg; | ||
| memset(&cfg, 0, sizeof(cfg)); // this zeroes commTag and all fields; adjust if you want non-zero defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using memset to zero-initialize the flagcxEnvConfig structure is good practice. However, it's crucial to ensure that the commTag field within flagcxEnvConfig is also properly initialized, especially the tag array. If the tag array is not null-terminated, it could lead to issues when the tag is used later on. Consider explicitly null-terminating the tag array after the memset operation.
flagcxEnvConfig cfg;
memset(&cfg, 0, sizeof(cfg)); // this zeroes commTag and all fields; adjust if you want non-zero defaults
cfg.commTag.tag[0] = '\0'; // Ensure null termination… And fix the risk of link failure in non-NVIDIA_ADAPTOR.
|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature for automatically generating candidate configurations for the flagcx tuner, leveraging environment variables to explore different configuration options. The changes include modifications to the CUDA adaptor, NCCL parameter handling, tuner utilities, and core flagcx tuner components. The goal is to enhance the tuning process by systematically generating and evaluating various configuration candidates.
flagcx/adaptor/tuner/tuner_util.cc
Outdated
| std::string tagStr = "Config " + std::to_string(numCandidate); | ||
| safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tag string "Config " + std::to_string(numCandidate) might become too long and cause a buffer overflow when copied to cfg.commTag.tag. Ensure that tagStr.size() is always less than sizeof(cfg.commTag.tag) before calling safeStrCopy. Consider adding a check to prevent potential buffer overflows.
std::string tagStr = "Config " + std::to_string(numCandidate);
if (tagStr.size() < sizeof(cfg.commTag.tag)) {
safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr);
} else {
INFO(FLAGCX_INIT, "Tag string too long, potential buffer overflow");
// Handle the error appropriately, e.g., truncate the string or skip this candidate
}| if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) { | ||
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector | ||
| INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH); | ||
| return flagcxSuccess; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Truncating vars to FLAGCX_ENV_LIST_MAX_LENGTH and returning flagcxSuccess after logging might mask potential issues. It would be better to return an error code (e.g., flagcxResourceExhausted) to indicate that some variables were truncated, allowing the caller to handle this situation appropriately. This provides more transparency and control to the user.
if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector
INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH);
return flagcxResourceExhausted; // Indicate resource exhaustion
}
flagcx/adaptor/tuner/tuner_util.h
Outdated
| }; | ||
|
|
||
| flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList); | ||
| std::vector<struct flagcxEnvConfig> generateCandidate(std::vector<EnvVar> vars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| flagcxResult_t cudaAdaptorEventCreate(flagcxEvent_t *event) { | ||
| (*event) = NULL; | ||
| flagcxCalloc(event, 1); | ||
| DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from cudaEventDisableTiming to cudaEventDefault might affect the profiling accuracy, as disabling timing could provide more precise measurements. Verify that enabling timing doesn't introduce significant overhead or inaccuracies in the measurements. If timing is not crucial, keeping cudaEventDisableTiming could be beneficial for performance.
DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event),
cudaEventDisableTiming));| if (vars.empty()){ | ||
| INFO(FLAGCX_INIT, "Invalid number of environment variables: 0"); | ||
| return flagcxInvalidArgument; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning flagcxInvalidArgument when vars.empty() might be too strict. Consider returning flagcxSuccess and logging a warning, as having no environment variables might be a valid use case with default configurations. This would allow the system to run with default settings without being considered an invalid argument.
if (vars.empty()){
INFO(FLAGCX_INIT, "Invalid number of environment variables: 0");
return flagcxSuccess; // Or flagcxSuccess if an empty config is acceptable
}| while (!done) { | ||
| // Construct a flagcxEnvConfig and zero-initialize | ||
| flagcxEnvConfig cfg; | ||
| memset(&cfg, 0, sizeof(cfg)); // this zeroes commTag and all fields; adjust if you want non-zero defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using memset to zero-initialize cfg is good practice, but consider adding a comment explaining why this is necessary, especially if the structure contains non-POD types in the future. This can prevent unexpected behavior and make the code more maintainable.
flagcxEnvConfig cfg;
memset(&cfg, 0, sizeof(cfg)); // Zero-initialize cfg to ensure all fields are properly set| INFO(FLAGCX_TUNING, | ||
| "Candidate number: %ld.", ctx->configList.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message "Candidate number: %ld." is useful, but consider adding more details about the candidates themselves, such as the range of values being explored. This would provide more insight into the tuning process and help in debugging.
| INFO(FLAGCX_TUNING, | |
| "Candidate number: %ld.", ctx->configList.size()); | |
| INFO(FLAGCX_TUNING, | |
| "Candidate number: %ld, exploring configurations for NCCL_ALGO, NCCL_PROTO, etc.", ctx->configList.size()); |
| INFO(FLAGCX_INIT | FLAGCX_TUNING, "start to prepare communicator tag=%s(%u/%u)", tag.tag, | ||
| i, nConfigs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding FLAGCX_TUNING to the log flags provides more detailed information during the initialization phase, which is helpful for debugging and monitoring the tuning process. This ensures that tuning-related logs are included when the FLAGCX_TUNING flag is enabled.
INFO(FLAGCX_INIT, "start to prepare communicator tag=%s(%u/%u)", tag.tag,
i, nConfigs);|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature to automatically generate candidate configurations for the tuner. The implementation involves creating all combinations of specified environment variables. My review focuses on the new implementation in tuner_util.cc and tuner_util.h. I've found a few critical issues related to incorrect state management and logic flaws that could lead to bugs, such as loss of configurations and incorrect application behavior. I've also pointed out some areas for improvement regarding C++ best practices for header file design to improve maintainability and prevent future bugs. The changes in other files are minor and look correct.
| // If the number of variables exceeds the structure capacity, truncate | ||
| if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) { | ||
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector | ||
| INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH); | ||
| return flagcxSuccess; | ||
| } | ||
|
|
||
| // Prepare candidate value lists for each variable (at least one empty string to ensure uniform combination logic) | ||
| std::vector<std::vector<std::string>> lists; | ||
| lists.reserve(vars.size()); | ||
| for (const auto &v : vars) { | ||
| if (v.choices.empty()) { | ||
| lists.emplace_back(std::vector<std::string>{""}); | ||
| } else { | ||
| lists.emplace_back(v.choices); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block has two issues. First, it modifies the global vars vector by calling resize(), which is not thread-safe and can lead to unexpected behavior on subsequent calls. Second, it prematurely returns flagcxSuccess, preventing any candidates from being generated if the number of variables exceeds the limit. This will cause a failure later when at least one candidate is expected. The suggested change fixes both issues by processing a limited number of variables without modifying the global state and without returning early.
// If the number of variables exceeds the structure capacity, limit the number to process.
size_t nvars_to_process = vars.size();
if (nvars_to_process > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
INFO(FLAGCX_INIT, "The number of environment variables (%zu) exceeds the maximum length (%d). Only the first %d variables will be used for candidate generation.",
vars.size(), FLAGCX_ENV_LIST_MAX_LENGTH, FLAGCX_ENV_LIST_MAX_LENGTH);
nvars_to_process = FLAGCX_ENV_LIST_MAX_LENGTH;
}
// Prepare candidate value lists for each variable (at least one empty string to ensure uniform combination logic)
std::vector<std::vector<std::string>> lists;
lists.reserve(nvars_to_process);
for (size_t i = 0; i < nvars_to_process; ++i) {
const auto& v = vars[i];
if (v.choices.empty()) {
lists.emplace_back(std::vector<std::string>{{""}});
} else {
lists.emplace_back(v.choices);
}
}| std::string tagStr = "Config " + std::to_string(numCandidate); | ||
| if (tagStr.size() < sizeof(cfg.commTag.tag)) { | ||
| safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr); | ||
| } else { | ||
| INFO(FLAGCX_INIT, "Tag string too long, potential buffer overflow"); | ||
| // Handle the error appropriately, e.g., truncate the string or skip this candidate | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic for creating configuration tags is flawed. If a generated tag string is too long, it is not set, resulting in an empty tag. Since these tags are used as keys in a map, multiple configurations with empty tags will cause collisions, leading to lost configurations and incorrect behavior. The log message about a 'potential buffer overflow' is also misleading, as safeStrCopy correctly handles truncation. The if/else block should be removed to ensure safeStrCopy is always called to truncate long tags if necessary.
std::string tagStr = "Config " + std::to_string(numCandidate);
safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr);
flagcx/adaptor/tuner/tuner_util.h
Outdated
| #ifdef USE_NVIDIA_ADAPTOR | ||
|
|
||
| static EnvVar algo( | ||
| "NCCL_ALGO", | ||
| {"ring", "tree"}, | ||
| "ring" | ||
| ); | ||
|
|
||
| static EnvVar proto( | ||
| "NCCL_PROTO", | ||
| {"LL", "LL128", "Simple"}, | ||
| "Simple" | ||
| ); | ||
|
|
||
| static EnvVar thread( | ||
| "NCCL_NTHREADS", | ||
| {"128", "256"}, | ||
| "256" | ||
| ); | ||
|
|
||
| static EnvVar minChannel( | ||
| "NCCL_MIN_NCHANNELS", | ||
| {"16", "32"}, | ||
| "16" | ||
| ); | ||
|
|
||
| static EnvVar chunkSize( | ||
| "NCCL_P2P_NVL_CHUNKSIZE", | ||
| {"1024", "2048"}, | ||
| "1024" | ||
| ); | ||
| static std::vector<EnvVar> vars = {algo, proto, thread, minChannel, chunkSize}; | ||
|
|
||
| #else | ||
| static std::vector<EnvVar> vars = {}; | ||
|
|
||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining static global variables in a header file is problematic as it creates a separate instance of these variables in each translation unit that includes the header. This can lead to subtle bugs if different parts of the code expect to share the same state but are actually working with different copies. To ensure a single instance of these variables across the program, they should be declared with extern in this header file and defined in a single .cc file (e.g., tuner_util.cc).
For example:
In tuner_util.h:
#ifdef USE_NVIDIA_ADAPTOR
extern std::vector<EnvVar> vars;
#else
extern std::vector<EnvVar> vars;
#endifIn tuner_util.cc:
#ifdef USE_NVIDIA_ADAPTOR
static EnvVar algo("NCCL_ALGO", {"ring", "tree"}, "ring");
// ... other EnvVar definitions
std::vector<EnvVar> vars = {algo, proto, thread, minChannel, chunkSize};
#else
std::vector<EnvVar> vars = {};
#endif| }; | ||
|
|
||
| flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList); | ||
| static void safeStrCopy(char *dst, size_t dstSize, const std::string &src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function safeStrCopy is an implementation detail of tuner_util.cc and is not used elsewhere. Declaring it static in this header file is unnecessary and pollutes the global namespace of any file that includes this header. The declaration should be removed from this header. The static definition in tuner_util.cc is sufficient.
|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature to automatically generate candidate configurations for the tuner, replacing a hardcoded list with a dynamic generator. This is a good improvement. However, my review found a critical issue in tuner_util.cc where an undeclared variable is used and all generated candidates would receive the same non-unique tag, which would break the tuner logic. I also found a high-severity issue in tuner_util.h related to defining global variables in a header file, which can lead to subtle bugs. I've provided detailed comments and suggestions to fix these issues, along with some medium-severity findings for code cleanup.
flagcx/adaptor/tuner/tuner_util.cc
Outdated
| if (tagStr.size() < sizeof(cfg.commTag.tag)) { | ||
| safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr); | ||
| } else { | ||
| INFO(FLAGCX_INIT, "Tag string too long, potential buffer overflow"); | ||
| return flagcxInvalidArgument; | ||
| } | ||
| safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of critical issues in this block:
- The variable
tagStris used without being declared, which will cause a compilation error. - This logic would assign the same tag to every generated candidate configuration. Each candidate needs a unique
commTagto be distinguished by the tuner. Using the same tag for all of them will lead to incorrect behavior, as only the last configuration with that tag will be stored and used. - The call to
safeStrCopyon line 57 is redundant, as it's already called inside theifblock on line 52.
The numCandidate variable, which is currently incremented but unused, is perfect for generating unique tags. Here is a suggested fix that resolves all three issues by generating a unique tag for each candidate.
// Construct a unique tag for this candidate configuration.
std::string tagStr = "candidate_" + std::to_string(numCandidate);
if (tagStr.size() >= sizeof(cfg.commTag.tag)) {
INFO(FLAGCX_INIT, "Generated tag string '%s' is too long.", tagStr.c_str());
return flagcxInvalidArgument;
}
safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr);
flagcx/adaptor/tuner/tuner_util.h
Outdated
| static EnvVar algo( | ||
| "NCCL_ALGO", | ||
| {"ring", "tree"}, | ||
| "ring" | ||
| ); | ||
|
|
||
| static EnvVar proto( | ||
| "NCCL_PROTO", | ||
| {"LL", "LL128", "Simple"}, | ||
| "Simple" | ||
| ); | ||
|
|
||
| static EnvVar thread( | ||
| "NCCL_NTHREADS", | ||
| {"128", "256"}, | ||
| "256" | ||
| ); | ||
|
|
||
| static EnvVar minChannel( | ||
| "NCCL_MIN_NCHANNELS", | ||
| {"16", "32"}, | ||
| "16" | ||
| ); | ||
|
|
||
| static EnvVar chunkSize( | ||
| "NCCL_P2P_NVL_CHUNKSIZE", | ||
| {"1024", "2048"}, | ||
| "1024" | ||
| ); | ||
| static std::vector<EnvVar> vars = {algo, proto, thread, minChannel, chunkSize}; | ||
|
|
||
| #else | ||
| static std::vector<EnvVar> vars = {}; | ||
|
|
||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining static global variables in a header file is problematic. It creates a separate copy of these variables in every translation unit (.cc file) that includes this header. This leads to unnecessary memory usage and can cause subtle bugs if different parts of the code expect to share a single instance of vars.
The standard practice is to declare the variables as extern in the header file and provide a single definition in a corresponding .cc file (e.g., tuner_util.cc).
Here's the suggested change for this header file. You will also need to move the definitions of these variables from this header into tuner_util.cc and remove the static keyword from them.
#ifdef USE_NVIDIA_ADAPTOR
extern EnvVar algo;
extern EnvVar proto;
extern EnvVar thread;
extern EnvVar minChannel;
extern EnvVar chunkSize;
extern std::vector<EnvVar> vars;
#else
extern std::vector<EnvVar> vars;
#endifThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can declare extern std::vector vargs in tuner.h and define it in tuner_nccl.cc or other chip-related tuner_xxx.cc
|
|
||
| cfg.envCount++; | ||
| // Stop if exceeding the maximum allowed envs (should not happen since we truncated vars earlier) | ||
| if (cfg.envCount >= FLAGCX_ENV_LIST_MAX_LENGTH) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check if (cfg.envCount >= FLAGCX_ENV_LIST_MAX_LENGTH) break; is unreachable. The vars vector is truncated to FLAGCX_ENV_LIST_MAX_LENGTH on line 24. The loop then iterates nvars times, where nvars <= FLAGCX_ENV_LIST_MAX_LENGTH. cfg.envCount is incremented in each iteration, so its maximum value will be nvars. Therefore, this condition will never be met. This line is dead code and can be safely removed.
| }; | ||
|
|
||
| flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList); | ||
| static void safeStrCopy(char *dst, size_t dstSize, const std::string &src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function safeStrCopy is declared static here, but it's only implemented and used within tuner_util.cc. Declaring a file-local function in a public header is misleading and unnecessary. The declaration should be removed from this header. Its definition in tuner_util.cc is already static, which is correct for a file-local helper function.
|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature for automatically generating candidate configurations for the flagcx tuner, enhancing its ability to optimize collective communication performance. The changes include modifications to the CUDA adaptor, NCCL parameter handling, tuner utilities, and core flagcx components. The most significant change is the introduction of the generateCandidate function, which replaces the hardcoded configuration list with a dynamically generated set of configurations based on environment variables. The review focuses on correctness and potential issues.
| INFO(FLAGCX_INIT, "Tag string too long, potential buffer overflow"); | ||
| return flagcxInvalidArgument; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log message "Tag string too long, potential buffer overflow" indicates a potential vulnerability. While the code returns flagcxInvalidArgument, it might be better to truncate the string to fit within the buffer to avoid unexpected behavior or security risks. Consider truncating tagStr to sizeof(cfg.commTag.tag) - 1 and adding a null terminator.
std::string tagStr = "Config " + std::to_string(numCandidate);
if (tagStr.size() < sizeof(cfg.commTag.tag)) {
safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr);
} else {
INFO(FLAGCX_INIT, "Tag string too long, potential buffer overflow");
return flagcxInvalidArgument;
}| DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event), | ||
| cudaEventDisableTiming)); | ||
| cudaEventDefault)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| INFO(FLAGCX_INIT | FLAGCX_TUNING, "start to prepare communicator tag=%s(%u/%u)", tag.tag, | ||
| i, nConfigs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature to automatically generate candidate configurations for the tuner, which is a significant improvement over the previous hardcoded demo configurations. The implementation correctly generates a Cartesian product of environment variable choices.
My review focuses on correctness and maintainability. I've found a critical bug in the candidate generation logic that would prevent it from working when many variables are defined, and a high-severity issue in the Makefile that would lead to incorrect builds for most hardware adaptors. I've also included some suggestions to improve code quality by reducing side effects and cleaning up header files. Please see the detailed comments for suggestions.
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector | ||
| INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH); | ||
| return flagcxSuccess; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flagcx/adaptor/tuner/Makefile
Outdated
| else ifeq ($(USE_ILUVATAR_COREX), 1) | ||
| CXXFLAGS = -DUSE_ILUVATAR_COREX_ADAPTOR | ||
| else ifeq ($(USE_CAMBRICON), 1) | ||
| CXXFLAGS = -DUSE_CAMBRICON_ADAPTOR | ||
| else ifeq ($(USE_METAX), 1) | ||
| CXXFLAGS = -DUSE_METAX_ADAPTOR | ||
| else ifeq ($(USE_MUSA), 1) | ||
| CXXFLAGS = -DUSE_MUSA_ADAPTOR | ||
| else ifeq ($(USE_KUNLUNXIN), 1) | ||
| CXXFLAGS = -DUSE_KUNLUNXIN_ADAPTOR | ||
| else ifeq ($(USE_DU), 1) | ||
| CXXFLAGS = -DUSE_DU_ADAPTOR | ||
| else ifeq ($(USE_AMD), 1) | ||
| CXXFLAGS = -DUSE_AMD_ADAPTOR -D__HIP_PLATFORM_AMD__ | ||
| else | ||
| CXXFLAGS = -DUSE_NVIDIA_ADAPTOR | ||
| endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this if/else block, CXXFLAGS is being overwritten with = instead of being appended to with += for most of the hardware adaptor options (from line 27 onwards). This will discard the base compiler flags (e.g., -fPIC, -Wall, -std=c++11) defined on line 3, which can lead to build failures or incorrectly compiled binaries. All assignments here should use += to append flags.
else ifeq ($(USE_ILUVATAR_COREX), 1)
CXXFLAGS += -DUSE_ILUVATAR_COREX_ADAPTOR
else ifeq ($(USE_CAMBRICON), 1)
CXXFLAGS += -DUSE_CAMBRICON_ADAPTOR
else ifeq ($(USE_METAX), 1)
CXXFLAGS += -DUSE_METAX_ADAPTOR
else ifeq ($(USE_MUSA), 1)
CXXFLAGS += -DUSE_MUSA_ADAPTOR
else ifeq ($(USE_KUNLUNXIN), 1)
CXXFLAGS += -DUSE_KUNLUNXIN_ADAPTOR
else ifeq ($(USE_DU), 1)
CXXFLAGS += -DUSE_DU_ADAPTOR
else ifeq ($(USE_AMD), 1)
CXXFLAGS += -DUSE_AMD_ADAPTOR -D__HIP_PLATFORM_AMD__
else
CXXFLAGS += -DUSE_NVIDIA_ADAPTOR
endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix this error.
| // If the number of variables exceeds the structure capacity, truncate | ||
| if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) { | ||
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying the global vars vector with resize() here introduces a side effect that can make the program's behavior difficult to predict. A function that generates data should ideally not modify its global configuration source. Consider processing a limited number of variables without altering the vars vector itself. For example, you could introduce a local variable for the number of variables to process and use an indexed for-loop instead of a range-based for-loop.
| }; | ||
|
|
||
| flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList); | ||
| static void safeStrCopy(char *dst, size_t dstSize, const std::string &src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function safeStrCopy is declared static here in the header file. Since it is defined with static linkage in tuner_util.cc, it is only visible within that translation unit. Its declaration does not belong in a public header file as it cannot be used by other files including this header. This can be confusing and pollutes the header's namespace. Please remove this declaration.
flagcx/adaptor/tuner/tuner_util.cc
Outdated
| return flagcxSuccess; | ||
| } | ||
|
|
||
| static EnvVar algo( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this part to nccl_param.cc and rename it to nccl_tuner.cc.
flagcx/adaptor/tuner/Makefile
Outdated
| endif | ||
|
|
||
| # Target shared library name | ||
| TARGET = libnccl_param.so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libnccl_param.so -> libflagcx_tuner.so
|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature to automatically generate candidate configurations for the tuner by defining a search space of environment variables and creating all possible combinations. The implementation is mostly solid, but I've identified a few issues. My review includes a fix for a critical logic bug in the candidate generation, a correction to the build configuration to prevent potential symbol conflicts, and several smaller improvements for code clarity and maintainability following C++ best practices. Addressing these points will make the new feature more robust and the codebase cleaner.
| $(wildcard flagcx/adaptor/ccl/*.cc) \ | ||
| $(wildcard flagcx/adaptor/net/*.cc) \ | ||
| $(wildcard flagcx/adaptor/tuner/tuner_util.cc) \ | ||
| $(wildcard flagcx/adaptor/tuner/*.cc) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a wildcard *.cc here includes flagcx/adaptor/tuner/nccl_param.cc in the libflagcx.so build. This file appears to be designed for LD_PRELOAD to intercept NCCL's internal functions (e.g., ncclParamNthreads), and has its own Makefile in its directory. Including it directly into libflagcx.so can lead to unintended symbol conflicts if an application links against both libflagcx.so and libnccl.so. It would be safer to either explicitly list the source files from the tuner directory that should be part of libflagcx.so, or filter out nccl_param.cc from the wildcard.
$(filter-out %/nccl_param.cc, $(wildcard flagcx/adaptor/tuner/*.cc)) \
| if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) { | ||
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector | ||
| INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH); | ||
| return flagcxSuccess; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a logical error here. When the number of variables exceeds FLAGCX_ENV_LIST_MAX_LENGTH, the code truncates the vars vector but then immediately returns flagcxSuccess on line 25. This prevents any candidates from being generated, leaving cfgList empty, which will likely cause failures later on. The intention seems to be to proceed with generating candidates from the truncated list. The return flagcxSuccess; statement on line 25 should be removed to fix this.
// If the number of variables exceeds the structure capacity, truncate
if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector
INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH);
}| flagcxEnvConfig cfg; | ||
| memset(&cfg, 0, sizeof(cfg)); // this zeroes commTag and all fields; adjust if you want non-zero defaults |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines for declaring and then zero-initializing cfg can be combined into a single, more idiomatic C++ statement using value initialization ({}). This is safer, clearer, and avoids using C-style memset.
flagcxEnvConfig cfg{}; // this zeroes commTag and all fields; adjust if you want non-zero defaults| }; | ||
|
|
||
| flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList); | ||
| static void safeStrCopy(char *dst, size_t dstSize, const std::string &src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function safeStrCopy is defined as a static function within tuner_util.cc, which gives it internal linkage, meaning it's only visible within that translation unit. Declaring it in this header file is therefore incorrect and misleading, as it cannot be called from other files that include this header. Since it's a private helper function for tuner_util.cc, its declaration should be removed from this public header.
| #include <sstream> | ||
| #include <string> | ||
| #include <vector> | ||
| #include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @@ -3,8 +3,9 @@ CXX = g++ | |||
| CXXFLAGS = -fPIC -Wall -Wextra -O2 -std=c++11 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this file and nccl_param.cc into a subfolder tuner/param
flagcx/adaptor/tuner/param/Makefile
Outdated
| # Target shared library name | ||
| TARGET = libnccl_param.so | ||
| TARGET = libflagcx_tuner.so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libflagcx_param.so
|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature to automatically generate candidate configurations for the performance tuner. It replaces a hardcoded list of configurations with a mechanism that generates a Cartesian product of possible environment variable values. Key changes include adding a generateCandidate function, defining NCCL-specific tuning variables, and integrating this new mechanism into the tuner initialization. The changes are well-structured. My review includes suggestions to remove leftover debugging code, improve error handling for configuration limits, and clean up header file declarations and unnecessary includes.
| int64_t ncclParam##name() { \ | ||
| int64_t value = INT64_MIN; \ | ||
| ncclLoadParam("NCCL_" env, deftVal, &value); \ | ||
| std::cout<<"========="<<env<<" is set to "<<value<<std::endl; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector | ||
| INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH); | ||
| return flagcxSuccess; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the number of environment variables exceeds FLAGCX_ENV_LIST_MAX_LENGTH, the vars vector is truncated and the function returns flagcxSuccess. This behavior can silently hide a configuration issue where not all intended variables are being used for candidate generation. It would be safer to return an error, like flagcxInvalidArgument, to make the caller aware of the problem. Modifying the global vars vector can also have unintended side effects.
return flagcxInvalidArgument;| }; | ||
|
|
||
| flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList); | ||
| static void safeStrCopy(char *dst, size_t dstSize, const std::string &src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declaring a static function in a header file is generally considered bad practice. It creates a separate copy of the function in every translation unit that includes the header, which can lead to code bloat. Since safeStrCopy is defined as static in tuner_util.cc and is only used within that file, its declaration should be removed from this header file to enforce internal linkage and improve code clarity.
| #include <sstream> | ||
| #include <string> | ||
| #include <vector> | ||
| #include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@gemini-code-assist please review this pull request in details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a feature to automatically generate candidate configurations for the tuner. The implementation looks good overall, but I've found a few issues that could affect correctness and maintainability. My main concerns are a potential linker error when building for non-NVIDIA platforms, a logic bug in handling an excessive number of environment variables, and some minor code hygiene issues like unnecessary header includes and dead code. The suggested changes aim to make the implementation more robust and easier to maintain.
| #ifdef USE_NVIDIA_ADAPTOR | ||
|
|
||
| static EnvVar algo( | ||
| "NCCL_ALGO", | ||
| {"ring", "tree"}, | ||
| "ring" | ||
| ); | ||
|
|
||
| static EnvVar proto( | ||
| "NCCL_PROTO", | ||
| {"LL", "LL128", "Simple"}, | ||
| "Simple" | ||
| ); | ||
|
|
||
| static EnvVar thread( | ||
| "NCCL_NTHREADS", | ||
| {"128", "256"}, | ||
| "256" | ||
| ); | ||
|
|
||
| static EnvVar minChannel( | ||
| "NCCL_MIN_NCHANNELS", | ||
| {"16", "32"}, | ||
| "16" | ||
| ); | ||
|
|
||
| static EnvVar chunkSize( | ||
| "NCCL_P2P_NVL_CHUNKSIZE", | ||
| {"1024", "2048"}, | ||
| "1024" | ||
| ); | ||
|
|
||
| std::vector<EnvVar> vars = {algo, proto, thread, minChannel, chunkSize}; | ||
|
|
||
| #endif No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditional definition of the global vars vector within #ifdef USE_NVIDIA_ADAPTOR can lead to linker errors. The Makefile is configured to compile all .cc files in this directory, including this one, regardless of the selected vendor. If USE_NVIDIA_ADAPTOR is not defined, vars will be declared extern in tuner_util.h but not defined anywhere, causing a linker failure.
To fix this and make the design more robust for other potential vendor tuners, you can provide a weak default definition in an #else block. This ensures that there's always a definition for vars, which will be an empty vector if no specific tuner is active. If another tuner file (e.g., for AMD) provides a strong definition, the linker will use that one, and if multiple files provide weak definitions, the linker will pick one without error.
#ifdef USE_NVIDIA_ADAPTOR
static EnvVar algo(
"NCCL_ALGO",
{"ring", "tree"},
"ring"
);
static EnvVar proto(
"NCCL_PROTO",
{"LL", "LL128", "Simple"},
"Simple"
);
static EnvVar thread(
"NCCL_NTHREADS",
{"128", "256"},
"256"
);
static EnvVar minChannel(
"NCCL_MIN_NCHANNELS",
{"16", "32"},
"16"
);
static EnvVar chunkSize(
"NCCL_P2P_NVL_CHUNKSIZE",
{"1024", "2048"},
"1024"
);
std::vector<EnvVar> vars = {algo, proto, thread, minChannel, chunkSize};
#else
// Provide a weak definition for `vars` to avoid linker errors when building for other vendors.
// Another vendor-specific tuner file can provide a strong definition.
// If no tuner is active, this will default to an empty vector, which is handled correctly.
__attribute__((weak)) std::vector<EnvVar> vars;
#endif| if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) { | ||
| INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH"); | ||
| vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector | ||
| INFO(FLAGCX_INIT, "The number of environment variables has been truncated to FLAGCX_ENV_LIST_MAX_LENGTH (%d)",FLAGCX_ENV_LIST_MAX_LENGTH); | ||
| return flagcxSuccess; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a logic bug when the number of environment variables exceeds FLAGCX_ENV_LIST_MAX_LENGTH. The code truncates the vars vector but then immediately returns flagcxSuccess, causing generateCandidate to exit without generating any configurations. The intended behavior should be to proceed with generating candidates using the truncated list.
Additionally, truncating the user's configuration is a significant event that should be logged as a WARN instead of INFO to ensure it's noticed.
if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
WARN(FLAGCX_INIT, "The number of environment variables (%zu) exceeds the maximum length (%d), truncating.", vars.size(), FLAGCX_ENV_LIST_MAX_LENGTH);
vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector
}| }; | ||
|
|
||
| flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList); | ||
| static void safeStrCopy(char *dst, size_t dstSize, const std::string &src); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function safeStrCopy is defined as static in tuner_util.cc, giving it internal linkage. This means it is only visible within that file. Declaring it in this header file is incorrect because it cannot be accessed from other translation units and can be misleading. Since it's a local helper function for tuner_util.cc, you should remove its declaration from this header to improve encapsulation and code clarity.
| // Stop if exceeding the maximum allowed envs (should not happen since we truncated vars earlier) | ||
| if (cfg.envCount >= FLAGCX_ENV_LIST_MAX_LENGTH) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check if (cfg.envCount >= FLAGCX_ENV_LIST_MAX_LENGTH) break; is unreachable and constitutes dead code. The vars vector is truncated to FLAGCX_ENV_LIST_MAX_LENGTH at the beginning of the function, and the loop that populates cfg.envs iterates vars.size() times. Therefore, cfg.envCount cannot exceed the limit within this loop. Removing this redundant check will improve code clarity.
| #include <sstream> | ||
| #include <string> | ||
| #include <vector> | ||
| #include <iostream> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.