Skip to content
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion flagcx/adaptor/device/cuda_adaptor.cc
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,7 @@ flagcxResult_t cudaAdaptorEventCreate(flagcxEvent_t *event) {
(*event) = NULL;
flagcxCalloc(event, 1);
DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The change from cudaEventDisableTiming to cudaEventDefault might affect the profiling accuracy, as disabling timing could provide more precise measurements. Verify that enabling timing doesn't introduce significant overhead or inaccuracies in the measurements. If timing is not crucial, keeping cudaEventDisableTiming could be beneficial for performance.

DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event),
                                    cudaEventDisableTiming));

cudaEventDisableTiming));
cudaEventDefault));
Comment on lines 240 to +241

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using cudaEventDefault might not be the most appropriate flag, as it allows timing. If timing is not required, it's better to use cudaEventDisableTiming for potentially better performance. If timing is required, then this is fine.

  DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event),
                                    cudaEventDisableTiming));
  DEVCHECK(cudaEventCreateWithFlags((cudaEvent_t *)(*event),
                                    cudaEventDisableTiming));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The change from cudaEventDisableTiming to cudaEventDefault might affect the accuracy of timing measurements. Verify that this change doesn't negatively impact the tuner's ability to select optimal configurations. If timing is still needed, consider using cudaEventBlockingSync or cudaEventDefault with appropriate flags.

If timing is not needed, then this change is fine.

Comment on lines 240 to +241

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The change from cudaEventDisableTiming to cudaEventDefault might affect the accuracy of timing measurements. It's crucial to verify that this change doesn't negatively impact the tuner's ability to select optimal configurations. If timing is disabled, the tuner will not be able to profile.

return flagcxSuccess;
}

Expand Down
96 changes: 82 additions & 14 deletions flagcx/adaptor/tuner/tuner_util.cc
100644 → 100755
Original file line number Diff line number Diff line change
@@ -1,19 +1,87 @@
#include "tuner/tuner_util.h"

#ifdef USE_NVIDIA_ADAPTOR
static struct flagcxEnvConfig config1 = {
"defaultConfig1",
1,
{FLAGCX_ENV_TYPE_CREATION, "NCCL_P2P_NVL_CHUNKSIZE", "1024", "524288"}};
static struct flagcxEnvConfig config2 = {
"defaultConfig2",
1,
{FLAGCX_ENV_TYPE_CREATION, "NCCL_P2P_NVL_CHUNKSIZE", "524288", "524288"}};

// demo
flagcxResult_t loadConfigList(std::vector<struct flagcxEnvConfig> &cfgList) {
cfgList.push_back(config1);
cfgList.push_back(config2);
return flagcxSuccess;

// Safely copy std::string to char buffer, ensuring NUL termination and truncation
static void safeStrCopy(char *dst, size_t dstSize, const std::string &src) {
if (dstSize == 0) return;
size_t copyLen = std::min(dstSize - 1, src.size());
if (copyLen > 0) memcpy(dst, src.data(), copyLen);
dst[copyLen] = '\0';
Comment on lines +4 to +8

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The safeStrCopy function truncates the source string if it's larger than the destination buffer. However, it doesn't provide any feedback to the user about this truncation. Consider adding a log message to inform the user when truncation occurs, as this could lead to unexpected behavior if the truncated part is significant.

static void safeStrCopy(char *dst, size_t dstSize, const std::string &src) {
    if (dstSize == 0) return;
    size_t copyLen = std::min(dstSize - 1, src.size());
    if (copyLen > 0) memcpy(dst, src.data(), copyLen);
    dst[copyLen] = '\0';
    if (copyLen < src.size()) {
        INFO(FLAGCX_INIT, "String truncated during safeStrCopy");
    }
}
static void safeStrCopy(char *dst, size_t dstSize, const std::string &src) {
    if (dstSize == 0) return;
    size_t copyLen = std::min(dstSize - 1, src.size());
    if (copyLen > 0) memcpy(dst, src.data(), copyLen);
    dst[copyLen] = '\0';
    if (copyLen < src.size()) {
        INFO(FLAGCX_INIT, "String truncated during safeStrCopy");
    }
}

}

// Generate all combinations and return a vector of flagcxEnvConfig
flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList) {

// Return empty if there are no environment variables
if (vars.empty()){
INFO(FLAGCX_INIT, "Invalid number of environment variables: 0");
return flagcxInvalidArgument;
}
Comment on lines +15 to +18

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The log message indicates an invalid number of environment variables, but the function returns flagcxInvalidArgument. It would be more appropriate to return flagcxSuccess since the function is explicitly handling the case where no environment variables are present, and the framework should continue without erroring out. Returning flagcxInvalidArgument might halt the execution of the framework, which is not the desired behavior.

      INFO(FLAGCX_INIT, "Invalid number of environment variables: 0");
      return flagcxSuccess;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to return flagcxSuccess and do nothing.

Comment on lines +15 to +18

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Returning flagcxInvalidArgument when vars.empty() might be too strict. Consider returning flagcxSuccess and logging a warning, as having no environment variables might be a valid use case with default configurations. This would allow the system to run with default settings without being considered an invalid argument.

    if (vars.empty()){
      INFO(FLAGCX_INIT, "Invalid number of environment variables: 0");
      return flagcxSuccess; // Or flagcxSuccess if an empty config is acceptable
    }


// If the number of variables exceeds the structure capacity, truncate
if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
return flagcxInvalidArgument;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The code checks if the number of variables exceeds FLAGCX_ENV_LIST_MAX_LENGTH and returns flagcxInvalidArgument. However, the variables are not actually truncated. It would be more appropriate to truncate the vars vector to FLAGCX_ENV_LIST_MAX_LENGTH to prevent exceeding the bounds of the envs array in the flagcxEnvConfig struct. This would also align the behavior with the comment's intention.

    if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
      INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
      vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector
      return flagcxInvalidArgument; // Or continue after truncation, depending on desired behavior
    }
    if (vars.size() > (size_t)FLAGCX_ENV_LIST_MAX_LENGTH) {
      INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
      vars.resize(FLAGCX_ENV_LIST_MAX_LENGTH); // Truncate the vars vector
      return flagcxInvalidArgument; // Or continue after truncation, depending on desired behavior
    }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just truncate vars to FLAGCX_ENV_LIST_MAX_LENGTH and return flagcxSuccess?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the previous comment, returning flagcxInvalidArgument when the number of variables exceeds the maximum length might be too strict. Consider truncating the variable list and continuing with a warning, or returning flagcxSuccess to allow the framework to proceed with the truncated list. Halting execution due to exceeding the maximum length might not be the desired behavior.

      INFO(FLAGCX_INIT, "The number of environment variables exceeds the maximum length defined by FLAGCX_ENV_LIST_MAX_LENGTH");
      return flagcxSuccess;

}

// Prepare candidate value lists for each variable (at least one empty string to ensure uniform combination logic)
std::vector<std::vector<std::string>> lists;
lists.reserve(vars.size());
for (const auto &v : vars) {
if (v.choices.empty()) {
lists.emplace_back(std::vector<std::string>{""});
} else {
lists.emplace_back(v.choices);
}
}

// Use an index vector to iterate through the Cartesian product (multi-dimensional counter)
size_t nvars = lists.size();
std::vector<size_t> idx(nvars, 0);
bool done = (nvars == 0);
unsigned long numCandidate = 0;

while (!done) {
// Construct a flagcxEnvConfig and zero-initialize
flagcxEnvConfig cfg;
memset(&cfg, 0, sizeof(cfg)); // this zeroes commTag and all fields; adjust if you want non-zero defaults

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using memset to zero-initialize the flagcxEnvConfig structure is good practice. However, it's crucial to ensure that the commTag field within flagcxEnvConfig is also properly initialized, especially the tag array. If the tag array is not null-terminated, it could lead to issues when the tag is used later on. Consider explicitly null-terminating the tag array after the memset operation.

        flagcxEnvConfig cfg;
        memset(&cfg, 0, sizeof(cfg)); // this zeroes commTag and all fields; adjust if you want non-zero defaults
        cfg.commTag.tag[0] = '\0'; // Ensure null termination

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using memset to zero-initialize cfg is good practice, but consider adding a comment explaining why this is necessary, especially if the structure contains non-POD types in the future. This can prevent unexpected behavior and make the code more maintainable.

        flagcxEnvConfig cfg;
        memset(&cfg, 0, sizeof(cfg)); // Zero-initialize cfg to ensure all fields are properly set

Comment on lines +47 to +48

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

These two lines for declaring and then zero-initializing cfg can be combined into a single, more idiomatic C++ statement using value initialization ({}). This is safer, clearer, and avoids using C-style memset.

        flagcxEnvConfig cfg{}; // this zeroes commTag and all fields; adjust if you want non-zero defaults


std::string tagStr = "Config " + std::to_string(numCandidate);
safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The tag string "Config " + std::to_string(numCandidate) might become too long and cause a buffer overflow when copied to cfg.commTag.tag. Ensure that tagStr.size() is always less than sizeof(cfg.commTag.tag) before calling safeStrCopy. Consider adding a check to prevent potential buffer overflows.

        std::string tagStr = "Config " + std::to_string(numCandidate);
        if (tagStr.size() < sizeof(cfg.commTag.tag)) {
          safeStrCopy(cfg.commTag.tag, sizeof(cfg.commTag.tag), tagStr);
        } else {
          INFO(FLAGCX_INIT, "Tag string too long, potential buffer overflow");
          // Handle the error appropriately, e.g., truncate the string or skip this candidate
        }

cfg.envCount = 0;

// Fill envs
for (size_t i = 0; i < nvars; ++i) {
flagcxEnvEntity &ent = cfg.envs[i];
// type
ent.type = FLAGCX_ENV_TYPE_CREATION;
// name
safeStrCopy(ent.name, sizeof(ent.name), vars[i].name);
// value
const std::string &val = lists[i][idx[i]];
safeStrCopy(ent.value, sizeof(ent.value), val);
// defaultValue
safeStrCopy(ent.defaultValue, sizeof(ent.defaultValue), vars[i].defaultValue);

cfg.envCount++;
// Stop if exceeding the maximum allowed envs (should not happen since we truncated vars earlier)
if (cfg.envCount >= FLAGCX_ENV_LIST_MAX_LENGTH) break;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This check if (cfg.envCount >= FLAGCX_ENV_LIST_MAX_LENGTH) break; is unreachable. The vars vector is truncated to FLAGCX_ENV_LIST_MAX_LENGTH on line 24. The loop then iterates nvars times, where nvars <= FLAGCX_ENV_LIST_MAX_LENGTH. cfg.envCount is incremented in each iteration, so its maximum value will be nvars. Therefore, this condition will never be met. This line is dead code and can be safely removed.

Comment on lines +73 to +74

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This check if (cfg.envCount >= FLAGCX_ENV_LIST_MAX_LENGTH) break; is unreachable and constitutes dead code. The vars vector is truncated to FLAGCX_ENV_LIST_MAX_LENGTH at the beginning of the function, and the loop that populates cfg.envs iterates vars.size() times. Therefore, cfg.envCount cannot exceed the limit within this loop. Removing this redundant check will improve code clarity.

}

cfgList.push_back(cfg);

// Increment counter (from least significant to most significant)
for (int i = (int)nvars - 1; i >= 0; --i) {
idx[i]++;
if (idx[i] < lists[i].size()) break;
idx[i] = 0;
if (i == 0) done = true;
}
numCandidate += 1;
}

return flagcxSuccess;
}

#endif
50 changes: 49 additions & 1 deletion flagcx/adaptor/tuner/tuner_util.h
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,56 @@

#include "tuner.h" // struct flagcxEnvConfig
#include <vector>
#include <string>

// This is a demonstration function that provide a way to load all config list for a specific GPU.
flagcxResult_t loadConfigList(std::vector<struct flagcxEnvConfig> &cfgList);

struct EnvVar {
std::string name;
std::vector<std::string> choices;
std::string defaultValue;
EnvVar(std::string n="") : name(std::move(n)) {}
EnvVar(std::string n, std::vector<std::string> c, std::string d = "")
: name(std::move(n)), choices(std::move(c)), defaultValue(std::move(d)) {}
};

flagcxResult_t generateCandidate(std::vector<struct flagcxEnvConfig> &cfgList);
std::vector<struct flagcxEnvConfig> generateCandidate(std::vector<EnvVar> vars);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The function generateCandidate(std::vector<EnvVar> vars) is declared but not defined. Either provide a definition for this function or remove the declaration if it's not intended to be used. Leaving it declared but undefined can lead to linker errors.

static void safeStrCopy(char *dst, size_t dstSize, const std::string &src);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function safeStrCopy is an implementation detail of tuner_util.cc and is not used elsewhere. Declaring it static in this header file is unnecessary and pollutes the global namespace of any file that includes this header. The declaration should be removed from this header. The static definition in tuner_util.cc is sufficient.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function safeStrCopy is declared static here, but it's only implemented and used within tuner_util.cc. Declaring a file-local function in a public header is misleading and unnecessary. The declaration should be removed from this header. Its definition in tuner_util.cc is already static, which is correct for a file-local helper function.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function safeStrCopy is declared static here in the header file. Since it is defined with static linkage in tuner_util.cc, it is only visible within that translation unit. Its declaration does not belong in a public header file as it cannot be used by other files including this header. This can be confusing and pollutes the header's namespace. Please remove this declaration.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function safeStrCopy is defined as a static function within tuner_util.cc, which gives it internal linkage, meaning it's only visible within that translation unit. Declaring it in this header file is therefore incorrect and misleading, as it cannot be called from other files that include this header. Since it's a private helper function for tuner_util.cc, its declaration should be removed from this public header.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Declaring a static function in a header file is generally considered bad practice. It creates a separate copy of the function in every translation unit that includes the header, which can lead to code bloat. Since safeStrCopy is defined as static in tuner_util.cc and is only used within that file, its declaration should be removed from this header file to enforce internal linkage and improve code clarity.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The function safeStrCopy is defined as static in tuner_util.cc, giving it internal linkage. This means it is only visible within that file. Declaring it in this header file is incorrect because it cannot be accessed from other translation units and can be misleading. Since it's a local helper function for tuner_util.cc, you should remove its declaration from this header to improve encapsulation and code clarity.


#ifdef USE_NVIDIA_ADAPTOR

static EnvVar algo(
"NCCL_ALGO",
{"ring", "tree"},
"ring"
);

static EnvVar proto(
"NCCL_PROTO",
{"LL", "LL128", "Simple"},
"Simple"
);

static EnvVar thread(
"NCCL_NTHREADS",
{"128", "256"},
"256"
);

static EnvVar minChannel(
"NCCL_MIN_NCHANNELS",
{"16", "32"},
"16"
);

static EnvVar chunkSize(
"NCCL_P2P_NET_CHUNKSIZE",
{"1024", "2048"},
"1024"
);
static std::vector<EnvVar> vars = {algo, proto, thread, minChannel, chunkSize};

#endif

#endif // end include guard
5 changes: 4 additions & 1 deletion flagcx/core/flagcx_tuner.cc
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
#include <sstream>
#include <string>
#include <vector>
#include <iostream>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The <iostream> header is included but appears to be unused in this file. It's good practice to remove unused includes to improve compilation times and reduce code clutter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The <iostream> header seems to be unused in this file. It's good practice to remove unnecessary includes to improve compilation times and reduce dependencies.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The <iostream> header is included but not used in this file. It's best practice to remove unused includes to reduce compilation time and unnecessary dependencies.


// A category of collective operation. the minimal unit for tuning.
struct TunerCollCategory {
Expand Down Expand Up @@ -142,7 +143,9 @@ flagcxResult_t flagcxTunerInit(size_t nRanks, size_t nNodes,
flagcxDebugLogger_t logFunction,
void **context) {
struct flagcxTunerContext *ctx = new struct flagcxTunerContext;
FLAGCXCHECK(loadConfigList(ctx->configList));
FLAGCXCHECK(generateCandidate(ctx->configList));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Consider adding error handling for the generateCandidate function call. If generateCandidate fails, the tuner initialization should also fail gracefully, preventing potential issues later on. Add a check to ensure that generateCandidate returns flagcxSuccess before proceeding.

Suggested change
FLAGCXCHECK(generateCandidate(ctx->configList));
flagcxResult_t result = generateCandidate(ctx->configList);
if (result != flagcxSuccess) {
INFO(FLAGCX_TUNING, "Candidate generation failed.");
delete ctx;
return result;
}

INFO(FLAGCX_TUNING,
"Candidate number: %ld.", ctx->configList.size());
Comment on lines +147 to +148

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The log message "Candidate number: %ld." is useful, but consider adding more details about the candidates themselves, such as the range of values being explored. This would provide more insight into the tuning process and help in debugging.

Suggested change
INFO(FLAGCX_TUNING,
"Candidate number: %ld.", ctx->configList.size());
INFO(FLAGCX_TUNING,
"Candidate number: %ld, exploring configurations for NCCL_ALGO, NCCL_PROTO, etc.", ctx->configList.size());

ctx->logger = logFunction;
*context = ctx;

Expand Down