-
Notifications
You must be signed in to change notification settings - Fork 40
Fix the recv_buffer issue encountered during LLaMA3 training #273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @whollo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the robustness of the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses an issue with an environment variable check by accepting "True" in addition to "TRUE". While this improves usability, the implementation introduces duplicated logic across four files. A more robust and maintainable solution would be to use a case-insensitive string comparison function like strcasecmp. This would handle all variations of 'true' (e.g., 'TRUE', 'True', 'true') and simplify the code. I've provided specific suggestions for each file to use strcasecmp. For long-term maintainability, I recommend abstracting this logic into a shared utility function.
| (strcmp(enableTopoDetect, "TRUE") == 0 || | ||
| strcmp(enableTopoDetect, "True") == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of checking for "TRUE" and "True" separately, using a case-insensitive comparison would be more robust and cleaner. The strcasecmp function (from <strings.h>) can handle all cases like "TRUE", "True", and "true" with a single call, simplifying the code. This same pattern is repeated in other files in this pull request.
strcasecmp(enableTopoDetect, "TRUE") == 0) {| FLAGCXCHECK(flagcxNetInit(comm)); | ||
| INFO(FLAGCX_INIT, "Using network %s", comm->netAdaptor->name); | ||
| if (env && strcmp(env, "TRUE") == 0) { | ||
| if (env && (strcmp(env, "TRUE") == 0 || strcmp(env, "True") == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the environment variable check more robust, consider using a case-insensitive string comparison. The strcasecmp function from <strings.h> would handle all variations like "TRUE", "True", and "true", which simplifies the condition and improves maintainability.
| if (env && (strcmp(env, "TRUE") == 0 || strcmp(env, "True") == 0)) { | |
| if (env && strcasecmp(env, "TRUE") == 0) { |
| (strcmp(enable_topo_detect, "TRUE") == 0 || | ||
| strcmp(enable_topo_detect, "True") == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A case-insensitive comparison using strcasecmp (from <strings.h>) would be a more robust and cleaner way to check this environment variable. It would handle various capitalizations like "TRUE", "True", and "true" in a single function call.
(strcmp(enable_topo_detect, "TRUE") == 0 || strcasecmp(enable_topo_detect, "True") == 0)) {| if (enableTopoDetect && (strcmp(enableTopoDetect, "TRUE") == 0 || | ||
| strcmp(enableTopoDetect, "True") == | ||
| 0)) { // safety check nic distance is only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic for checking the environment variable is duplicated in several files. Using a case-insensitive comparison like strcasecmp would simplify this condition and make it more robust. To improve maintainability and avoid code duplication, consider creating a shared helper function for this check.
if (enableTopoDetect && strcasecmp(enableTopoDetect, "TRUE") == 0) { // safety check nic distance is only| FLAGCXCHECK(flagcxNetInit(comm)); | ||
| INFO(FLAGCX_INIT, "Using network %s", comm->netAdaptor->name); | ||
| if (env && strcmp(env, "TRUE") == 0) { | ||
| if (env && (strcmp(env, "TRUE") == 0 || strcmp(env, "True") == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This env is useless and we can remove it.
| flagcxGetEnv("FLAGCX_INTERSERVER_ROUTE_FILE"); | ||
| if (enableTopoDetect && interServerTopoFile && | ||
| strcmp(enableTopoDetect, "TRUE") == 0) { | ||
| (strcmp(enableTopoDetect, "TRUE") == 0 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try strcasecmp instead
Fix the problem with recv_buffer encountered in the process of LLaMA3 training