[CICD] Refactoring the CI trigger mechanism for Metax #863
base: main
Conversation
Summary of Changes

Hello @zihugithub, I'm Gemini Code Assist. I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the Continuous Integration (CI) framework for Metax by refactoring its trigger mechanism. The changes introduce a more structured and reliable approach to functional testing, expanding coverage for various large language models and ensuring stable test execution through pre-checks for GPU resource availability. The new modular configuration and robust result validation utilities aim to improve the overall efficiency and reliability of the CI pipeline.
Code Review
This pull request introduces a comprehensive set of functional tests and a CI trigger mechanism for Metax devices. The changes are extensive, adding numerous configuration files, test scripts, and utility code. While the overall structure is a good starting point, I've identified several areas for improvement regarding robustness, maintainability, and correctness in the test scripts and configurations. My review includes suggestions to address brittle path constructions, anti-patterns like using sleep for synchronization, inconsistencies in configuration files, and opportunities for refactoring to reduce code duplication. Addressing these points will significantly improve the quality and reliability of the new testing framework.
@pytest.mark.usefixtures("test_path", "test_type", "test_task", "test_case")
def test_equal(test_path, test_type, test_task, test_case, monkeypatch):
The test function test_equal is decorated with pytest.mark.usefixtures and accepts several fixtures (test_path, test_type, test_task, test_case), but these are not used within the function body. The test makes a request with a hardcoded URL and data, which makes it inflexible and its purpose unclear. The test should be refactored to utilize the provided fixtures to run a meaningful, dynamic test case.
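For illustration, a minimal sketch of how the fixtures could drive the request instead of hardcoded values; the `request.json` file name and its `url`/`payload` fields are hypothetical placeholders, not part of the existing test layout:

```python
import json
import os

import pytest
import requests


@pytest.mark.usefixtures("test_path", "test_type", "test_task", "test_case")
def test_equal(test_path, test_type, test_task, test_case):
    # Locate the per-case request spec via the fixtures rather than hardcoding.
    case_dir = os.path.join(test_path, test_type, test_task, test_case)
    request_file = os.path.join(case_dir, "request.json")  # hypothetical file name
    assert os.path.exists(request_file), f"Missing request spec: {request_file}"

    with open(request_file, "r", encoding="utf-8") as f:
        spec = json.load(f)

    # Send the request defined by the test case and check it succeeds.
    response = requests.post(spec["url"], json=spec["payload"], timeout=60)
    assert response.status_code == 200
```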
try:
    result = parse_config(args.config, args.type, args.task)
    print(result)
The current method for parsing test cases from config.yml is fragile. It relies on the shell script to parse the string representation of a Python list. A more robust approach would be to:
- Use a standard list format in config.yml (e.g., - 7b-tp2).
- Modify this Python script to print the list elements joined by a space.

This will make the configuration clearer and the parsing logic in the shell script more reliable.
Suggested change:
-    result = parse_config(args.config, args.type, args.task)
-    print(result)
+    result = parse_config(args.config, args.type, args.task)
+    if isinstance(result, list):
+        print(" ".join(result))
+    else:
+        print(result)
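To illustrate the combination, a sketch under the assumption that config.yml switches to a native YAML list; the `inference`/`aquila`/`cases` keys are made-up placeholders for whatever the real schema uses:

```python
import yaml

# Assumed layout (placeholder keys):
#   inference:
#     aquila:
#       cases:
#         - 7b-tp2
#         - 7b-tp4
with open("config.yml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

cases = config["inference"]["aquila"]["cases"]
# A space-joined string is trivial for the shell side to split into a loop variable.
print(" ".join(cases))
```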
fi

if [ "${_type}" = "inference" ]; then
    run_command "python $FLAG_DIR/run.py --config-path tests/${PWD##*/}/tests/functional_tests/test_cases/${_type}/${_task}/conf --config-name ${_case} action=test" $attempt_i $_task $_type $_case
The path construction tests/${PWD##*/}/... is brittle as it depends on the name of the current working directory. This can cause tests to fail when run from different locations. You should use the ${device} variable, which is reliably determined by the get_device_type function, to ensure the path is always correct.
Suggested change:
-    run_command "python $FLAG_DIR/run.py --config-path tests/${PWD##*/}/tests/functional_tests/test_cases/${_type}/${_task}/conf --config-name ${_case} action=test" $attempt_i $_task $_type $_case
+    run_command "python $FLAG_DIR/run.py --config-path tests/${device}/tests/functional_tests/test_cases/${_type}/${_task}/conf --config-name ${_case} action=test" $attempt_i $_task $_type $_case
fi

# Ensure that pytest check is completed before deleting the folder
sleep 10s
Using sleep to wait for an operation to complete is an anti-pattern that can lead to either flaky tests (if the sleep is too short) or inefficient execution (if it's too long). The script should implement a reliable synchronization mechanism to ensure that the test results are fully written before pytest is executed. The run_command function appears synchronous, so if run.py spawns a background process, the script must explicitly wait for it.
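As a concrete alternative (shown in Python for brevity; the same polling loop could equally be written in the shell script), one could wait for the result file to appear and stop growing, with a timeout; the path and intervals below are placeholders:

```python
import os
import time


def wait_for_results(path, timeout_s=300, poll_s=2):
    """Poll until `path` exists and its size is stable across one interval, or time out."""
    deadline = time.monotonic() + timeout_s
    last_size = -1
    while time.monotonic() < deadline:
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size == last_size:
                return True
            last_size = size
        time.sleep(poll_s)
    return False


# Placeholder path; in the CI script this would be the file pytest is about to read.
assert wait_for_results("results_test/stdout.log"), "Results were not written in time"
```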
CUDNN_BENCHMARK: "false"
CUDNN_DETERMINISTIC: "true"
USE_FLAGGEMS: "true"
There is an inconsistent use of boolean values for environment variables in this and other YAML configuration files. Some are defined as strings (e.g., "true"), while others use YAML's native boolean type (e.g., true on line 42). For consistency and to prevent potential parsing issues, it's best to use native YAML booleans throughout.
Suggested change:
CUDNN_BENCHMARK: false
CUDNN_DETERMINISTIC: true
USE_FLAGGEMS: true
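A quick illustration of why the distinction can matter, assuming the files are loaded with PyYAML: quoted values come back as strings, which a truthiness check treats as true even when the text is "false":

```python
import yaml

quoted = yaml.safe_load('CUDNN_BENCHMARK: "false"')
native = yaml.safe_load("CUDNN_BENCHMARK: false")

print(type(quoted["CUDNN_BENCHMARK"]))  # <class 'str'>  -> value "false"
print(type(native["CUDNN_BENCHMARK"]))  # <class 'bool'> -> value False

# A naive truthiness check on the string form is always true.
assert bool(quoted["CUDNN_BENCHMARK"]) is True
assert bool(native["CUDNN_BENCHMARK"]) is False
```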
assert os.path.exists(result_path), f"Failed to find 'stdout.log' at {result_path}"

with open(result_path, "r") as file:
Opening files without specifying an encoding can lead to inconsistent behavior across different operating systems or locales, and may cause a UnicodeDecodeError if the file contains non-ASCII characters. It's a best practice to always specify the encoding. This applies to all open() calls throughout this file.
Suggested change:
-    with open(result_path, "r") as file:
+    with open(result_path, "r", encoding="utf-8") as file:
def test_inference_equal(test_path, test_type, test_task, test_case):
    # Construct the test_result_path using the provided fixtures
    test_result_path = os.path.join(test_path, test_type, test_task, "results_test", test_case)
    result_path = os.path.join(test_result_path, "inference_logs/host_0_localhost.output")

    print("result_path:", result_path)

    assert os.path.exists(result_path), f"Failed to find 'host_0_localhost.output' at {result_path}"

    with open(result_path, "r") as file:
        lines = file.readlines()

    result_lines = []
    output = False
    for line in lines:
        assert "Failed to import 'flag_gems'" not in line, "Failed to import 'flag_gems''"
        if line == "**************************************************\n":
            output = True
        if line == "##################################################\n":
            output = False
        if output == True:
            result_lines.append(line)

    gold_value_path = os.path.join(test_path, test_type, test_task, "results_gold", test_case)
    assert os.path.exists(gold_value_path), f"Failed to find gold result at {gold_value_path}"

    with open(gold_value_path, "r") as file:
        gold_value_lines = file.readlines()

    # Remove the blank line at the end.
    if gold_value_lines:
        last_non_empty = len(gold_value_lines) - 1
        while last_non_empty >= 0 and not gold_value_lines[last_non_empty].strip():
            last_non_empty -= 1
        if last_non_empty >= 0:
            gold_value_lines = gold_value_lines[: last_non_empty + 1]
        else:
            gold_value_lines = []

    print("\nResult checking")
    print("Result: ", result_lines)
    print("Gold Result: ", gold_value_lines)

    print("len(result_lines), (gold_value_lines): ", len(result_lines), len(gold_value_lines))
    assert len(result_lines) == len(gold_value_lines)

    for result_line, gold_value_line in zip(result_lines, gold_value_lines):
        print(result_line, gold_value_line)
        assert result_line.rstrip('\n') == gold_value_line.rstrip('\n')


@pytest.mark.usefixtures("test_path", "test_type", "test_task", "test_case")
def test_inference_pipeline(test_path, test_type, test_task, test_case):
The functions test_inference_equal and test_inference_pipeline contain a significant amount of duplicated code for locating and reading the inference output file. This redundancy makes the code harder to maintain. This common logic should be extracted into a shared helper function to improve modularity and reduce code duplication.
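A possible shape for the shared helper, mirroring the marker handling already used in test_inference_equal; the function name and the 50-character marker width are assumptions to be adjusted to the actual log format:

```python
import os


def read_inference_result_lines(test_path, test_type, test_task, test_case):
    """Locate the inference output file and return the lines between the
    '*' and '#' marker rows, failing if the file is missing."""
    result_path = os.path.join(
        test_path, test_type, test_task, "results_test", test_case,
        "inference_logs/host_0_localhost.output",
    )
    assert os.path.exists(result_path), f"Failed to find output at {result_path}"

    with open(result_path, "r", encoding="utf-8") as file:
        lines = file.readlines()

    result_lines, capturing = [], False
    for line in lines:
        assert "Failed to import 'flag_gems'" not in line, "Failed to import 'flag_gems'"
        if line == "*" * 50 + "\n":  # adjust width to match the real marker line
            capturing = True
        if line == "#" * 50 + "\n":
            capturing = False
        if capturing:
            result_lines.append(line)
    return result_lines
```

Both tests could then call this helper and keep only their comparison logic.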
# Check the next three lines for equality before the '=' character
for j in range(1, 4):
    result_parts = result_group[j].split('=')
Using split('=') to parse key-value pairs from the log file is not robust, as it will fail if the value itself contains an = character. To handle such cases correctly, you should split only on the first occurrence of the delimiter.
Suggested change:
-    result_parts = result_group[j].split('=')
+    result_parts = result_group[j].split('=', 1)
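A tiny check of the difference: with maxsplit=1, a value that itself contains '=' survives intact:

```python
line = "prompt=a=b"
print(line.split("="))     # ['prompt', 'a', 'b']
print(line.split("=", 1))  # ['prompt', 'a=b']
```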
# Wait and show current status
echo "Waiting for Metax GPU memory usage to drop below 50% (current max usage: ${max_usage_percent}%)"
sleep 1m
for cmd in "${commands[@]}"; do
    # Execute the command
    $cmd
Executing commands directly from a variable ($cmd) can be unsafe and lead to issues with word splitting if arguments contain spaces or special characters. While the current commands are simple, using eval provides a safer and more robust way to execute commands stored in strings.
Suggested change:
-    $cmd
+    eval "$cmd"
Warning: Gemini encountered an error creating the review. You can try again by commenting.
…lagScale into refactor_cicd251013
Refactoring the CI trigger mechanism for Metax