
Conversation

@zihugithub
Collaborator

Refactoring the CI trigger mechanism for Metax

@zihugithub requested a review from a team as a code owner on October 14, 2025 at 09:28
@gemini-code-assist
Contributor

Summary of Changes

Hello @zihugithub, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Continuous Integration (CI) framework for Metax by refactoring its trigger mechanism. The changes introduce a more structured and reliable approach to functional testing, expanding coverage for various large language models and ensuring stable test execution through pre-checks for GPU resource availability. The new modular configuration and robust result validation utilities aim to improve the overall efficiency and reliability of the CI pipeline.

Highlights

  • Expanded Inference Test Coverage: Introduced new functional inference test cases for various large language models, including DeepSeek R1 Distill Qwen, OPI Llama 3.1 Instruct, Qwen3, and RoboBrain2, with configurations for both standard and "flaggems" versions.
  • Robust CI Trigger Mechanism: Refactored the CI trigger logic for Metax, incorporating new shell scripts (_gpu_check.sh, test_all.sh, test_task.sh) to manage test execution, including a pre-check for GPU resource availability.
  • Modular Test Configuration and Validation: Implemented a modular system using config.yml and parse_config.py for defining and parsing test cases, alongside enhanced pytest utilities (conftest.py, test_result.py) for comprehensive result comparison against gold standards (a sketch of how such fixtures are commonly wired up follows this list).
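For readers skimming this summary, the test_path/test_type/test_task/test_case fixtures referenced throughout the review below are presumably wired up in conftest.py from command-line options. A minimal sketch of one common way to do that follows; the option names and defaults are assumptions, not necessarily what this PR implements.

# conftest.py -- illustrative sketch; option names and defaults are assumptions.
import pytest


def pytest_addoption(parser):
    # Expose the test-case coordinates as command-line options.
    parser.addoption("--test-path", action="store", default="tests/functional_tests/test_cases")
    parser.addoption("--test-type", action="store", default="inference")
    parser.addoption("--test-task", action="store", default=None)
    parser.addoption("--test-case", action="store", default=None)


@pytest.fixture
def test_path(request):
    return request.config.getoption("--test-path")


@pytest.fixture
def test_type(request):
    return request.config.getoption("--test-type")


@pytest.fixture
def test_task(request):
    return request.config.getoption("--test-task")


@pytest.fixture
def test_case(request):
    return request.config.getoption("--test-case")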
Ignored Files
  • Ignored by pattern: .github/workflows/** (2)
    • .github/workflows/all-tests-metax.yml
    • .github/workflows/functional-tests-metax.yml
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a comprehensive set of functional tests and a CI trigger mechanism for Metax devices. The changes are extensive, adding numerous configuration files, test scripts, and utility code. While the overall structure is a good starting point, I've identified several areas for improvement regarding robustness, maintainability, and correctness in the test scripts and configurations. My review includes suggestions to address brittle path constructions, anti-patterns like using sleep for synchronization, inconsistencies in configuration files, and opportunities for refactoring to reduce code duplication. Addressing these points will significantly improve the quality and reliability of the new testing framework.



@pytest.mark.usefixtures("test_path", "test_type", "test_task", "test_case")
def test_equal(test_path, test_type, test_task, test_case, monkeypatch):

high

The test function test_equal is decorated with pytest.mark.usefixtures and accepts several fixtures (test_path, test_type, test_task, test_case), but these are not used within the function body. The test makes a request with a hardcoded URL and data, which makes it inflexible and its purpose unclear. The test should be refactored to utilize the provided fixtures to run a meaningful, dynamic test case.
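For illustration only, here is a rough sketch of how the fixtures could drive a dynamic request; the per-case request.json file, the use of the requests client, and the exact-match check against results_gold are assumptions, not this PR's actual implementation:

# Illustrative sketch -- request.json and the exact-match comparison are assumed, not from the PR.
import json
import os

import requests  # assumed HTTP client, standing in for the hardcoded request


def test_equal(test_path, test_type, test_task, test_case):
    case_dir = os.path.join(test_path, test_type, test_task)

    # Hypothetical per-case payload instead of a hardcoded URL and body.
    with open(os.path.join(case_dir, "request.json"), encoding="utf-8") as f:
        payload = json.load(f)

    response = requests.post(payload["url"], json=payload["data"], timeout=60)
    response.raise_for_status()

    # Compare against a per-case gold result; the results_gold layout mirrors the
    # inference checks elsewhere in this PR.
    gold_path = os.path.join(case_dir, "results_gold", test_case)
    with open(gold_path, encoding="utf-8") as f:
        assert response.json() == json.load(f)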


try:
    result = parse_config(args.config, args.type, args.task)
    print(result)

high

The current method for parsing test cases from config.yml is fragile. It relies on the shell script to parse the string representation of a Python list. A more robust approach would be to:

  1. Use a standard list format in config.yml (e.g., - 7b-tp2).
  2. Modify this Python script to print the list elements joined by a space.

This will make the configuration clearer and the parsing logic in the shell script more reliable.
Suggested change
-    print(result)
+    result = parse_config(args.config, args.type, args.task)
+    if isinstance(result, list):
+        print(" ".join(result))
+    else:
+        print(result)
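As a concrete illustration of the suggested scheme (the type, task, and case names below are placeholders, not taken from this PR), config.yml could hold a native YAML list:

# Hypothetical config.yml fragment -- keys and names are placeholders.
inference:
  deepseek_r1_distill_qwen:
    - 7b-tp2
    - 7b-tp2-flaggems

With parse_config.py printing the cases space-separated, the shell script could then loop with a plain for _case in $(python parse_config.py --config ... --type ... --task ...) and avoid parsing a Python list literal (the argument names are inferred from args.config/args.type/args.task in the hunk above).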

fi

if [ "${_type}" = "inference" ]; then
run_command "python $FLAG_DIR/run.py --config-path tests/${PWD##*/}/tests/functional_tests/test_cases/${_type}/${_task}/conf --config-name ${_case} action=test" $attempt_i $_task $_type $_case

high

The path construction tests/${PWD##*/}/... is brittle as it depends on the name of the current working directory. This can cause tests to fail when run from different locations. You should use the ${device} variable, which is reliably determined by the get_device_type function, to ensure the path is always correct.

Suggested change
run_command "python $FLAG_DIR/run.py --config-path tests/${PWD##*/}/tests/functional_tests/test_cases/${_type}/${_task}/conf --config-name ${_case} action=test" $attempt_i $_task $_type $_case
run_command "python $FLAG_DIR/run.py --config-path tests/${device}/tests/functional_tests/test_cases/${_type}/${_task}/conf --config-name ${_case} action=test" $attempt_i $_task $_type $_case

fi

# Ensure that pytest check is completed before deleting the folder
sleep 10s

high

Using sleep to wait for an operation to complete is an anti-pattern that can lead to either flaky tests (if the sleep is too short) or inefficient execution (if it's too long). The script should implement a reliable synchronization mechanism to ensure that the test results are fully written before pytest is executed. The run_command function appears synchronous, so if run.py spawns a background process, the script must explicitly wait for it.
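For example, a minimal sketch of explicit synchronization, assuming the asynchronous step is launched from this script and its PID is available (the variable names here are hypothetical):

# Illustrative sketch only -- variable names are hypothetical.
pytest "${pytest_args[@]}" &   # hypothetical asynchronous pytest invocation
pytest_pid=$!
wait "${pytest_pid}"           # block until the check has actually finished
rm -rf "${test_result_dir}"    # hypothetical cleanup that currently follows the sleep

If the results are instead produced by a detached process spawned by run.py, polling for the expected output file with a bounded timeout would be a comparable alternative to a fixed sleep.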

Comment on lines +20 to +22
CUDNN_BENCHMARK: "false"
CUDNN_DETERMINISTIC: "true"
USE_FLAGGEMS: "true"

medium

There is an inconsistent use of boolean values for environment variables in this and other YAML configuration files. Some are defined as strings (e.g., "true"), while others use YAML's native boolean type (e.g., true on line 42). For consistency and to prevent potential parsing issues, it's best to use native YAML booleans throughout.

    CUDNN_BENCHMARK: false
    CUDNN_DETERMINISTIC: true
    USE_FLAGGEMS: true


assert os.path.exists(result_path), f"Failed to find 'stdout.log' at {result_path}"

with open(result_path, "r") as file:

medium

Opening files without specifying an encoding can lead to inconsistent behavior across different operating systems or locales, and may cause a UnicodeDecodeError if the file contains non-ASCII characters. It's a best practice to always specify the encoding. This applies to all open() calls throughout this file.

Suggested change
with open(result_path, "r") as file:
with open(result_path, "r", encoding="utf-8") as file:

Comment on lines +70 to +122
def test_inference_equal(test_path, test_type, test_task, test_case):
    # Construct the test_result_path using the provided fixtures
    test_result_path = os.path.join(test_path, test_type, test_task, "results_test", test_case)
    result_path = os.path.join(test_result_path, "inference_logs/host_0_localhost.output")

    print("result_path:", result_path)

    assert os.path.exists(result_path), f"Failed to find 'host_0_localhost.output' at {result_path}"

    with open(result_path, "r") as file:
        lines = file.readlines()

    result_lines = []
    output = False
    for line in lines:
        assert "Failed to import 'flag_gems'" not in line, "Failed to import 'flag_gems''"
        if line == "**************************************************\n":
            output = True
        if line == "##################################################\n":
            output = False
        if output == True:
            result_lines.append(line)

    gold_value_path = os.path.join(test_path, test_type, test_task, "results_gold", test_case)
    assert os.path.exists(gold_value_path), f"Failed to find gold result at {gold_value_path}"

    with open(gold_value_path, "r") as file:
        gold_value_lines = file.readlines()

    # Remove the blank line at the end.
    if gold_value_lines:
        last_non_empty = len(gold_value_lines) - 1
        while last_non_empty >= 0 and not gold_value_lines[last_non_empty].strip():
            last_non_empty -= 1
        if last_non_empty >= 0:
            gold_value_lines = gold_value_lines[: last_non_empty + 1]
        else:
            gold_value_lines = []

    print("\nResult checking")
    print("Result: ", result_lines)
    print("Gold Result: ", gold_value_lines)

    print("len(result_lines), (gold_value_lines): ", len(result_lines), len(gold_value_lines))
    assert len(result_lines) == len(gold_value_lines)

    for result_line, gold_value_line in zip(result_lines, gold_value_lines):
        print(result_line, gold_value_line)
        assert result_line.rstrip('\n') == gold_value_line.rstrip('\n')


@pytest.mark.usefixtures("test_path", "test_type", "test_task", "test_case")
def test_inference_pipeline(test_path, test_type, test_task, test_case):

medium

The functions test_inference_equal and test_inference_pipeline contain a significant amount of duplicated code for locating and reading the inference output file. This redundancy makes the code harder to maintain. This common logic should be extracted into a shared helper function to improve modularity and reduce code duplication.
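A sketch of the kind of helper the comment has in mind, based on the logic in test_inference_equal above (the helper name is illustrative, not from the PR):

# Illustrative helper sketch -- the name read_inference_output is not from the PR.
import os


def read_inference_output(test_path, test_type, test_task, test_case):
    """Locate the inference log and return the lines between the marker rows."""
    result_path = os.path.join(
        test_path, test_type, test_task, "results_test", test_case,
        "inference_logs/host_0_localhost.output",
    )
    assert os.path.exists(result_path), f"Failed to find 'host_0_localhost.output' at {result_path}"

    with open(result_path, "r", encoding="utf-8") as file:
        lines = file.readlines()

    result_lines, capture = [], False
    for line in lines:
        assert "Failed to import 'flag_gems'" not in line, "Failed to import 'flag_gems'"
        if line == "**************************************************\n":
            capture = True
        if line == "##################################################\n":
            capture = False
        if capture:
            result_lines.append(line)
    return result_lines

Both test_inference_equal and test_inference_pipeline could then call this helper and keep only their case-specific comparison logic.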


# Check the next three lines for equality before the '=' character
for j in range(1, 4):
    result_parts = result_group[j].split('=')

medium

Using split('=') to parse key-value pairs from the log file is not robust, as it will fail if the value itself contains an = character. To handle such cases correctly, you should split only on the first occurrence of the delimiter.

Suggested change
result_parts = result_group[j].split('=')
result_parts = result_group[j].split('=', 1)


# Wait and show current status
echo "Waiting for Metax GPU memory usage to drop below 50% (current max usage: ${max_usage_percent}%)"
sleep 1m

medium

A 1m sleep interval between GPU availability checks can be too long and may unnecessarily slow down the CI/CD pipeline. Consider reducing this to a shorter duration (e.g., 15s) to make the script more responsive to changes in GPU status.

Suggested change
sleep 1m
sleep 15s


for cmd in "${commands[@]}"; do
# Execute the command
$cmd

medium

Executing commands directly from a variable ($cmd) can be unsafe and lead to issues with word splitting if arguments contain spaces or special characters. While the current commands are simple, using eval provides a safer and more robust way to execute commands stored in strings.

Suggested change
$cmd
eval "$cmd"

@gemini-code-assist

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.
