feat: support infer slice tensor at dim > 0 and optimize memory #106
chaokunyang merged 1 commit into inclusionAI:main
Conversation
Code Review
This pull request refactors the NCCL and HCCL weight transfer logic to support non-contiguous tensors and introduces a one-by-one communication mode for debugging and memory efficiency. It also updates the SGLang converter to handle shared experts in MoE models and adds configuration for HCCL buffer sizes. Review feedback suggests removing redundant del statements for local list references to improve code clarity.
    non_contiguous_tensor_pairs.clear()
    del non_contiguous_tensor_pairs
The del non_contiguous_tensor_pairs statement is redundant. The list is cleared on the previous line, and the local reference will be garbage collected when the function returns. Removing this line will make the code cleaner without affecting functionality.
Suggested change:
    - non_contiguous_tensor_pairs.clear()
    - del non_contiguous_tensor_pairs
    + non_contiguous_tensor_pairs.clear()
    non_contiguous_tensor_pairs.clear()
    del non_contiguous_tensor_pairs
The del non_contiguous_tensor_pairs statement is redundant here. The list is cleared on the previous line, and the local reference will be garbage collected automatically. Removing this line would improve code clarity.
Suggested change:
    - non_contiguous_tensor_pairs.clear()
    - del non_contiguous_tensor_pairs
    + non_contiguous_tensor_pairs.clear()
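For context on the two review comments above, a minimal sketch of the pattern being discussed; the surrounding function and the tensors argument are made up for illustration, only non_contiguous_tensor_pairs comes from the diff:

    def _transfer_weights(tensors):
        non_contiguous_tensor_pairs = []
        for t in tensors:
            if not t.is_contiguous():
                # Keep (view, contiguous copy) pairs only for the duration of the send.
                non_contiguous_tensor_pairs.append((t, t.contiguous()))
        # ... send the contiguous copies over p2p ...

        # clear() already drops the references to the staging copies, so they can
        # be freed immediately; a trailing `del non_contiguous_tensor_pairs` only
        # removes the local name, which goes out of scope at return anyway.
        non_contiguous_tensor_pairs.clear()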
What does this PR do?
Support slicing inference tensors at dim > 0, which can make the p2p tensors non-contiguous in NCCL or HCCL, and optimize memory usage:
1. Support inference-side tensor slices at dim > 0, such as attention.dense.weight when the inference TP is smaller than the training TP, which can make the p2p tensor non-contiguous in NCCL or HCCL. Another case is vLLM-Ascend, where the experts have been transposed but the shared_experts have not (a sketch of the non-contiguous slice follows this list).
2. Optimize memory by supporting one-by-one p2p send/receive. This is useful for debugging, or when memory is tight because inference runs with sleep mode disabled, and also when the hardware differs between the two sides (such as 910B2 vs. 910B1), where batch_send_recv would fail. Compared with batched send/receive, one-by-one send/receive increases time consumption by less than 10% but reduces peak memory by 25% for Qwen3-30B (a sketch of both send modes follows this list).
3. Optimize memory by using a local-part process group and destroying the weight-exchange process group in NPU HCCL. HCCL_BUFFERSIZE actually occupies twice its configured memory, and it must also be kept consistent between inference and training. Using a local-part process group saves this portion of memory (a sketch of the group lifecycle follows this list).
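A rough sketch of the non-contiguity issue from point 1; the shapes, TP size, and peer rank are illustrative, not taken from the PR:

    import torch
    import torch.distributed as dist

    full_weight = torch.randn(4096, 8192)   # e.g. attention.dense.weight on the train side
    infer_tp = 4                             # inference TP smaller than training TP
    cols_per_rank = full_weight.shape[1] // infer_tp

    # Slicing along dim 1 keeps the parent tensor's strides, so the view is
    # non-contiguous and cannot be handed to NCCL/HCCL p2p ops directly.
    shard = full_weight[:, :cols_per_rank]
    assert not shard.is_contiguous()

    send_buf = shard.contiguous()            # materialize a dense staging copy first
    # dist.send(send_buf, dst=peer_rank)     # the p2p transfer now sees a contiguous buffer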
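To illustrate the trade-off in point 2, a hedged sketch of the two modes using plain torch.distributed primitives; the helper names and arguments are placeholders, not the PR's API:

    import torch.distributed as dist

    def send_batched(tensors, dst, group=None):
        # Batched mode: every staging buffer is alive at the same time, so peak
        # memory is higher, but all transfers are posted in one shot.
        ops = [dist.P2POp(dist.isend, t.contiguous(), dst, group) for t in tensors]
        for req in dist.batch_isend_irecv(ops):
            req.wait()

    def send_one_by_one(tensors, dst, group=None):
        # One-by-one mode: only a single staging buffer exists at any moment,
        # trading a small latency increase for a lower peak-memory footprint.
        for t in tensors:
            buf = t.contiguous()
            dist.send(buf, dst, group=group)
            del buf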
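And for point 3, a sketch of the local-part process group lifecycle; the group membership and the send loop are assumptions, not the PR's implementation:

    import torch.distributed as dist

    def exchange_weights(weight_pairs, exchange_ranks):
        # Build a group covering only the ranks involved in the weight exchange,
        # so the HCCL communicator buffer is allocated just for those ranks.
        group = dist.new_group(ranks=exchange_ranks)
        try:
            for tensor, peer in weight_pairs:
                dist.send(tensor.contiguous(), peer, group=group)
        finally:
            # Destroying the group releases its communicator and buffer memory.
            dist.destroy_process_group(group)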
Related issues
No
Does this PR introduce any user-facing change?
No