SlightwindSec
Contributor

Description

This PR fixes the dependency on a POC version of torch_npu for the MoE routing initialization feature.

Before:
To get the best performance, users needed a POC torch_npu version containing the npu_moe_init_routing_quant operator. Official versions do not include it, so they triggered a slower, pure PyTorch fallback.

After:
The code is updated to use the npu_moe_init_routing_v2 operator, which is included in the official torch_npu releases and provides equivalent performance. This change unifies the implementation, removes the fallback logic, and makes the high-performance path accessible to all users without requiring a special library version.
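
The before/after difference can be sketched as below. This is only an illustration of the dispatch logic described above, not the code in this PR: the helper names, the returned labels, and the SimpleNamespace stand-ins for torch_npu builds are all hypothetical, and no real operator is called.

```python
from types import SimpleNamespace


def pick_routing_path_before(npu_module):
    """Old behaviour (sketch): fast path only when the POC operator exists."""
    if hasattr(npu_module, "npu_moe_init_routing_quant"):
        return "npu_moe_init_routing_quant"
    # Official builds landed here and paid the pure PyTorch cost.
    return "pytorch_fallback"


def pick_routing_path_after(npu_module):
    """New behaviour (sketch): one path, no feature detection, no fallback."""
    return "npu_moe_init_routing_v2"


def _noop(*args, **kwargs):
    """Placeholder standing in for an operator; does nothing."""
    return None


# Hypothetical stand-ins for the two kinds of builds (not the real module).
poc_build = SimpleNamespace(npu_moe_init_routing_quant=_noop)
official_build = SimpleNamespace(npu_moe_init_routing_v2=_noop)

before_on_poc = pick_routing_path_before(poc_build)
before_on_official = pick_routing_path_before(official_build)
after_on_official = pick_routing_path_after(official_build)
```

In this sketch the old selection depends on which build is installed, while the new selection is the same for every user.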

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the MoE routing initialization for W8A8 dynamic quantization. It replaces the dependency on a proof-of-concept torch_npu operator, npu_moe_init_routing_quant, with the official npu_moe_init_routing_v2 operator. This change successfully unifies the implementation by removing the conditional logic and the pure PyTorch fallback path, which simplifies the code and improves maintainability. The update appears correct and aligns with the goal of using standardized, performant operators from the official torch_npu library.

Signed-off-by: SlightwindSec <[email protected]>
@github-actions github-actions bot added the documentation label Aug 16, 2025
@Yikun
Collaborator

Yikun commented Aug 28, 2025

v0.9.1-dev is in code freeze. Can you confirm whether this is still needed, or just move it to the main branch? Thanks.

@SlightwindSec
Contributor Author

v0.9.1-dev is in code freeze. Can you confirm whether this is still needed, or just move it to the main branch? Thanks.

OK

Labels
documentation, module:quantization
2 participants