
Conversation

@mitruska (Contributor) commented Sep 30, 2025

Details:

  • Specification of the MOE internal operation
  • Internal ops are used mainly for fusion transformations and optimizations;
    they will not appear in the converted model's public IR

Describes the MOE operation used in PR:

Tickets:

  • 171911

@mitruska requested a review from a team as a code owner September 30, 2025 09:33
@mitruska requested review from zKulesza and removed the request for a team September 30, 2025 09:33
@github-actions bot added the category: docs OpenVINO documentation label Sep 30, 2025
@mitruska self-assigned this Sep 30, 2025
Comment on lines 63 to 68
# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=True)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=True)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=True)
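The quoted pseudocode can be exercised end to end with plain NumPy. A minimal sketch, assuming hypothetical shapes, an (out_features, in_features) weight layout (hence the .T, matching transpose_b=True), and swish(x) = x * sigmoid(beta * x):

import numpy as np

def swish(x, beta=1.0):
    # swish(x) = x * sigmoid(beta * x)
    return x / (1.0 + np.exp(-beta * x))

tokens, hidden, inter = 4, 8, 16                     # hypothetical sizes
rng = np.random.default_rng(0)
reshaped_hidden_states = rng.standard_normal((tokens, hidden))
weight_0 = rng.standard_normal((inter, hidden))      # gate projection
weight_1 = rng.standard_normal((inter, hidden))      # up projection
weight_2 = rng.standard_normal((hidden, inter))      # down projection

x_proj = reshaped_hidden_states @ weight_0.T         # transpose_b=True
x_proj2 = reshaped_hidden_states @ weight_1.T
x_glu = x_proj2 * swish(x_proj, beta=1.0)            # expert_beta assumed 1.0
down_proj = x_glu @ weight_2.T                       # (tokens, hidden)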
@mitruska (Contributor, Author) commented:

The GPU plugin team's request is to transpose those weights at the conversion stage, so both MatMul transpose_a/transpose_b attributes should be False at this point:

Suggested change (transpose_b=True → transpose_b=False on all three MatMuls):

  # Experts computation part (GEMM3_SWIGLU)
- x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=True)
- x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=True)
+ x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=False)
+ x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=False)
  swiglu = swish(x_proj, beta=expert_beta)
  x_proj = x_proj2 * swiglu
- down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=True)
+ down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=False)

cc: @yeonbok
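The equivalence behind this suggestion is easy to check: transposing a weight once at conversion time and switching transpose_b to False gives the same result as transposing at runtime. A NumPy sketch with hypothetical shapes:

import numpy as np

rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((4, 8))
weight = rng.standard_normal((16, 8))          # stored as (out_features, in_features)

runtime = hidden_states @ weight.T             # transpose_b=True at runtime
weight_t = np.ascontiguousarray(weight.T)      # transposed once at conversion
converted = hidden_states @ weight_t           # transpose_b=False at runtime

assert np.allclose(runtime, converted)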

@@ -0,0 +1,151 @@
.. {#openvino_docs_ops_internal_MOE}

MOE
Collaborator commented:

Let's not use the MoE name, because we may want it for an external operation and for a real MoE operation. What we have now is more of a FusedExperts.

.. code-block:: py
:force:

# Common part: Reshape hidden states and prepare for expert computation
Collaborator commented:

I propose adding router_topk_output_indices into this logic; it would show how the weights are prepared. Right now it is not clear how router_topk_output_indices is used in the specified operation.
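To illustrate the request, here is one plausible reading (an assumption, not the specification) of how router_topk_output_indices could enter the pseudocode: gather the per-token top-k expert outputs and combine them with the routing weights. router_topk_output_weights and all shapes below are hypothetical:

import numpy as np

tokens, num_experts, k, hidden = 4, 8, 2, 16   # hypothetical sizes
rng = np.random.default_rng(0)
expert_outputs = rng.standard_normal((num_experts, tokens, hidden))
router_topk_output_weights = rng.random((tokens, k))
router_topk_output_indices = rng.integers(0, num_experts, size=(tokens, k))

# Gather each token's top-k expert outputs, then take the weighted sum.
token_ids = np.arange(tokens)[:, None]                            # (tokens, 1)
selected = expert_outputs[router_topk_output_indices, token_ids]  # (tokens, k, hidden)
combined = (router_topk_output_weights[..., None] * selected).sum(axis=1)  # (tokens, hidden)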

@rkazants (Collaborator) left a comment:
Good job! Thank you, Kasia. I left a couple of comments.
