[Spec][MOE][Internal Op] Specification of MOE internal operation #32255

base: master

Conversation
# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=True)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=True)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=True)
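For readers less familiar with the pattern, here is a minimal NumPy sketch of the quoted GEMM3_SWIGLU computation; the tensor shapes and the swish/SiLU definition are assumptions for illustration, not taken from the spec under review.

import numpy as np

def swish(x, beta=1.0):
    # swish(x) = x * sigmoid(beta * x); reduces to SiLU for beta = 1.0
    return x / (1.0 + np.exp(-beta * x))

def gemm3_swiglu(hidden, w0, w1, w2, expert_beta=1.0):
    # hidden: [tokens, hidden_size]; w0, w1: [intermediate, hidden_size];
    # w2: [hidden_size, intermediate] -- layouts matching transpose_b=True above
    x_proj = hidden @ w0.T                              # gate projection
    x_proj2 = hidden @ w1.T                             # up projection
    gated = x_proj2 * swish(x_proj, beta=expert_beta)   # SwiGLU gating
    return gated @ w2.T                                 # down projection

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 8))    # 4 tokens, hidden_size 8
w0 = rng.standard_normal((16, 8))
w1 = rng.standard_normal((16, 8))
w2 = rng.standard_normal((8, 16))
print(gemm3_swiglu(hidden, w0, w1, w2).shape)  # (4, 8)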
The GPU plugin's request is to transpose those weights at the conversion stage, so both of the MatMul transpose_a/transpose_b attributes should be False at this point:
Suggested change (set transpose_b to False):

# Experts computation part (GEMM3_SWIGLU)
x_proj = matmul(reshaped_hidden_states, weight_0, transpose_a=False, transpose_b=False)
x_proj2 = matmul(reshaped_hidden_states, weight_1, transpose_a=False, transpose_b=False)
swiglu = swish(x_proj, beta=expert_beta)
x_proj = x_proj2 * swiglu
down_proj = matmul(x_proj, weight_2, transpose_a=False, transpose_b=False)
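For context, a small NumPy sketch of the equivalence behind this request: transposing the weight once at conversion time produces the same result as a runtime MatMul with transpose_b=True, so the runtime attribute can then be False. The shapes here are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)
hidden = rng.standard_normal((4, 8))       # [tokens, hidden_size]
weight_0 = rng.standard_normal((16, 8))    # [intermediate, hidden_size]

# Current spec: MatMul(hidden, weight_0, transpose_b=True)
before = hidden @ weight_0.T

# Proposed: transpose once at conversion time, then transpose_b=False at runtime
weight_0_converted = weight_0.T            # done once, offline
after = hidden @ weight_0_converted

assert np.allclose(before, after)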
cc: @yeonbok
…ts/operation-specs/internal/moe.rst
@@ -0,0 +1,151 @@
.. {#openvino_docs_ops_internal_MOE}

MOE
Let's not use the MoE name, because we may want to use it for an external operation and for a real MoE operation. Right now it is a sort of FusedExperts.
.. code-block:: py
   :force:

# Common part: Reshape hidden states and prepare for expert computation
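As an aside, a sketch of what this reshape typically amounts to; the 3-D input layout is an assumption, not part of the quoted spec.

import numpy as np

batch, seq_len, hidden_size = 2, 3, 8
hidden_states = np.arange(batch * seq_len * hidden_size, dtype=np.float32)
hidden_states = hidden_states.reshape(batch, seq_len, hidden_size)

# Flatten to [batch * seq_len, hidden_size] so each token is routed independently
reshaped_hidden_states = hidden_states.reshape(-1, hidden_size)
print(reshaped_hidden_states.shape)  # (6, 8)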
I propose adding router_topk_output_indices to this logic. It would show how the weights are prepared. Right now it is not clear how router_topk_output_indices is used in the specified operation. A possible sketch follows.
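To make the proposal concrete, here is a hypothetical sketch of how router_topk_output_indices could select per-token expert weights and combine the expert outputs; the stacked weight layout, the router_topk_output_weights name, and all shapes are assumptions for illustration, not the spec.

import numpy as np

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))

tokens, num_experts, topk, hidden_size, inter = 4, 8, 2, 8, 16
rng = np.random.default_rng(2)
hidden = rng.standard_normal((tokens, hidden_size))
# Per-expert weights stacked on a leading expert axis (assumed layout)
w0 = rng.standard_normal((num_experts, inter, hidden_size))
w1 = rng.standard_normal((num_experts, inter, hidden_size))
w2 = rng.standard_normal((num_experts, hidden_size, inter))
router_topk_output_indices = rng.integers(0, num_experts, (tokens, topk))
router_topk_output_weights = rng.random((tokens, topk))

out = np.zeros_like(hidden)
for t in range(tokens):
    for k in range(topk):
        e = router_topk_output_indices[t, k]   # expert picked by the router
        gate = swish(hidden[t] @ w0[e].T)      # gate projection for expert e
        up = hidden[t] @ w1[e].T               # up projection for expert e
        # Weighted sum of the selected experts' down projections
        out[t] += router_topk_output_weights[t, k] * ((up * gate) @ w2[e].T)
print(out.shape)  # (4, 8)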
Good job! Thank you, Kasia. I left a couple of comments.
Co-authored-by: Tatiana Savina <[email protected]>
Details:
- They will not appear in the converted model's public IR
- Describes the MOE operation used in PR:
Tickets: