https://blog.philip-huang.tech/?page=moe-routing-design
Paper link: Switch Transformers
The defining feature of MoE is that it directs each input to a matching "expert". This mechanism hinges on how the router is trained: labeling every training example with a corresponding category ahead of pre-training is practically impossible. If instead the router has to learn the assignment from the data on its own, how do we avoid the load imbalance that a "winner-takes-all" dynamic produces?
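To keep the router from collapsing onto a few experts, the Switch Transformers paper adds an auxiliary load-balancing loss of the form α · N · Σ_i f_i · P_i. Below is a minimal PyTorch sketch of that loss; the value of `alpha`, the function name, and the tensor shapes are illustrative assumptions, not the paper's reference code.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, alpha: float = 1e-2) -> torch.Tensor:
    """Auxiliary load-balancing loss in the spirit of Switch Transformers.

    router_logits: [num_tokens, num_experts] raw router outputs for a batch.
    Returns alpha * N * sum_i f_i * P_i, where
      f_i = fraction of tokens whose top-1 choice is expert i (hard dispatch),
      P_i = mean router probability assigned to expert i (soft).
    The value equals alpha exactly when both distributions are uniform.
    """
    num_experts = router_logits.shape[-1]
    probs = F.softmax(router_logits, dim=-1)                # [tokens, experts]
    top1 = probs.argmax(dim=-1)                             # [tokens]
    # f_i: how many tokens were actually dispatched to each expert
    f = F.one_hot(top1, num_experts).float().mean(dim=0)    # [experts]
    # P_i: how much probability mass the router put on each expert
    P = probs.mean(dim=0)                                    # [experts]
    return alpha * num_experts * torch.sum(f * P)
```

Only `P` is differentiable (the hard dispatch fractions `f` are not), so the gradient flows through the soft probabilities and nudges the router away from over-used experts toward a more even assignment.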
Figure: the FFN layer in the Transformer block is replaced with a sparse Switch FFN layer (light blue).
As the Switch Transformers architecture in the figure above shows, MoE routing dispatches at the token level, not at the sentence or document level; this is a point that is often misunderstood.
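To make the token-level point concrete, here is a small PyTorch sketch; the sizes are made up for illustration. Each position in the sequence gets its own routing decision, so tokens in the same sentence can land on different experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes, chosen only to show the shapes involved.
batch, seq_len, d_model, num_experts = 2, 6, 512, 8

router = nn.Linear(d_model, num_experts, bias=False)  # router weight matrix
tokens = torch.randn(batch, seq_len, d_model)          # token representations

logits = router(tokens)                                 # [batch, seq_len, num_experts]
probs = F.softmax(logits, dim=-1)
expert_index = probs.argmax(dim=-1)                     # [batch, seq_len]

# One expert index per token position, not per sentence or document.
print(expert_index)
# e.g. tensor([[3, 0, 3, 7, 1, 3],
#              [5, 5, 2, 0, 6, 4]])
```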
Understanding Sparse Routing
MoE Routing
The MoE layer takes each token representation as input, and the router decides which expert FFN processes it.
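As a hedged sketch of the mechanism this section introduces, assuming a Switch-style top-1 router in PyTorch (the class name, layer sizes, and the omission of expert capacity limits are my own simplifications), a minimal Switch FFN layer could look like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Sketch of a Switch FFN layer: a top-1 router in front of N expert FFNs.

    Capacity limits and the auxiliary load-balancing loss shown earlier
    are omitted for brevity.
    """

    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [num_tokens, d_model], a flattened batch of token representations
        probs = F.softmax(self.router(x), dim=-1)   # [tokens, experts]
        gate, index = probs.max(dim=-1)              # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = index == i
            if mask.any():
                # Scale the selected expert's output by its gate probability
                # so gradients reach the router parameters.
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out
```

Multiplying the chosen expert's output by the gate value is what keeps the router in the backward pass even though only one expert runs per token.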