References:
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.
Objectives:
ModelBuilder::addMoE
- Noisy Top-K Gating
- Experts
LossFunction
- Importance (evenness): L_importance(X) = w_importance * CV(Importance(X))^2
- Load balance
- Diversity
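As a rough sketch of the two pieces above, here is how the noisy top-k gate and the CV^2 importance loss from the referenced paper could look in NumPy. This is an illustrative sketch, not the planned `ModelBuilder::addMoE` implementation; all function and parameter names (`noisy_top_k_gate`, `importance_loss`, `w_gate`, `w_noise`, `w_importance`) are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def noisy_top_k_gate(x, w_gate, w_noise, k, rng):
    """Noisy top-k gating (Shazeer et al., 2017):
    H(x) = x.W_g + StandardNormal() * softplus(x.W_noise),
    then keep only the top-k logits per example and softmax over them."""
    clean = x @ w_gate
    noise_std = np.log1p(np.exp(x @ w_noise))  # softplus keeps the noise scale positive
    h = clean + rng.standard_normal(clean.shape) * noise_std
    # Mask everything below the k-th largest logit with -inf before the softmax,
    # so non-selected experts receive exactly zero gate weight.
    kth = np.sort(h, axis=-1)[:, -k][:, None]
    h_masked = np.where(h >= kth, h, -np.inf)
    return softmax(h_masked)

def importance_loss(gates, w_importance=0.01):
    """L_importance(X) = w_importance * CV(Importance(X))^2, where
    Importance(X) sums each expert's gate values over the batch.
    Penalizes uneven expert usage; zero when all experts are used equally."""
    importance = gates.sum(axis=0)
    cv = importance.std() / importance.mean()  # coefficient of variation
    return w_importance * cv ** 2
```

A perfectly even gate distribution drives `importance_loss` to zero, since the coefficient of variation of a constant vector is zero; skewed routing raises it quadratically.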