MoE example #91

@dmccloskey

Description

References:

Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.

Objectives:

ModelBuilder::addMoE

  • Noisy Top-K Gate (see the sketch after this list)
  • Experts
  • LossFunction
    • Evenness (importance): L_importance = w_importance * CV(Importance(X))^2
    • Load balance
    • Diversity
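
For reference, here is a minimal, self-contained C++ sketch of the two pieces above that are spelled out in the Shazeer et al. paper: the noisy top-k gate, H(x)_i = (x·W_g)_i + N(0,1)·softplus((x·W_noise)_i) followed by a softmax over the top-k logits, and the importance ("evenness") loss, w_importance · CV(Importance(X))^2 with Importance(X) = Σ_x G(x). All names here (`noisy_top_k_gate`, `importance_loss`, `clean`, `noise_scale`) are illustrative assumptions, not the `ModelBuilder` API; the two gate projections x·W_g and x·W_noise are assumed to be computed upstream and passed in as plain vectors.

```cpp
// Sketch of noisy top-k gating and the importance ("evenness") loss
// from Shazeer et al. (2017). Names are illustrative, not ModelBuilder API.
#include <algorithm>
#include <cmath>
#include <functional>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// H(x)_i = (x.Wg)_i + N(0,1) * softplus((x.Wnoise)_i); the two projections
// arrive pre-computed as `clean` and `noise_scale`. Requires 1 <= k <= n.
// Ties at the k-th logit may keep slightly more than k experts.
std::vector<double> noisy_top_k_gate(const std::vector<double>& clean,
                                     const std::vector<double>& noise_scale,
                                     int k, std::mt19937& rng) {
  std::normal_distribution<double> gauss(0.0, 1.0);
  const int n = static_cast<int>(clean.size());
  std::vector<double> h(n);
  for (int i = 0; i < n; ++i)
    h[i] = clean[i] + gauss(rng) * std::log1p(std::exp(noise_scale[i]));

  // KeepTopK: find the k-th largest logit; everything below it is masked
  // (equivalent to setting it to -inf before the softmax).
  std::vector<double> sorted = h;
  std::nth_element(sorted.begin(), sorted.begin() + (k - 1), sorted.end(),
                   std::greater<double>());
  const double kth = sorted[k - 1];

  // Numerically stable softmax over the surviving logits only;
  // masked experts keep gate weight 0.
  const double max_h = *std::max_element(h.begin(), h.end());
  std::vector<double> g(n, 0.0);
  double z = 0.0;
  for (int i = 0; i < n; ++i)
    if (h[i] >= kth) { g[i] = std::exp(h[i] - max_h); z += g[i]; }
  for (double& v : g) v /= z;
  return g;
}

// Importance(X) = sum over the batch of the gate vectors G(x);
// L_importance = w_importance * CV(Importance(X))^2, CV = stddev / mean.
double importance_loss(const std::vector<std::vector<double>>& gates,
                       double w_importance) {
  const int n = static_cast<int>(gates.front().size());
  std::vector<double> importance(n, 0.0);
  for (const auto& g : gates)
    for (int i = 0; i < n; ++i) importance[i] += g[i];
  const double mean =
      std::accumulate(importance.begin(), importance.end(), 0.0) / n;
  double var = 0.0;
  for (double v : importance) var += (v - mean) * (v - mean);
  var /= n;
  const double cv = std::sqrt(var) / mean;  // mean > 0: gate rows sum to 1
  return w_importance * cv * cv;
}

int main() {
  std::mt19937 rng(42);
  // Toy batch of 4 samples routed over 8 experts with k = 2.
  std::uniform_real_distribution<double> unif(-1.0, 1.0);
  std::vector<std::vector<double>> gates;
  for (int b = 0; b < 4; ++b) {
    std::vector<double> clean(8), noise(8);
    for (int i = 0; i < 8; ++i) { clean[i] = unif(rng); noise[i] = unif(rng); }
    gates.push_back(noisy_top_k_gate(clean, noise, 2, rng));
  }
  std::cout << "L_importance = " << importance_loss(gates, 0.1) << "\n";
}
```

The load-balance term in the paper has the same CV^2 form, L_load = w_load * CV(Load(X))^2, where Load(X) is a smooth per-expert estimator of how many examples get routed to each expert; it would plug into the same loss hook as `importance_loss` above.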
