From https://arxiv.org/pdf/2406.04692 section 2.3:
Each layer comprises a set of n expert networks alongside a gating network and includes residual connections for improved gradient flow.
In the repo root's README.md, the quickstart images are missing the "skip connection" / "residual connection", where the input prompt is also passed into all layers of MoA.
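
To make the point concrete, here is a minimal sketch of the data flow I would expect the diagrams to show. It assumes the residual connection amounts to re-injecting the original prompt into every layer's input alongside the previous layer's responses. The names `Agent`, `build_layer_input`, and `run_moa` and the prompt template are hypothetical, made up for illustration; they are not taken from the repo or the paper.

```python
from typing import Callable, List

# Hypothetical agent type: any callable that maps a prompt string to a response string.
Agent = Callable[[str], str]


def build_layer_input(prompt: str, previous_outputs: List[str]) -> str:
    """Combine the original prompt with the previous layer's responses.

    The template below is an assumption for illustration only.
    """
    if not previous_outputs:
        # First layer: there are no earlier responses, so pass the prompt alone.
        return prompt
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(previous_outputs))
    return f"Previous responses:\n{numbered}\n\nOriginal prompt:\n{prompt}"


def run_moa(prompt: str, layers: List[List[Agent]], aggregator: Agent) -> str:
    """Run the agents layer by layer, then aggregate the last layer's responses."""
    outputs: List[str] = []
    for agents in layers:
        # Skip connection: the original prompt is part of every layer's input,
        # not only the first one.
        layer_input = build_layer_input(prompt, outputs)
        outputs = [agent(layer_input) for agent in agents]
    # The final aggregator also sees the original prompt.
    return aggregator(build_layer_input(prompt, outputs))
```

In this sketch, dropping `prompt` from `build_layer_input` for layers after the first would correspond to what the quickstart images currently depict: later layers would only see the earlier responses.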