Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts Paper • 2306.04845 • Published Jun 8, 2023 • 4
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks Paper • 2306.04073 • Published Jun 7, 2023 • 2
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 39