
EMO: Pretraining mixture of experts for emergent modularity
This work introduces EMO, a new approach to pretraining Mixture of Experts (MoE) models that promotes emergent modularity. The research aims to improve how models specialize and how efficiently they allocate parameters across different tasks.
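For context, the sketch below shows a generic top-k gated MoE layer of the kind EMO builds on: a router scores each token, the top-k experts process it, and their outputs are mixed by the normalized gate weights. This is a minimal illustration of standard MoE routing, not the EMO method itself; the class name `SimpleMoE` and all hyperparameters are hypothetical, and PyTorch is assumed.

```python
# Minimal sketch of a generic top-k gated MoE layer (illustrative only;
# SimpleMoE and its hyperparameters are hypothetical, not from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""

    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router: token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        scores = self.gate(x)                    # (B, S, n_experts)
        topk, idx = scores.topk(self.k, dim=-1)  # pick k experts per token
        weights = F.softmax(topk, dim=-1)        # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e       # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: each token activates only k of n_experts expert FFNs, so capacity
# grows with the expert count while per-token compute stays roughly fixed.
moe = SimpleMoE(d_model=512)
y = moe(torch.randn(2, 16, 512))  # (2, 16, 512)
```

Sparse routing of this kind is what lets experts specialize on subsets of the input distribution; "emergent modularity" refers to that specialization arising during pretraining rather than being imposed by hand.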