
Guidance and configs to enable expert-parallel communication overlap in Megatron-Bridge for MoE models; use it to hide dispatch/combine latency and improve throughput.
Provides detailed guidance, example configs, and verification steps for enabling expert-parallel (EP) communication overlap in Megatron-Bridge. Describes dispatcher choices (alltoall vs flex), delayed weight-gradient computation, backend constraints (DeepEP/HybridEP), and minimal working configs to safely roll out overlap.
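As a rough illustration of what a minimal overlap-enabled setup might look like, the sketch below uses Megatron-Core-style MoE field names; whether Megatron-Bridge exposes these exact names (including the dispatcher value "flex" and the delayed weight-gradient flag) is an assumption to verify against the version in use.

```python
# Minimal sketch of MoE expert-parallel overlap settings.
# Field names follow Megatron-Core conventions; their exact spelling and
# placement inside a Megatron-Bridge config are assumptions, not a
# definitive API reference.

moe_overlap_config = {
    # Expert parallelism: experts sharded across 8 ranks (illustrative sizes).
    "expert_model_parallel_size": 8,
    "num_moe_experts": 64,
    "moe_router_topk": 2,

    # Dispatcher choice: "alltoall" is the baseline NCCL path; a
    # backend-pluggable "flex" dispatcher (assumed name) would be the one
    # that can route dispatch/combine through DeepEP-style kernels for overlap.
    "moe_token_dispatcher_type": "alltoall",

    # Delay weight-gradient computation so expert communication can overlap
    # with remaining backward compute (assumed flag name).
    "delay_wgrad_compute": True,
}

if __name__ == "__main__":
    # Print the candidate settings so they can be compared against the
    # actual config schema before rollout.
    for key, value in moe_overlap_config.items():
        print(f"{key}: {value}")
```

In practice, the safe rollout the description mentions would start from the plain alltoall dispatcher, confirm loss parity, and only then switch on the overlap-oriented dispatcher and delayed weight-gradient computation.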
Use when running MoE models where expert dispatch/combine all-to-all communication is a measurable bottleneck and your memory budget and deployment allow tuning for throughput. Avoid for tiny runs, early correctness bring-up, or incompatible PyTorch/TE/CUDA setups.
Engineers and agents with knowledge of deep-learning training infra (Megatron/Bridge) — useful for performance-tuning assistants and infra automation tools.