
Guidance and configs to enable expert-parallel communication overlap in Megatron-Bridge for MoE models; use it to hide dispatch/combine latency and improve throughput.
Provides detailed guidance, example configs, and verification steps for enabling expert-parallel (EP) communication overlap in Megatron-Bridge. Describes dispatcher choices (alltoall vs flex), delayed weight-gradient computation, backend constraints (DeepEP/HybridEP), and minimal working configs to safely roll out overlap.
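As a rough illustration of what a minimal overlap-enabled setup might look like, the sketch below uses Megatron-Core-style MoE field names; whether Megatron-Bridge exposes these exact names (including the dispatcher value "flex" and the delayed weight-gradient flag) is an assumption to verify against the version in use.

```python
# Minimal sketch of MoE expert-parallel overlap settings.
# Field names follow Megatron-Core conventions; their exact spelling and
# placement inside a Megatron-Bridge config are assumptions, not a
# definitive API reference.

moe_overlap_config = {
    # Expert parallelism: experts sharded across 8 ranks (illustrative sizes).
    "expert_model_parallel_size": 8,
    "num_moe_experts": 64,
    "moe_router_topk": 2,

    # Dispatcher choice: "alltoall" is the baseline NCCL path; a
    # backend-pluggable "flex" dispatcher (assumed name) would be the one
    # that can route dispatch/combine through DeepEP-style kernels for overlap.
    "moe_token_dispatcher_type": "alltoall",

    # Delay weight-gradient computation so expert communication can overlap
    # with remaining backward compute (assumed flag name).
    "delay_wgrad_compute": True,
}

if __name__ == "__main__":
    # Print the candidate settings so they can be compared against the
    # actual config schema before rollout.
    for key, value in moe_overlap_config.items():
        print(f"{key}: {value}")
```

In practice, the safe rollout the description mentions would start from the plain alltoall dispatcher, confirm loss parity, and only then switch on the overlap-oriented dispatcher and delayed weight-gradient computation.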
Use when running MoE models where expert dispatch/combine all-to-all communication is a measurable bottleneck and your memory budget and deployment allow tuning for throughput. Avoid for tiny runs, early correctness bring-up, or incompatible PyTorch/TE/CUDA setups.
Engineers and agents with knowledge of deep-learning training infra (Megatron/Bridge) — useful for performance-tuning assistants and infra automation tools.