Documents the CI/CD workflows used by NVIDIA's Megatron-LM project. The skill explains the main GitHub Actions workflow structure, the decision tree for PR labels that control test scopes and repeat counts, how images are pushed to registries, and practical commands for triggering internal CI and locating pipeline logs. It also provides procedures for investigating failures and correlating them back to PR changes.
Use when debugging failed CI runs, deciding which labels to attach when opening a PR, triggering internal pipelines without UI access, or understanding how CI stages map to test scope and container images. Ideal for maintainers, CI engineers, and contributors working on model training code where test scope selection matters.
tools/trigger_internal_ci.py and usage examples for triggering internal GitLab CI. gh to view PR metadata and runs, and guidance for locating and reading CI artifacts and logs.
Inferred: developer-facing agents with shell access and GitHub CLI capability (gh, bash), and agents used by maintainers for CI automation.
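The gh usage mentioned above (viewing PR metadata and runs) can be sketched as follows. This is a minimal illustration, not the skill's own content: it only builds standard GitHub CLI command lines and prints them. The helper names, the PR number 1234, the run id 987654321, and the branch name are hypothetical placeholders, and the repository slug NVIDIA/Megatron-LM is assumed to be the target. The arguments of tools/trigger_internal_ci.py are not documented in this listing, so that script is deliberately left out.

```python
import shlex

# Assumption: the public GitHub repository slug; change it for a fork.
REPO = "NVIDIA/Megatron-LM"


def pr_metadata_cmd(pr_number: int, repo: str = REPO) -> list[str]:
    """Build a `gh pr view` command that prints a PR's title, labels, and head branch as JSON."""
    return ["gh", "pr", "view", str(pr_number), "--repo", repo,
            "--json", "title,labels,headRefName"]


def recent_runs_cmd(branch: str, limit: int = 5, repo: str = REPO) -> list[str]:
    """Build a `gh run list` command for the most recent workflow runs on a branch."""
    return ["gh", "run", "list", "--repo", repo, "--branch", branch,
            "--limit", str(limit),
            "--json", "databaseId,workflowName,status,conclusion"]


def failed_logs_cmd(run_id: int, repo: str = REPO) -> list[str]:
    """Build a `gh run view` command that prints logs only for the failed steps of a run."""
    return ["gh", "run", "view", str(run_id), "--repo", repo, "--log-failed"]


if __name__ == "__main__":
    # Placeholder values, used only to show the resulting command lines.
    for cmd in (pr_metadata_cmd(1234),
                recent_runs_cmd("my-feature-branch"),
                failed_logs_cmd(987654321)):
        print(shlex.join(cmd))
```

Each helper returns an argument list rather than a shell string, so it can be passed to `subprocess.run(cmd, check=True)` without quoting issues once gh is installed and authenticated.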
NVIDIA Megatron-LM CI/CD reference guide skill. The SKILL.md body was null in the DB and the source URL returns 404 — the path 'skills/Megatron-Core/cicd/SKILL.md' does not exist in the nvidia/skills repo. No scripts were bundled. Unable to audit actual content; scored based on metadata alone.
Skill body unavailable (null in DB, 404 on GitHub). The nvidia/skills repo does not contain the path 'skills/Megatron-Core/cicd/SKILL.md'. Consider marking this skill as broken/unreachable or rescraping from the correct path.
MoE Expert-Parallel Overlap (Megatron-Bridge)
Guidance and configs to enable expert-parallel communication overlap in Megatron-Bridge for MoE models — use to hide dispatch/combine latency and improve throug
Video Analytics (VA-MCP)
Query incidents, alerts, sensor counts and metrics from a VA-MCP Elasticsearch backend (port 9901) to answer questions about violations, occupancy, speeds, and
VSS Video Summarization
Summarize recorded video clips using a local LVS summarization microservice with HITL; fallback to a VLM when the service is unavailable.