SKILL.md packages that extend Claude Code, Cursor, Copilot, and other AI agents.

claude-skill-registry
Guidance and patterns for Python parallelism and GPU/CPU performance: threading vs multiprocessing vs asyncio, CUDA streams, PyTorch DDP, and benchmarking.

megatron-bridge
Guide to enabling and validating MoE expert-parallel communication overlap in Megatron-Bridge, which hides dispatch/combine latency and improves throughput.

ako4all
Automated loop that profiles, iterates on, benchmarks, and commits GPU kernel optimizations across CUDA, Triton, TileLang, Python, and C++ to achieve measurable speedups.

graphsignal-profiler
Set up and manage GPU profiling, tracing, and monitoring for AI inference workloads using vLLM, SGLang, and PyTorch.