SKILL.md packages that extend Claude Code, Cursor, Copilot, and other AI agents.

osmo
Manage OSMO cloud compute: check pools, GPUs, and quotas; submit and monitor workflows; inspect logs; and create apps using the OSMO CLI.

kernels
Guidance and examples for writing, benchmarking, and integrating optimized Triton kernels on ROCm (MI355X, R9700) for diffusers and transformers workloads.

cuequivariance
PyTorch GPU-accelerated equivariant tensor primitives and layers (SegmentedPolynomial, tensor products, spherical harmonics, equivariant linear layers) for building…

Terradev
Provision and manage GPUs across clouds, create GPU Kubernetes clusters, deploy inference endpoints, and burst local compute to cloud with BYOAPI credential safety.

sparkjs-skill
Tools and guidance for building, editing, and optimising 3D Gaussian splat scenes in the browser using SparkJS and Three.js.

vllm-omni-skills
Generate videos (text→video, image→video, text+image→video) using vLLM-Omni and Wan2.2-style diffusion models, with guidance on parameters and performance trade-offs.

maxtext-slurm
A post-training analysis workflow that uses TGS tagging, TraceLens, and IRLens to diagnose model training performance, GPU utilization, and kernel-level hotspots.

modal-auto-research-skills
Orchestrate multiple autonomous Claude Code agents across separate GPUs or sandboxes to run parallel experiments, debugging sessions, or batch workloads with st…

skills
Guidance and configs to enable expert-parallel communication overlap in Megatron-Bridge for MoE models. Use it to hide dispatch/combine latency and improve throughput.

cv-train-stack
Review, run, validate, and audit computer vision model training with checks for dataset quality, preprocessing consistency, augmentation, and deployment validation.

openclaw-nim-skill
Call NVIDIA NIM-hosted LLMs from OpenClaw to offload heavy model work and conserve main-agent tokens.

megatron-bridge
Guides enabling and validating MoE expert-parallel communication overlap in Megatron-Bridge to hide dispatch/combine latency and improve throughput.

claude-skill-registry
Guidance and patterns for Python parallelism and GPU/CPU performance: threading vs multiprocessing vs asyncio, CUDA streams, PyTorch DDP, and benchmarking.

harbor
Deploy, configure, and troubleshoot a full local LLM stack (Ollama, llama.cpp, vLLM, Open WebUI, SearXNG, Open Terminal) using the Harbor toolkit.