Agent Skills

SKILL.md packages that extend Claude Code, Cursor, Copilot, and other AI agents.

OSMO CLI Agent

osmo

Manage OSMO cloud compute: check pools, GPUs, quotas, submit and monitor workflows, inspect logs, and create apps using the OSMO CLI.

cloud, workflows, gpu

143

7 triggers

ROCm Triton Kernels (RMSNorm, RoPE 3D, GEGLU, AdaLN)

kernels

Guidance and examples for writing, benchmarking, and integrating optimized Triton kernels on ROCm (MI355X, R9700) for diffusers and transformers workloads.

gpu, rocm, triton

599

7 triggers

cuEquivariance Torch (cuet)

cuequivariance

PyTorch GPU-accelerated equivariant tensor primitives and layers (SegmentedPolynomial, tensor products, spherical harmonics, equivariant linear layers) for buil

pytorch, equivariance, gpu

388

6 triggers

Terradev — Cross‑Cloud GPU Provisioning

Terradev

Provision and manage GPUs across clouds, create GPU Kubernetes clusters, deploy inference endpoints, and burst local compute to cloud with BYOAPI credential saf

gpu, provisioning, kubernetes

6 triggers

SparkJS - 3D Gaussian Splat Renderer

sparkjs-skill

Tools and guidance for building, editing, and optimising 3D Gaussian splat scenes in the browser using SparkJS and Three.js.

threejs, webgl, rendering

6 triggers

CV Model Training

cv-train-stack

Review, run, validate and audit computer vision model training with checks for dataset quality, preprocessing consistency, augmentation, and deployment validati

computer-vision, model-training, dataset-audit

7 triggers

vLLM-Omni Video Generation

vllm-omni-skills

Generate videos (text→video, image→video, text+image→video) using vLLM-Omni and Wan2.2-style diffusion models, with guidance on parameters and performance trade

video-generation, text-to-video, image-to-video

5 triggers

MaxText Performance Analysis

maxtext-slurm

A post-training analysis workflow that uses TGS tagging, TraceLens, and IRLens to diagnose model training performance, GPU utilization, and kernel-level hotspot

performance, profiling, tracing

6 triggers

Run LLMs Locally (Harbor)

harbor

Deploy, configure, and troubleshoot a full local LLM stack (Ollama, llama.cpp, vLLM, Open WebUI, SearXNG, Open Terminal) using the Harbor toolkit.

llm, local-ai, docker

2,955

7 triggers

Sub-Agents (Parallel Agent Orchestration)

modal-auto-research-skills

Orchestrate multiple autonomous Claude Code agents across separate GPUs or sandboxes to run parallel experiments, debugging sessions, or batch workloads with st

orchestration, agents, distributed

7 triggers

NVIDIA NIM Model Caller

openclaw-nim-skill

Call NVIDIA NIM-hosted LLMs from OpenClaw to offload heavy model work and conserve main-agent tokens.

nvidia, nim, llm

7 triggers

HPC Python Patterns

claude-skill-registry

Guidance and patterns for Python parallelism and GPU/CPU performance: threading vs multiprocessing vs asyncio, CUDA streams, PyTorch DDP, and benchmarking.

python, hpc, gpu

466

7 triggers

MoE Expert-Parallel Overlap (Megatron-Bridge)

skills

Guidance and configs to enable expert-parallel communication overlap in Megatron-Bridge for MoE models — use to hide dispatch/combine latency and improve throug

moe, megatron, performance

1,118

8 triggers

MoE Expert-Parallel Overlap

megatron-bridge

Guides enabling and validating MoE expert-parallel communication overlap in Megatron-Bridge to hide dispatch/combine latency and improve throughput.

moe, performance, gpu

637

6 triggers

dbg — Debug & Profiling CLI

dbg

Persistent CLI for debugging, profiling, and JIT disassembly across languages and backends; captures hits for diffing and trend analysis.

debugging, profiling, cli

11 triggers

AKO4ALL — Agentic Kernel Optimization

ako4all

Automated loop that profiles, iterates, benchmarks and commits GPU kernel optimizations across CUDA/Triton/TileLang/Python/C++ to achieve measurable speedups.

gpu, performance, benchmarking

262

7 triggers

Litmus — Parallel ML Research

litmus

Orchestrates parallel autonomous ML research agents with git-backed experiment branches, a director/synthesizer layer, and nightly synthesised findings.

ml, research, autonomous-agents

5 triggers

MotionGPU Core & Adapters

motion-core

Build and edit MotionGPU code across core and Svelte/React/Vue adapters.

webgpu, wgsl, react

9 triggers