
Speeding up GPU kernels by 38% with a multi-agent system
Cursor demonstrated a multi-agent system that autonomously optimized 235 CUDA kernels for NVIDIA Blackwell 200 GPUs. In three weeks, the approach achieved a 38% geometric-mean (geomean) speedup over baselines, which suggests that agentic optimization can be effective for low-level performance work.
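
For readers unfamiliar with the metric, a geomean speedup is typically computed by taking each kernel's speedup ratio (baseline time divided by optimized time) and averaging those ratios geometrically, so that no single outlier kernel dominates the result. The sketch below illustrates that calculation under stated assumptions: the function name `geomean_speedup` and the timings are hypothetical and made up for illustration, and the source does not describe how Cursor actually measured or aggregated its results.

```python
import math


def geomean_speedup(baseline_ms, optimized_ms):
    """Return the geometric mean of per-kernel speedups (baseline / optimized)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need two non-empty timing lists of equal length")
    # Average the logs of the ratios, then exponentiate. This is the
    # geometric mean and avoids overflow from multiplying many ratios.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))


if __name__ == "__main__":
    # Hypothetical kernel timings in milliseconds, NOT data from the source.
    baseline_ms = [2.0, 4.0, 8.0]
    optimized_ms = [1.0, 4.0, 4.0]
    speedup = geomean_speedup(baseline_ms, optimized_ms)
    print(f"geomean speedup: {speedup:.3f}x")
```

In this made-up example the per-kernel ratios are 2, 1, and 2, so the geomean is the cube root of 4, roughly 1.59x. Read the same way, a 38% geomean speedup would correspond to a ratio of about 1.38x across the 235 kernels, assuming the figure is reported as a ratio of baseline to optimized runtime.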


