Alibaba's 72B Qwen 2.5 flagship instruct model: top-tier open-source performance on coding, math, and instruction following.
JoinAI_V0.1.7 (GLM4.5/Qwen2.5VL/GLM4.5V/DeepSeekV3)
OpenHands + Qwen-2.5-72B — 5.7% resolved
Qwen2.5-72B-Instruct: rank #6; IFEval=86.38, BBH=61.87, MATH=59.82, GPQA=16.67, MuSR=11.74, MMLU-Pro=51.40