Meta's 70B Llama 3.3 instruction-tuned model: quality close to Llama 3.1 405B at a fraction of the inference cost.
🦤 AWorld (inclusionAI)
Llama-3.3-70B-Instruct FC — rank #62
OpenHands + Llama-3.3-70B — 6.9% resolved
Llama-3.3-70B-Instruct — rank #40, IFEval=89.98 BBH=56.56 MATH=48.34 GPQA=10.51 MUSR=15.57 MMLU-Pro=48.13