Meta's 70B-parameter Llama 3.1 Instruct model: the sweet spot of the 3.1 family, with a 128K-token context window and near-GPT-4-level performance.
AWorld (inclusionAI, GPT-4o + DeepSeek V3 + Claude + Gemini)
Llama3-chat-70b (WebArena Team, Apr 2024)
OpenHands + Llama-3.1-70B — 1.7% resolved
Llama-3-70B + Search AxTree+Caption (Jun 2024)
rank #62: IFEval=86.69, BBH=55.93, MATH=38.07, GPQA=14.21, MUSR=17.69, MMLU-Pro=47.88