Meta's Llama 3.2 11B Vision is the first vision-capable Llama model, supporting image understanding alongside text with a 128K-token context window.
Context length: 131K tokens
Pricing: Free / Open weights
Input modality: Image
Architecture: Transformer
License: Llama 3.2 Community
No benchmark scores available yet for this model.
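As a rough illustration of how the model's image-plus-text interface can be exercised, here is a minimal sketch using the Hugging Face transformers Mllama classes; the checkpoint name, image URL, and generation settings are assumptions rather than details taken from this listing.

```python
# Minimal sketch: image + text inference with Llama 3.2 11B Vision via
# Hugging Face transformers. Checkpoint name and image URL are assumptions.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed checkpoint name

# Load the weights in bfloat16 and spread them across available devices.
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Any RGB image works; this URL is a placeholder.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# One user turn containing an image slot followed by a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

# Generate a short answer; the 128K context leaves room for much longer prompts.
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```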