Microsoft · Phi 3
Microsoft's 4.2B-parameter multimodal Phi-3 model combines vision and language with a 128K-token context window for efficient on-device image understanding.
No benchmark scores are available yet for this model.