Microsoft brings out a small language model that can look at pictures

Microsoft introduced Phi-3-vision, a multimodal language model that understands both text and images and is aimed specifically at mobile devices.