On Wednesday, Microsoft Research introduced Magma, an AI foundation model that combines visual and language processing to control software interfaces and robotic systems. According to Microsoft, Magma is the first AI model that not only processes multimodal data such as text, images, and video but can also natively act on it, integrating perception and control within a single framework. This step toward agentic AI means the model can autonomously formulate plans and execute multistep tasks on a human's behalf, placing it in the same emerging landscape as agent projects like OpenAI's Operator. Microsoft developed the model in collaboration with researchers from several universities.
Given a described goal, Magma can formulate plans and execute actions to achieve it. By transferring knowledge from freely available visual and language data, Magma bridges verbal, spatial, and temporal intelligence to navigate complex tasks and settings.
Unlike many prior multimodal AI systems that require separate models for perception and control, Magma integrates these abilities into a single foundation model.