
"The first release, MAI-Voice-1, is a speech generation model capable of producing a full minute of high-quality audio in under a second on a single GPU. Microsoft describes it as one of the most efficient systems available today. MAI-Voice-1 is already embedded within Copilot Daily and Podcasts and is now accessible through Copilot Labs. Users can try features such as storytelling demos and guided meditations, highlighting the model's ability to generate expressive and natural-sounding audio across both single and multi-speaker scenarios."
"The second release, MAI-1-preview, marks MAI's first internally developed foundation model. Built with a mixture-of-experts architecture and trained on approximately 15,000 NVIDIA H100 GPUs, it is optimized for instruction following and everyday query responses. MAI-1-preview is currently undergoing evaluation on LMArena, a community-driven platform for model benchmarking and comparison. Microsoft also plans to roll it out for selected text use cases in Copilot, gathering user feedback to refine performance."
Microsoft released two in-house models, MAI-Voice-1 and MAI-1-preview, expanding MAI's lineup of purpose-built systems. MAI-Voice-1 is a speech generation model that can produce a full minute of high-quality audio in under a second on a single GPU. It is embedded in Copilot Daily and Podcasts and is available through Copilot Labs, where storytelling and guided meditation demos cover both single- and multi-speaker scenarios. MAI-1-preview is MAI's first internally developed foundation model. It uses a mixture-of-experts architecture, was trained on approximately 15,000 NVIDIA H100 GPUs, and is optimized for instruction following and everyday queries. It is currently being evaluated on LMArena, and Microsoft plans to roll it out to selected Copilot text use cases, with API access for trusted testers.
Read at Medium