Zyphra, an AI startup based in Palo Alto, has launched Zonos, a pair of text-to-speech models capable of cloning voices from minimal input. The models use transformer and hybrid architectures and were trained on more than 200,000 hours of diverse speech data, primarily English but also including other languages. Notably, Zyphra has released the model weights under an Apache 2.0 license on Hugging Face, setting itself apart from competitors such as ElevenLabs. Users can try the models in a demo environment or through Zyphra's API services, although privacy concerns about voice uploads are acknowledged.
Hands on: Palo Alto-based AI startup Zyphra unveiled a pair of open text-to-speech (TTS) models this week that are said to be capable of cloning your voice with as little as five seconds of sample audio.
To date, these efforts have produced its Zamba family of small language models, optimizations such as tree attention, and now its Zonos TTS models.