The Limitations of HierSpeech++ and a Quick Fix

from Hackernoon 1 year ago

While our model significantly improves zero-shot speech synthesis performance, it inadvertently synthesizes background noise alongside the voice due to non-disentangled modeling.
Hackernoon: https://hackernoon.com/the-limitations-of-hierspeech-and-a-quick-fix

To mitigate this, we employ a denoiser prior to the style encoder, which enhances audio quality but unfortunately worsens reconstruction metrics such as character error rate (CER) and word error rate (WER).

We discovered that the denoiser tends to remove critical speech elements, negatively affecting pronunciation in the synthetic output, necessitating further refinements.

To address the issues introduced by noise and denoising, we use an interpolation method between original and denoised style representations, offering improved results.
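
The interpolation idea can be illustrated with a minimal sketch. It assumes the style encoder yields two fixed-length vectors, one from the original prompt audio and one from the denoised prompt, and that they are blended linearly. The function name `interpolate_style`, the `ratio` parameter, and the use of plain Python lists are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def interpolate_style(
    style_original: Sequence[float],
    style_denoised: Sequence[float],
    ratio: float,
) -> List[float]:
    """Linearly blend two style vectors (hypothetical helper).

    ratio = 0.0 returns the original (possibly noisy) style,
    ratio = 1.0 returns the fully denoised style.
    """
    if len(style_original) != len(style_denoised):
        raise ValueError("style vectors must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    # Element-wise weighted average of the two representations.
    return [
        ratio * d + (1.0 - ratio) * o
        for o, d in zip(style_original, style_denoised)
    ]


# Toy vectors standing in for real style embeddings.
blended = interpolate_style([1.0, 2.0], [3.0, 4.0], ratio=0.5)
```

Under this assumption, a ratio closer to 0 keeps more of the original style, including any noise and the speech detail the denoiser might strip, while a ratio closer to 1 favors the cleaner but possibly over-filtered representation.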


#speech-synthesis #machine-learning #noise-reduction #voice-conversion #text-to-speech

