We trained the hierarchical speech synthesizer on the LibriTTS dataset, using the train-clean subsets to allow fair comparison with other models while improving voice style transfer.
To improve robustness and diversity, we also trained the model at the 1k scale, sampling diverse speaker data from the Libri-light and Multi-Speaker Speech Synthesis datasets.