Voice Conversion Audio Samples
Dataset: CSTR VCTK Corpus [1]
Note: the parallel samples shown here are only for easier comparison. Training neither required nor used a parallel corpus.
Systems:
- Baseline: PPG2speech synthesizer baseline.
- Proposed: PPG2speech synthesizer with adversarial speaker classifier.
In-set testing (test speakers were seen during training)
Conversion | Source | Target | Baseline | Proposed
---|---|---|---|---
Female → Female | | | |
Female → Male | | | |
Male → Female | | | |
Male → Male | | | |
One-shot testing (test speakers were unseen during training)
Conversion | Source | Target | Baseline | Proposed
---|---|---|---|---
Female → Female | | | |
Female → Male | | | |
Male → Female | | | |
Male → Male | | | |
References
[1] Veaux, Christophe, Junichi Yamagishi, and Kirsten MacDonald. "SUPERSEDED-CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit." (2016).