Voice Conversion Audio Samples
Dataset: CSTR VCTK dataset [1]
Note: The parallel samples shown here are only for easier comparison. Training neither required nor used a parallel corpus.
Systems:
- Baseline: PPG2speech synthesizer baseline.
- Proposed: PPG2speech synthesizer with adversarial speaker classifier.
In-set testing (test speakers were known during training)
| Conversion | Source | Target | Baseline | Proposed |
|---|---|---|---|---|
| Female → Female | | | | |
| Female → Male | | | | |
| Male → Female | | | | |
| Male → Male | | | | |
One-shot testing (test speakers were unseen during training)
| Conversion | Source | Target | Baseline | Proposed |
|---|---|---|---|---|
| Female → Female | | | | |
| Female → Male | | | | |
| Male → Female | | | | |
| Male → Male | | | | |
References
[1] Veaux, Christophe, Junichi Yamagishi, and Kirsten MacDonald. "SUPERSEDED-CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit." (2016).