Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition

Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna

Department of Computer Science and Engineering, Texas A&M University, USA

Voice Conversion Audio Samples

Dataset: CSTR VCTK dataset [1]

Note: the parallel samples shown here are provided only for easier side-by-side comparison. Training did not require or use a parallel corpus.


Systems compared: Baseline and Proposed.

Inset testing (test speakers were seen during training)

Source Target Baseline Proposed
Female → Female
Female → Male
Male → Female
Male → Male

One-shot testing (test speakers were unseen during training)

Source Target Baseline Proposed
Female → Female
Female → Male
Male → Female
Male → Male

References

[1] Veaux, Christophe, Junichi Yamagishi, and Kirsten MacDonald. "CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit." University of Edinburgh, 2016.