NaturalSpeech: End-to-end text-to-speech synthesis with human-level quality X Tan, J Chen, H Liu, J Cong, C Zhang, Y Liu, X Wang, Y Leng, Y Yi, L He, ... IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | 109 | 2024 |
VISinger: Variational inference with adversarial learning for end-to-end singing voice synthesis Y Zhang, J Cong, H Xue, L Xie, P Zhu, M Bi ICASSP 2022, 2022 | 53 | 2022 |
Data efficient voice cloning from noisy samples with domain adversarial training J Cong, S Yang, L Xie, G Yu, G Wan INTERSPEECH 2020, 2020 | 32 | 2020 |
Controllable Context-aware Conversational Speech Synthesis J Cong, S Yang, N Hu, G Li, L Xie, D Su INTERSPEECH 2021, 2021 | 29 | 2021 |
Glow-WaveGAN: Learning speech representations from GAN-based variational auto-encoder for high fidelity flow-based speech synthesis J Cong, S Yang, L Xie, D Su INTERSPEECH 2021, 2021 | 25 | 2021 |
Glow-WaveGAN 2: High-quality zero-shot text-to-speech synthesis and any-to-any voice conversion Y Lei, S Yang, J Cong, L Xie, D Su INTERSPEECH 2022, 2022 | 12 | 2022 |
DSPGAN: A GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP K Song, Y Zhang, Y Lei, J Cong, H Li, L Xie, G He, J Bai ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 8 | 2023 |
DiCLET-TTS: Diffusion model based cross-lingual emotion transfer for text-to-speech—A study between English and Mandarin T Li, C Hu, J Cong, X Zhu, J Li, Q Tian, Y Wang, L Xie IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 3 | 2023 |
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS K Song, J Cong, X Wang, Y Zhang, L Xie, N Jiang, H Wu 2022 13th International Symposium on Chinese Spoken Language Processing …, 2022 | 1 | 2022 |
AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation K Song, H Xue, X Wang, J Cong, Y Zhang, L Xie, B Yang, X Zhang, D Su 2022 13th International Symposium on Chinese Spoken Language Processing …, 2022 | 1 | 2022 |
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning T Li, Z Wang, X Zhu, J Cong, Q Tian, Y Wang, L Xie arXiv preprint arXiv:2310.04004, 2023 | | 2023 |