Yuki Saito
Lecturer, The University of Tokyo
Verified email at ipc.i.u-tokyo.ac.jp
Title
Cited by
Year
Statistical parametric speech synthesis incorporating generative adversarial networks
Y Saito, S Takamichi, H Saruwatari
IEEE/ACM Transactions on Audio, Speech, and Language Processing 26 (1), 84-96, 2017
Cited by 261 (2017)
Non-parallel voice conversion using variational autoencoders conditioned by phonetic posteriorgrams and d-vectors
Y Saito, Y Ijima, K Nishida, S Takamichi
2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018
Cited by 140 (2018)
Voice conversion using sequence-to-sequence learning of context posterior probabilities
H Miyoshi, Y Saito, S Takamichi, H Saruwatari
arXiv preprint arXiv:1704.02360, 2017
Cited by 66 (2017)
JVS corpus: free Japanese multi-speaker voice corpus
S Takamichi, K Mitsui, Y Saito, T Koriyama, N Tanji, H Saruwatari
arXiv preprint arXiv:1908.06248, 2019
Cited by 63 (2019)
JSUT and JVS: Free Japanese voice corpora for accelerating speech synthesis research
S Takamichi, R Sonobe, K Mitsui, Y Saito, T Koriyama, N Tanji, ...
Acoustical Science and Technology 41 (5), 761-768, 2020
Cited by 52 (2020)
Phase reconstruction from amplitude spectrograms based on von-Mises-distribution deep neural network
S Takamichi, Y Saito, N Takamune, D Kitamura, H Saruwatari
2018 16th International Workshop on Acoustic Signal Enhancement (IWAENC …, 2018
Cited by 46 (2018)
Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis
Y Saito, S Takamichi, H Saruwatari
2017 IEEE International Conference on Acoustics, Speech and Signal …, 2017
Cited by 37 (2017)
Voice conversion using input-to-output highway networks
Y Saito, S Takamichi, H Saruwatari
IEICE Transactions on Information and Systems 100 (8), 1925-1928, 2017
Cited by 32 (2017)
Text-to-speech synthesis using STFT spectra based on low-/multi-resolution generative adversarial networks
Y Saito, S Takamichi, H Saruwatari
2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018
Cited by 29 (2018)
Phase reconstruction from amplitude spectrograms based on directional-statistics deep neural networks
S Takamichi, Y Saito, N Takamune, D Kitamura, H Saruwatari
Signal Processing 169, 107368, 2020
Cited by 25 (2020)
Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space
D Xin, Y Saito, S Takamichi, T Koriyama, H Saruwatari
Interspeech, 2947-2951, 2020
Cited by 20 (2020)
Face2Speech: Towards Multi-Speaker Text-to-Speech Synthesis Using an Embedding Vector Predicted from a Face Image
S Goto, K Onishi, Y Saito, K Tachibana, K Mori
Interspeech, 1321-1325, 2020
Cited by 19 (2020)
Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU
T Saeki, Y Saito, S Takamichi, H Saruwatari
Interspeech, 1021-1022, 2020
Cited by 14 (2020)
HumanGAN: generative adversarial network with human-based discriminator and its evaluation in speech perception modeling
K Fujii, Y Saito, S Takamichi, Y Baba, H Saruwatari
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by 14 (2020)
Vocoder-free text-to-speech synthesis incorporating generative adversarial networks using low-/multi-frequency STFT amplitude spectra
Y Saito, S Takamichi, H Saruwatari
Computer Speech & Language 58, 347-363, 2019
Cited by 12 (2019)
DNN-based speaker embedding using subjective inter-speaker similarity for multi-speaker modeling in speech synthesis
Y Saito, S Takamichi, H Saruwatari
arXiv preprint arXiv:1907.08294, 2019
Cited by 12 (2019)
Perceptual-similarity-aware deep speaker representation learning for multi-speaker generative modeling
Y Saito, S Takamichi, H Saruwatari
IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 1033-1048, 2021
Cited by 11 (2021)
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis
D Xin, Y Saito, S Takamichi, T Koriyama, H Saruwatari
Interspeech, 1614-1618, 2021
Cited by 11 (2021)
Generative moment matching network-based random modulation post-filter for DNN-based singing voice synthesis and neural double-tracking
H Tamaru, Y Saito, S Takamichi, T Koriyama, H Saruwatari
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by 10 (2019)
Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
D Yang, T Koriyama, Y Saito, T Saeki, D Xin, H Saruwatari
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by 9 (2023)