FastDCTTS audio demos

Authors

Minsu Kang, Jihyun Lee, Simin Kim and Injung Kim (Handong Global University)

Abstract

We propose an end-to-end speech synthesizer, Fast DCTTS, that synthesizes speech in real time on a single CPU thread. The proposed model is composed of a carefully tuned lightweight network designed by applying multiple network reduction and fidelity improvement techniques. In addition, we propose a novel group highway activation that trades off computational efficiency against the regularization effect of the gating mechanism. We also introduce a new metric, Elastic Mel Cepstral Distortion (EMCD), to measure the fidelity of the output mel-spectrogram. In experiments, we analyze the effect of the acceleration techniques on speed and speech quality. Our best model maintains speech quality similar to that of the baseline model, DCTTS, while reducing the computation to 1.76% and the number of parameters to 2.75% of the baseline. The speed on a single CPU thread improved by a factor of 7.45, which is fast enough to produce mel-spectrograms in real time without a GPU.
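To make the idea of a gate shared across channels concrete, here is a minimal sketch of a highway-style activation in which one gate value controls a whole group of channels. It assumes the usual highway form y = T * h + (1 - T) * x with a sigmoid gate T; the function name `group_highway`, its arguments, and the grouping scheme are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(z):
    """Logistic function used for the gate."""
    return 1.0 / (1.0 + math.exp(-z))


def group_highway(x, h, gate_logits, group_size):
    """Highway-style mix of input x and candidate h for one frame.

    x, h         lists of d channel values (input and transformed input)
    gate_logits  list of d // group_size pre-sigmoid gate values
    group_size   number of consecutive channels sharing one gate (assumed layout)
    """
    if len(x) != len(h) or len(x) != len(gate_logits) * group_size:
        raise ValueError("channel count must equal len(gate_logits) * group_size")
    out = []
    for c, (xi, hi) in enumerate(zip(x, h)):
        # All channels in the same group read the same gate value.
        t = sigmoid(gate_logits[c // group_size])
        out.append(t * hi + (1.0 - t) * xi)
    return out


# Four channels, two groups of two: only two gate values are needed.
y = group_highway([0.0, 0.0, 2.0, 2.0], [2.0, 2.0, 0.0, 0.0], [0.0, 0.0], 2)
```

With `group_size=1` this reduces to an ordinary per-channel highway gate; larger groups need fewer gate values, which is where the saving in computation would come from under these assumptions.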


FastDCTTS vs. baseline DCTTS - LJSpeech dataset

No. baseline 90% FastDCTTS 90% baseline 70% FastDCTTS 70% Script
01    The most trifling acts were magnified into offenses.
02    One smug alderman, a member of the House of Commons, sneered at the ultra philanthropy of the champions of prison improvement.
03    First by daily services, the latter by the appointment of schoolmasters and instruction in reading and writing.
04    as is shown by the report of the Commissioners to inquire into the state of the municipal corporations in
05    It was understood also that he had served in the army as a private, and had, moreover, undergone a sentence of transportation.


FastDCTTS vs. baseline DCTTS - KSS dataset

No. baseline 90% FastDCTTS 90% baseline 70% FastDCTTS 70% Script
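01    한국은 천연자원이 풍부하지 않습니다. (Korea is not rich in natural resources.)
02    그들은 불을 피우기 위해 나무를 모으고 있었다. (They were gathering wood to make a fire.)
03    이 책에는 새로운 것이 하나도 없어. (There is nothing new in this book.)
04    이 자리는 임산부를 위한 자리입니다. (This seat is reserved for pregnant women.)
05    우리 담임선생님은 굉장히 엄한 분이세요. (Our homeroom teacher is a very strict person.)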


Weight-norm trick on baseline vs. baseline DCTTS - LJSpeech dataset

No. baseline weightnorm trick Script
01 The most trifling acts were magnified into offenses.
02 One smug alderman, a member of the House of Commons, sneered at the ultra philanthropy of the champions of prison improvement.
03 It was understood also that he had served in the army as a private, and had, moreover, undergone a sentence of transportation.


Weight-norm trick on baseline vs. baseline DCTTS - KSS dataset

No. baseline weightnorm trick Script
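01 어제 주유소를 들러서, 기름을 가득 채웠습니다. (I stopped by the gas station yesterday and filled up the tank.)
02 눈에 뭔가가 들어갔어요. (Something got in my eye.)
03 서점에서 아이디어는 좀 얻었어? (Did you get any ideas at the bookstore?)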


Visualization of EMCD alignment
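
As a rough illustration of what an elastic alignment between a reference and a synthesized feature sequence looks like, the sketch below computes a standard dynamic time warping (DTW) path. It is an assumption-laden stand-in: it uses plain Euclidean frame distance and unweighted steps, whereas the frame distance and step weights that define EMCD are given in the paper and are not reproduced here. The names `frame_distance` and `dtw_align` are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature frames of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(ref, syn):
    """Return (total_cost, path) for a DTW alignment of two frame sequences.

    path is a list of (ref_index, syn_index) pairs from start to end.
    """
    n, m = len(ref), len(syn)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], syn[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal
            (cost[i - 1][j], i - 1, j),          # advance reference only
            (cost[i][j - 1], i, j - 1),          # advance synthesized only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# The synthesized sequence repeats its first frame, so the path stretches there.
reference = [[0.0], [1.0], [2.0]]
synthesized = [[0.0], [0.0], [1.0], [2.0]]
total, path = dtw_align(reference, synthesized)
```

Plotting the pairs in `path` over the frame-by-frame distance matrix gives the kind of picture an alignment visualization typically shows: a diagonal line that bends wherever one sequence is locally slower than the other.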