Optimization of Lithuanian Diphone Databases
DOI:
https://doi.org/10.5755/j01.sal.0.19.947
Keywords:
text-to-speech synthesis, diphone, diphone database optimization, phone similarity, diphone usage frequencies
Abstract
This paper studies the creation and optimization of a Lithuanian diphone inventory for concatenative text-to-speech synthesis. Creating a diphone database starts with compiling a list of diphones. This is not a trivial problem, because some diphones are not valid. If valid diphones are deduced from the phonotactic rules of the language, some diphones that are needed to synthesize foreign words are omitted, while many practically unused diphones are added to the list. A statistical analysis of diphone usage was performed in this work. Its results imply that statistically motivated pruning of the diphone inventory yields a much smaller inventory while keeping very high text coverage.
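The idea of frequency-based pruning can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: the function names `diphones` and `prune_inventory`, the representation of a corpus as lists of phone symbols, and the greedy "keep the most frequent diphones until a target coverage is reached" rule are all assumptions made for illustration.

```python
from collections import Counter


def diphones(phones):
    """Return the adjacent phone pairs (diphones) of one phone sequence."""
    return list(zip(phones, phones[1:]))


def prune_inventory(corpus, target_coverage):
    """Keep the most frequent diphones until the target coverage is reached.

    corpus          -- iterable of phone sequences (hypothetical input format)
    target_coverage -- fraction of diphone occurrences to cover, e.g. 0.99
    Returns (kept_diphones, achieved_coverage).
    """
    counts = Counter(d for utterance in corpus for d in diphones(utterance))
    total = sum(counts.values())
    kept, covered = [], 0
    # most_common() yields diphones from most to least frequent.
    for diphone, n in counts.most_common():
        if total == 0 or covered / total >= target_coverage:
            break
        kept.append(diphone)
        covered += n
    coverage = covered / total if total else 0.0
    return kept, coverage


if __name__ == "__main__":
    # Toy corpus with placeholder phone symbols, not real Lithuanian data.
    toy_corpus = [["a", "b", "a", "b", "a"], ["a", "b", "c"]]
    inventory, coverage = prune_inventory(toy_corpus, target_coverage=0.8)
    print(len(inventory), round(coverage, 3))
```

On a real corpus, the gap between the number of kept diphones and the number of distinct diphones observed would indicate how much the inventory could shrink at a given coverage level.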
Diphone inventory pruning using phone similarity (one diphone is replaced with another that sounds similar) and phone stretching (a missing diphone is synthesized by stretching the phones of adjacent diphones) is also described. Listening experiments were carried out with diphones that contain a vowel or diphthong followed by a stop consonant (or a fricative, in the case of phone stretching). Groups of diphones were identified for which the quality of synthesized speech is not degraded, or is degraded only marginally, when the two described methods are used. In addition, the potential reduction of the diphone inventory was estimated.
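Substitution by phone similarity can be pictured as a lookup with a fallback. The sketch below is only an illustration under stated assumptions: the function `resolve`, the `similar` mapping, and the placeholder symbols `V`, `C1`, `C2`, `C3` are hypothetical, and which phones actually count as similar is a result of the listening experiments, not something this code encodes.

```python
def resolve(diphone, inventory, similar):
    """Return a diphone from the inventory that can stand in for `diphone`.

    diphone   -- (first_phone, second_phone) pair
    inventory -- set of diphones that were actually recorded
    similar   -- hypothetical map: phone -> list of similar-sounding phones
    Returns the diphone itself if present, a substitute whose second phone
    is listed as similar, or None if no substitute exists.
    """
    if diphone in inventory:
        return diphone
    first, second = diphone
    for alternative in similar.get(second, []):
        candidate = (first, alternative)
        if candidate in inventory:
            return candidate
    return None


if __name__ == "__main__":
    # Placeholder symbols: V stands for a vowel, C1..C3 for consonants.
    inventory = {("V", "C1")}
    similar = {"C2": ["C1"]}
    print(resolve(("V", "C2"), inventory, similar))  # substituted diphone
    print(resolve(("V", "C3"), inventory, similar))  # no substitute known
```

Only the second phone is varied here because the experiments concern a vowel or diphthong followed by a consonant; a fuller treatment would also have to handle the phone-stretching case.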
License
The copyright for the articles in this journal is retained by the author(s) with the first publication right granted to the journal. The journal is licensed under the Creative Commons Attribution License 4.0 (CC BY 4.0).