Optimization of Lithuanian Diphone Databases
Keywords: text-to-speech synthesis, diphone, diphone database optimization, phone similarity, diphone usage frequencies
This paper studies the creation and optimization of the Lithuanian diphone inventory used for concatenative text-to-speech synthesis. Creating a diphone database starts with compiling a list of diphones. This is not a trivial problem, because some diphones are not valid. If valid diphones are deduced using the phonotactic rules of the language, some diphones that are needed to synthesize foreign words are omitted, while many diphones that are practically never used are added to the list. A statistical analysis of diphone usage was performed in this work. Its results imply that statistically motivated pruning of the diphone inventory yields a much smaller inventory while keeping very high text coverage. Two further pruning methods are also described: phone similarity (one diphone is substituted with another that sounds similar) and phone stretching (a missing diphone is synthesized by stretching the phones of adjacent diphones). Listening experiments were carried out with diphones that contain a vowel or diphthong followed by a stop consonant (or a fricative, for phone stretching). Groups of diphones were identified for which synthesized speech quality is not degraded, as well as groups for which it is degraded only marginally, when the two described methods are used. In addition, the potential reduction of the diphone inventory was estimated.
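The statistically motivated pruning mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the paper: it simply assumes that the corpus is already transcribed into phone sequences, counts adjacent phone pairs, and keeps the most frequent diphones until a chosen share of all diphone occurrences is covered. The function names `diphones` and `prune_inventory` and the `target_coverage` parameter are hypothetical.

```python
from collections import Counter


def diphones(phones):
    """Return the adjacent phone pairs of one phone sequence."""
    return list(zip(phones, phones[1:]))


def prune_inventory(corpus, target_coverage):
    """Keep the most frequent diphones until target_coverage is reached.

    corpus is an iterable of phone sequences (an assumed input format).
    Returns the kept diphones and the share of diphone occurrences
    in the corpus that they cover.
    """
    counts = Counter(d for phones in corpus for d in diphones(phones))
    total = sum(counts.values())
    if total == 0:
        return [], 0.0

    kept, covered = [], 0
    # most_common() yields diphones from the most to the least frequent.
    for diphone, n in counts.most_common():
        if covered / total >= target_coverage:
            break
        kept.append(diphone)
        covered += n
    return kept, covered / total
```

Called on a phonetically transcribed text corpus with, for example, `target_coverage=0.999`, such a function would return the reduced diphone list together with the coverage actually achieved. The threshold value here is only an example and is not taken from the paper.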
Copyright is defined in Articles 4-37 of the Law on Copyright and Related Rights of the Republic of Lithuania.