The Root Structure of Lithuanian Inflective Words

The aim of this research is to identify the structural patterns of root morphemes of Lithuanian inflective words and to establish their productivity and frequency. First, with reference to the earlier work conducted by Lithuanian linguists, we discuss the structural diversity of root morphemes and determine the productivity of structural patterns (the number of different roots of a specific pattern). Then we analyse data from real usage. For this stage, the database of the morphemics of the Lithuanian language (Lietuvių kalbos morfemikos duomenų bazė) was used. 265 thousand usage instances of inflective words constitute the research data.


Introduction
The root structure of verbs recorded in dictionaries has been analysed by Kruopienė (2000); Kaukienė (1994, 2002) has studied the morphonological structure of verb roots; Akelaitienė (1996, 2000) has been interested in vowel change for many years; and the structure of nominal words has been analysed by Karosienė (2004). These researchers analyse dictionary data rather than usage data. Based on such data, we can say which patterns of morphemic structure or sound change are possible in the Lithuanian language, and we can indicate the productivity of a particular pattern; however, we cannot say anything about its frequency. In this article, productivity is understood as the realization of a pattern (the number of different roots that follow it), which can be calculated from dictionaries. Frequency reflects usage instances in real connected speech.
The research and results are presented in the following stages. First, we survey the results of research conducted by Lithuanian linguists. Then we analyse and discuss data from real usage and identify the frequency of structural patterns. The frequency data for the study were collected from the database of the morphemics of the Lithuanian language (Lietuvių kalbos morfemikos duomenų bazė, hereafter DbML), which was created at Vytautas Magnus University and served as the basis for morphemics dictionaries (see Rimkutė, Kazlauskienė, Raškinis 2011). The database contains approximately 72 thousand words from different text styles and various topics. For the extraction of the empirical data, a special computer programme was created 1. The programme works in several stages:
1) a list (dictionary) of the different word forms used in the research material was generated and the usage number (frequency) of each word form was identified;
2) words were automatically stressed (in Lithuanian the stress is free) and transcribed with the tools available to us (for the latter, see Kazlauskienė, Raškinis, Vaičiūnas 2010), and mistakes were corrected manually;
3) morpheme boundaries in all words were marked manually by conventional symbols;
4) the sounds of the words were replaced by conventional symbols 2: C for consonants, V for vowels, W for the diphthongs ai, au, ei, ui [ɐɪ, ɐʊ, ɛɪ, ʊɪ];
5) the resulting word code (CVCW...) was segmented into morphemes according to the labels used in the words to mark morpheme boundaries;
6) morphemes matching the same code were grouped and their usage frequency in the research data was calculated.
The research data from DbML encompass 174,200 nominal words (103,681 nouns, 23,714 adjectives, 41,139 pronouns, and 5,666 numerals) and 90,800 verbs.
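Stages 4 to 6 can be illustrated with a minimal Python sketch. It is not the authors' programme. It assumes ordinary spelling instead of the phonetic transcription used for DbML, a hyphen as a hypothetical morpheme-boundary mark, and the simplification that ie and uo count as one vowel and ch, dz, dž as one consonant. The letter i used as a palatalisation sign (as in bėgsiu) is not handled, and the function names are illustrative only.

```python
from collections import Counter

# Assumed symbol inventory (simplified spelling, not phonetic transcription).
DIPHTHONGS = ("ai", "au", "ei", "ui")    # coded W, as in the article
VOWEL_DIGRAPHS = ("ie", "uo")            # assumption: coded as a single V
CONSONANT_DIGRAPHS = ("ch", "dz", "dž")  # assumption: coded as a single C
VOWELS = set("aąeęėiįyouųū")
BOUNDARY = "-"                           # hypothetical morpheme-boundary mark


def cv_code(word):
    """Stage 4: replace every sound of a boundary-marked word with C, V or W."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIPHTHONGS:
            out.append("W")
            i += 2
        elif pair in VOWEL_DIGRAPHS:
            out.append("V")
            i += 2
        elif pair in CONSONANT_DIGRAPHS:
            out.append("C")
            i += 2
        elif word[i] == BOUNDARY:
            out.append(BOUNDARY)
            i += 1
        elif word[i] in VOWELS:
            out.append("V")
            i += 1
        else:
            out.append("C")
            i += 1
    return "".join(out)


def morpheme_code_frequencies(word_freqs):
    """Stages 5-6: split each word code at the boundaries and count the
    morpheme codes, weighted by the usage frequency of the word form."""
    counts = Counter()
    for word, freq in word_freqs.items():
        for code in cv_code(word).split(BOUNDARY):
            counts[code] += freq
    return counts


print(cv_code("gy-v-as"))   # CV-C-VC
print(cv_code("laik-as"))   # CWC-VC
```

Because the boundary mark separates letters that belong to different morphemes, two vowels that meet at a boundary (as in pa-akys) are never merged into one symbol.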
Non-inflective words have been left outside the scope of the study because quite a large part thereof are primary (only a root morpheme, e.g., ir 'and', ar 'whether', o 'and/while', ne 'no'); moreover, some of them are grammatical forms of multiword expressions (e.g., iš tiesų 'truly', cf. N Pl Gen). Proper nouns were not analysed either, because their morphemic segmentation is complicated.
International words have not been analysed either, as they usually entered the language as morphemically unsegmented words showing the structural regularities of other languages rather than those of Lithuanian. However, old borrowings, e.g., stiklas 'glass', agurkas 'a cucumber', have been included in our research, as they have adapted to the system of the Lithuanian language and most Lithuanians do not identify them as borrowings.

The Structural Patterns of Roots
In inflectional languages like Lithuanian, the boundary between synchrony and diachrony is not always clear. Quite often fusion occurs, i.e., morphemes merge; for this reason, affixes are often considered part of a root. For example, from the modern language perspective, gyv- can be considered the root of the adjective gyvas 'alive', although historically this adjective should be linked to the verb gyti 'to recover'. In this case, the morphemic division is between the root gy-, the suffix -v- and the ending -as. Another example is the adjective kuklus 'modest', which is historically related to kukti 'to bow'; however, the adjective has drifted considerably from its underlying word in meaning. During the preparation of the data for the research, we tried to follow the synchronic principle. If the link of a word to a possible underlying word has faded, such a word was treated as an underlying word. For this reason, gyvas was divided into gy-v-as, while kuklus was divided into kukl-us.
Consonants tend to react actively to neighbouring sounds (assimilation, degemination, alternation, or elimination of a consonant may occur); as a result, it is not always simple to identify exact morpheme boundaries or to name the specific processes of consonant harmonisation (for more, see Kazlauskienė and Cvilikaitė 2015). The future tense suffix -s- can serve as an example: bėg-s 'he will run', bėg-s-iu 'I will run'. When a root ends in s, the morphemes overlap and the remaining single consonant belongs to both morphemes, e.g., ves+s=ves 'he will marry', vesiu 'I will marry'.
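The overlap of a root-final s with the future suffix -s- can be sketched in Python as follows. This is only an illustration under a strong simplification: it merges two identical consonants that meet at a morpheme boundary and models none of the other processes listed above (assimilation, alternation, elimination). The function name join_morphemes is hypothetical.

```python
VOWELS = set("aąeęėiįyouųū")


def join_morphemes(*morphemes):
    """Concatenate morphemes, writing identical consonants that meet
    at a morpheme boundary only once (degemination)."""
    word = ""
    for morpheme in morphemes:
        if (word and morpheme
                and word[-1] == morpheme[0]
                and morpheme[0] not in VOWELS):
            morpheme = morpheme[1:]  # the shared consonant belongs to both
        word += morpheme
    return word


print(join_morphemes("bėg", "s", "iu"))  # bėgsiu
print(join_morphemes("ves", "s", "iu"))  # vesiu
```

In the second call the surface form no longer shows where the root ends and the suffix begins, which is exactly why such boundaries had to be marked manually.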
This article employs the following terms: an onset is the initial consonant group of a morpheme, a coda is the final consonant group, and a medial cluster is a consonant group located between vowels in the middle of a morpheme. A pattern is considered to be a CV structure unit (with indicated quantity of consonants and a vowel or a diphthong) and a type is RTV (with an indicated consonant group according to the articulation thereof).

The Pattern Structure of Nominal Word Roots
Lithuanian words and syllables can start and end in a vowel or a consonant (up to three consonants) (Kazlauskienė & Raškinis, 2008a). A hiatus (a juncture of two vowels) in words of Lithuanian origin and old borrowings is possible only between morphemes, and not between all morphemes: it occurs only between a prefix and a root (e.g., pa-akys 'under eye') and in compounds, when a connecting vowel is added, e.g., ilga-uodegė 'long-tailed'. Based on structural regularities of other language units, the theoretic structure of a monosyllabic root could be as follows 3: C 0-
The root structure of Lithuanian primary nominal words was extensively described by Karosienė (2004) 4. The results of her research served as the basis for establishing the inventory of structural patterns of nominal words. According to Karosienė, the roots of primary Lithuanian nominal words can be non-syllabic (there are only 10 such roots), monosyllabic (78%), disyllabic (21%) and trisyllabic (15 roots). There may be no initial consonant group, while the final group, according to Karosienė, is obligatory, and the total number of consonants in a root does not exceed 6. Thus the consonant number provided in the formulas above is the maximum; however, it correlates with the other consonant clusters of the root: if the onset has 3 consonants, the maximum number of consonants in the coda of a monosyllabic root is 3, while in a disyllabic root the sum of the consonants of the medial cluster and the coda does not exceed 3. On the grounds of Karosienė's results, the theoretic formulas for root structure have to be revised considerably: $C_{0-3}V(W)C_{1-4}$ for monosyllabic roots (Karosienė, 2004, p. 22), $C_{0-3}V(W)C_{1-4}V(W)C_{1-4}$ for disyllabic roots, and $C_{0-1}VC_{1-2}VC_{1-2}VC_{1-2}$ for trisyllabic roots (Karosienė, 2004, p. 73).
Based on the possible number of consonants, there can be 15 theoretic patterns of monosyllabic roots and 30 patterns of disyllabic roots. The research conducted by Karosienė revealed 46 realized patterns. All monosyllabic patterns are realized, while 8 of the disyllabic patterns are not (they are patterns with 5-6 consonants: $VC_1VC_4$, $VC_2VC_{3-4}$, $VC_3VC_{2-3}$, $VC_4VC_2$, $C_3VC_1VC_2$, $C_1VC_4VC_1$); only 5 patterns of trisyllabic roots are realized.
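The figure of 15 theoretic monosyllabic patterns can be checked with a short Python sketch. It assumes only what the text states for monosyllabic roots: an onset of 0 to 3 consonants, a coda of 1 to 4 consonants, and at most 6 consonants per root. The function names are illustrative, and the sketch covers monosyllabic roots only.

```python
from itertools import product


def monosyllabic_patterns(onset=(0, 3), coda=(1, 4), max_total=6):
    """Return the (onset size, coda size) pairs allowed by the formula
    C0-3 V(W) C1-4 under the cap on the total number of consonants."""
    onset_sizes = range(onset[0], onset[1] + 1)
    coda_sizes = range(coda[0], coda[1] + 1)
    return [(o, c) for o, c in product(onset_sizes, coda_sizes)
            if o + c <= max_total]


def label(onset_size, coda_size):
    """Format a pair in the article's notation, e.g. C1V(W)C2."""
    onset_part = f"C{onset_size}" if onset_size else ""
    return f"{onset_part}V(W)C{coda_size}"


patterns = monosyllabic_patterns()
print(len(patterns))                      # 15
print([label(o, c) for o, c in patterns])
```

Of the 16 combinations of onset and coda size, only the pair with 3 onset and 4 coda consonants exceeds the cap of 6, which leaves 15 patterns.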
Patterns $C_1V(W)C_1$, $C_1V(W)C_2$ and $C_2V(W)C_1$ are the most productive: together they account for more than half (55%) of all roots of primary nominal words. Thus, we can expect these particular patterns to dominate in real connected speech.
What determines the productivity of a pattern? First, attention should be paid to the number of consonants in a pattern: 3% of primary nominal words have 1 consonant, 36% have 2 consonants, 37% have 3 consonants, 18% have 4 consonants, 5% have 5 consonants, and 1% have 6 consonants. The prevailing patterns are clearly those with a total of 2 or 3 consonants; the three most productive patterns mentioned above are of this kind. However, the total number of consonants is not the only factor influencing the productivity of a pattern. For example, patterns $C_1V(W)C_2$ and $V(W)C_3$, both of which have three consonants, differ in productivity: 14% and 1% of primary nominal words respectively; the productivity of patterns $C_2V(W)C_2$ and $C_1V(W)C_3$, both with four consonants, is 7% and 4%. A more thorough analysis suggests that the productivity of a pattern largely depends on the number of consonants in the coda: the larger the number, the rarer the pattern. Patterns with 3 or 4 consonants in the coda do not tend to have high productivity; they represent 8% and 1% of all primary nominal words respectively. The dominating patterns are those with 1 consonant (64%) and 2 consonants (26%) in the coda. All these factors suggest that a complex coda is not usual in the Lithuanian language and that, where it occurs, it would be the result of a juncture of a historical root and a consonantal suffix.
The analysis of primary roots is a good starting point to identify structural possibilities of morphemes. However, in real usage not only are particular primary words selected from the available inventory, but, also, derivatives are produced from primary words. Usage frequency of the latter may have influence on the frequency of some root patterns.
In DbML, we found 43 structural root patterns of nominal words. Most of these patterns are peripheral in usage; there are 3 main patterns: $C_1V(W)C_1$, $C_1V(W)C_2$, $C_2V(W)C_1$. Not only are they the most productive (the highest number of different roots in DML), but their examples are also the most frequently used in real language (see Table 1). The numeral and pronoun group is dominated by non-syllabic $C_1$ roots (they are mainly pronouns, because numerals with the $C_1$ structure amount to only 0.2%) and by $C_1V(W)C_1$. These two patterns account for 85% of all examples with numerals and pronouns in DbML.

Table 1 (columns: Structural patterns; Usage instances)
Symbol * is used to mark the patterns which were not mentioned by Karosienė; they can be identified only during the analysis of connected speech.
The beginning of a root may contain from 1 to 3 consonants. There are three types of binary clusters: ST (e.g., storas 'fat'), SR (e.g., slenkstis 'threshold'), TR (e.g., knyga 'a book'), with the latter type being the most common: half of the consonant clusters of nominal word roots represent this type. A trinomial consonant cluster at the beginning of a morpheme is rare (only 4% of words in DbML), and it is always of the STR type (e.g., skruzdė 'an ant').
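The cluster types can be derived mechanically once each consonant is assigned to a class. The Python sketch below is an illustration only. The class membership (S for s, š, z, ž; T for p, b, t, d, k, g; R for l, m, n, r, v, j) is an assumption inferred from the examples in this article and is not stated in the text. Consonants outside these sets and the digraphs ch, dz, dž are not handled and are shown as "?".

```python
VOWELS = set("aąeęėiįyouųū")

# Assumed consonant classes, inferred from the article's examples.
CLASS_OF = {}
for letters, class_label in (("sšzž", "S"), ("pbtdkg", "T"), ("lmnrvj", "R")):
    for letter in letters:
        CLASS_OF[letter] = class_label


def onset(root):
    """Return the consonants that precede the first vowel of the root."""
    i = 0
    while i < len(root) and root[i] not in VOWELS:
        i += 1
    return root[:i]


def coda(root):
    """Return the consonants that follow the last vowel of the root."""
    i = len(root)
    while i > 0 and root[i - 1] not in VOWELS:
        i -= 1
    return root[i:]


def cluster_type(cluster):
    """Map a consonant cluster to its class string, e.g. 'skr' to 'STR'."""
    return "".join(CLASS_OF.get(consonant, "?") for consonant in cluster)


for root in ("stor", "slenkst", "knyg", "skruzd"):
    print(root, cluster_type(onset(root)))
```

The same helper applies to root-final clusters: cluster_type(coda("skruzd")) gives "ST" under the assumed classes.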

The Structure of Verb Roots
The structure of the roots of Lithuanian verbs recorded in DML was described by Kruopienė (2000) 7. The results of this analysis are used for the description of the inventory of structural patterns of primary verb roots.
According to the results of Kruopienė's research, the roots of Lithuanian primary verbs are only monosyllabic. This is their major difference from nominal words, whose roots can be both non-syllabic and polysyllabic. However, as in the roots of nominal words, there may be no onset in verb roots, while the coda, based on Kruopienė's analysis, is compulsory 8, and the number of consonants in a root does not exceed 6 either.
In view of the research results by Kruopienė, the structural formula for roots of primary verbs is as follows: $C_{0-3}V(W)C_{1-4}$ (Kruopienė, 2000, pp. 60-61, 100). There may be 15 theoretic patterns; however, only the pattern $C_3V(W)C_3$ is not realised. The following patterns are the most productive: $C_1V(W)C_1$, $C_2V(W)C_1$, $C_1V(W)C_2$; together they make up 77% of all roots of primary verbs. For this reason, we can infer that roots of such structure will dominate in real usage.
The structural patterns of primary verb roots:
The connection between pattern productivity and consonant number is similar to that of primary nominal words: 2% of primary verbs have 1 consonant, 37% have 2 consonants, 41% have 3 consonants, 14% have 4 consonants, 3% have 5 consonants, and only 2% contain 6 consonants. Thus, patterns with 2 to 3 consonants dominate. As in the group of nominal words, the patterns whose codas contain 1 consonant (64%) or 2 consonants (30%) are the most productive. Trinary codas account for 4% and quaternary codas for only 2%. Thus, a complex coda is certainly not a typical feature of the Lithuanian language.
7 The main data source of Kruopienė is the third edition of DML (Kruopienė, 2000, p. 5).
8 This was influenced by Kruopienė's choice not to analyse the roots of three lemmas of verbs (infinitive, present and past simple tenses) from the perspective of the modern language, but, rather, only historically non-derivative forms of present and past simple tenses, e.g., the root allomorphs of eiti, eina, ėjo 'to go, he goes, he went' are {ei-}, {ein-}, {ėj-}; the latter two have codas, while the infinitive form does not.
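The dictionary-based shares quoted in this article for primary nominal words and primary verbs can be set side by side in a few lines of Python. The percentages are copied from the text; the dictionary layout and the helper name share are illustrative choices. Because the published figures are rounded, some distributions sum to 99 rather than 100.

```python
# Share (%) of primary roots by total number of consonants, as quoted.
BY_TOTAL = {
    "nominal": {1: 3, 2: 36, 3: 37, 4: 18, 5: 5, 6: 1},
    "verb": {1: 2, 2: 37, 3: 41, 4: 14, 5: 3, 6: 2},
}

# Share (%) of primary roots by number of consonants in the coda, as quoted.
BY_CODA = {
    "nominal": {1: 64, 2: 26, 3: 8, 4: 1},
    "verb": {1: 64, 2: 30, 3: 4, 4: 2},
}


def share(distribution, sizes):
    """Sum the percentages of the given consonant counts."""
    return sum(distribution[size] for size in sizes)


for word_class in ("nominal", "verb"):
    two_or_three = share(BY_TOTAL[word_class], (2, 3))
    simple_coda = share(BY_CODA[word_class], (1, 2))
    print(word_class, two_or_three, simple_coda)
# nominal 73 90
# verb 78 94
```

In both word classes roughly three quarters of the primary roots have 2 or 3 consonants in total, and at least 90% have a coda of 1 or 2 consonants.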
All verb roots used in DbML were classified into 23 structural patterns. Based on the data from DbML, in coherent Lithuanian texts monosyllabic verb roots make up 99% and have the structure $C_{0-3}V(W)C_{0-4}$. The most frequent patterns, $C_1V(W)C_1$, $C_2V(W)C_1$ and $C_1V(W)C_2$, together amount to 78% of all verb roots in DbML (see Table 2).
General regularities for dictionary and usage data are partially similar. The dominating pattern is $C_1V(W)C_1$. DML has 36% of such roots, while in the usage data their share is nearly one and a half times larger. Pattern $C_2V(W)C_1$ is half as frequent in usage (11%) as in DML (23%). There are slightly fewer instances of pattern $C_1V(W)C_2$ in usage than in DML.
Table 2. The distribution of structural patterns of verb roots in DbML (%) 9
Consonant clusters in the middle of verb roots are rare (primary verbs, as it has already been mentioned, are monosyllabic and do not have medial clusters). Medial clusters in DbML occur only in a few derivatives: pasninkauti 'to fast', prielgetauti 'to beg', apsiskarmalavę 'ragged', apvalinamas 'being rounded'.
9 Symbol * is used to mark the patterns which were not mentioned by Kruopienė; they can be identified only during the analysis of connected speech.
As in nominal words, the beginnings of verb roots match the structural patterns of a syllable: trinomial STR and binomial ST, SR, TR (e.g., springo 'he choked', skalbė 'he washed', šniokštė 'he roared', krenkštė 'he croaked'), with the latter type dominating (47% in DbML). The root endings of primary verbs are not as varied as those of nominal words. Verbs tend to have only four binomial types of endings, ST, RS, RT, RR, the trinomial types TST, RST, and the quaternary type RTST (e.g., respectively vystė 'he developed', delsė 'he procrastinated', valgė 'he ate', varva 'it drops', šniokštė 'he roared', urzgė 'he growled', gergždė 'he wheezed'). In real usage endings of verb