Syntactically Coded Corpus of Spoken Lithuanian: Developmental Issues and Pilot Studies


  • Laura Kamandulytė Merfeldienė
  • Ingrida Balčiūnienė



The paper addresses the main methodological issues in developing the Corpus of Spoken Lithuanian, with particular attention to its syntactic coding and its applications for automated language analysis. First, we consider the methodology of developing the Corpus as well as the principles of transcribing and coding Lithuanian speech data. The main concepts, such as “utterance”, “sentence”, etc., are discussed. Second, we present the results of a pilot study of interrogatives that are typical of natural spontaneous spoken Lithuanian. The automated analysis of interrogatives revealed that the frequency and distribution of Wh- and yes/no questions are rather similar. Among the Wh- questions, those not containing an interrogative particle seem to be dominant, while questions containing the interrogative particle at the beginning or at the end were much rarer. Among the different functional subtypes of Wh- questions, adverbial ones seem to be the most frequent; among the adverbial Wh- questions, the spatial ones were the most frequent. The present study is only a pilot, given the novelty of an automated syntactic approach to spoken Lithuanian data, and more complex studies remain for future investigation. The use of interrogative sentences will be studied from the perspective of different genres (e.g., monologue vs. dialogue), social characteristics of the speakers, and the conversational situation (e.g., public vs. private speech). Generally, we believe that future systematic corpus-based research on spontaneous spoken language will offer more possibilities to identify, evaluate, and elaborate on the development of the Lithuanian language.

Keywords: corpus linguistics, syntax, syntactic coding, interrogatives, Lithuanian.