Towards a Refined Inventory of Lexical Bundles: an Experiment in the Formulex Method
Keywords: lexical bundles, learner language, n-grams
A number of recent corpus studies have described the use and functions of lexical bundles in order to explore the phraseology of learner language. As in any study of lexical bundles, overlapping or structurally incomplete items pose a particular challenge. In practice, it is often difficult to align such units with specific discourse functions. The fact that lexical bundles do not constitute neat form-and-meaning mappings results from, among other reasons, their being grounded in language use rather than in the language system. In this pilot study we test a new method called Formulex (Forsyth, 2015a; 2015b) to verify whether applying the criterion of coverage, alongside the conventional criteria of orthographic length, minimum frequency and distribution range (Biber et al., 1999), may help obtain a more refined inventory of lexical bundles and hence facilitate further qualitative analyses. To that end, we use the Polish and Lithuanian components of the International Corpus of Learner English (ICLE, Granger et al., 2009), as well as the LOCNESS corpus (CECL), which represents academic essays written by British and American students. The results revealed that many lexical bundles of fixed length identified in the conventional way are fragments of longer chunks of text and hence should not be treated as complete or standalone 4-word lexical items. They also revealed that the Formulex method, in which the word sequences are mutually exclusive, helps a researcher filter out overlapping or perceptually non-salient lexical bundles and, ultimately, specify more precise boundaries of lexical bundles of fixed length.
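The overlap problem described above can be illustrated with a minimal sketch of the conventional approach only: fixed-length n-grams filtered by minimum frequency and distribution range. This is not the Formulex method. The function name `extract_bundles`, the thresholds, and the toy essays are all assumptions made for illustration; real studies typically use frequencies normalized per million words and much larger corpora.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return {ngram: (frequency, range)} for n-grams passing both thresholds.

    frequency = total occurrences across all texts
    range     = number of distinct texts containing the n-gram
    Thresholds here are hypothetical raw counts, not normalized rates.
    """
    freq = Counter()
    docs = defaultdict(set)
    for doc_id, text in enumerate(texts):
        tokens = text.lower().split()  # naive whitespace tokenization
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            docs[gram].add(doc_id)
    return {
        gram: (count, len(docs[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(docs[gram]) >= min_range
    }


# Toy essays, invented for illustration only.
essays = [
    "on the other hand it is clear",
    "on the other hand it is true",
    "but on the other hand we see",
]

bundles = extract_bundles(essays)
for gram, (count, text_range) in sorted(bundles.items()):
    print(" ".join(gram), count, text_range)
```

On these toy essays three 4-grams pass the thresholds: "on the other hand", "the other hand it" and "other hand it is". The last two are overlapping fragments of the longer sequence "on the other hand it is", which is the kind of item a coverage criterion with mutually exclusive sequences is meant to filter out.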
The copyright for the articles in this journal is retained by the author(s) with the first publication right granted to the journal. The journal is licensed under the Creative Commons Attribution License 4.0 (CC BY 4.0).