Towards a Refined Inventory of Lexical Bundles: an Experiment in the Formulex Method

Łukasz Grabowski, Rita Juknevičienė


A number of corpus studies focusing on the description of the use and functions of lexical bundles have been conducted recently in order to explore the phraseology of learner language. As with any study of lexical bundles, the problem of overlapping or structurally incomplete items poses a particular challenge. In practice, it is often difficult to align such units with specific discourse functions. The fact that lexical bundles do not constitute neat form-and-meaning mappings results from, among other reasons, their being grounded in language use rather than the language system. In this pilot study we attempt to test a new method called Formulex (Forsyth, 2015a; 2015b) to verify whether an application of the criterion of coverage – in addition to the conventional criteria of orthographic length, minimum frequency and distribution range (Biber et al., 1999) – may help obtain a more refined inventory of lexical bundles and hence facilitate further qualitative analyses. To that end, we use the Polish and Lithuanian components of the International Corpus of Learner English (ICLE, Granger et al., 2009), as well as the LOCNESS corpus (CECL), representing academic essays written by British and American students. The results revealed that many lexical bundles of fixed length identified in a conventional way are fragments of longer chunks of text and hence should not be treated as complete or standalone 4-word lexical items. It was also revealed that the application of the Formulex method, where the word sequences are mutually exclusive, helps a researcher filter out overlapping or perceptually non-salient lexical bundles and, ultimately, specify more precise boundaries of lexical bundles of fixed length.



lexical bundles; learner language; n-grams
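
The overlap problem described in the abstract can be illustrated with a minimal sketch of conventional bundle extraction, using only the three criteria named there: fixed length, minimum frequency and distribution range. This is not the Formulex method, whose procedure is not specified in this section. The function names, the toy texts and the threshold values are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def extract_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return {ngram: (frequency, range)} for n-grams that meet both thresholds.

    'range' is the number of distinct texts in which the n-gram occurs.
    Tokenisation is a naive lowercase whitespace split (an assumption).
    """
    freq = Counter()
    docs = defaultdict(set)
    for doc_id, text in enumerate(texts):
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            docs[gram].add(doc_id)
    return {
        gram: (count, len(docs[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(docs[gram]) >= min_range
    }


def overlapping_pairs(bundles):
    """Find pairs (a, b) where the last n-1 words of a are the first n-1 words of b."""
    by_prefix = defaultdict(list)
    for gram in bundles:
        by_prefix[gram[:-1]].append(gram)
    return [(a, b) for a in bundles for b in by_prefix.get(a[1:], [])]


# Toy texts invented for this sketch; they are not drawn from ICLE or LOCNESS.
texts = [
    "on the other hand it is clear",
    "on the other hand it is true",
    "but on the other hand it is hard",
]

bundles = extract_bundles(texts, n=4, min_freq=3, min_range=3)
pairs = overlapping_pairs(bundles)

for a, b in pairs:
    print(" ".join(a), "<->", " ".join(b))
```

On these toy texts, three 4-word sequences pass the thresholds, and all of them are fragments of the single longer chunk "on the other hand it is". A conventional extraction of this kind therefore reports the same stretch of text several times, which is the redundancy that a coverage criterion with mutually exclusive sequences is meant to reduce.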

