Towards The Criteria For Identification Of Idioms And Collocations


  • Erika Rimkutė
  • Agnė Bielinskienė
  • Jolanta Kovalevskaitė
  • Laura Vilkaitė



multi-word expressions, collocations, idioms, phraseology, corpus, annotation, Lithuanian language


The paper analyses two types of multi-word expressions: collocations and idioms. It provides definitions of these two types of lexical items and reviews how they are understood in the works of Lithuanian and foreign scholars. The problem of identifying multi-word expressions arose during the research project Automatic Identification of Lithuanian Multi-word Expressions (PASTOVU). When analysis of the corpus compiled from news portal texts began, it became clear that more precise criteria were needed to identify collocations and idioms in Lithuanian data. Therefore, this paper aims to describe clear criteria that would allow linguists to identify these particular multi-word expressions in texts as objectively as possible. Collocations and idioms appear to share three main features: conventionality, semantic non-transparency, and fixed form. The presence and strength of each feature can be evaluated by applying various tests derived from the identification criteria. The paper explains the proposed tests and illustrates each one with examples in order to show in practice the similarities and differences between collocations and idioms. The data described make it clear that neither class is homogeneous: both collocations and idioms include typical and less typical cases. We propose conventionality and semantic non-transparency as the two main features that, when assessed with particular criteria and tests, could help to separate collocations from idioms.