Methodological Framework for the Development of an English-Lithuanian Cybersecurity Termbase

Sigita Rackevičienė; Liudmila Mockienė; Andrius Utka; Aivaras Rokas

doi:10.5755/j01.sal.1.39.29156

Authors

Sigita Rackevičienė, Mykolas Romeris University, Lithuania, https://orcid.org/0000-0001-5794-0296
Liudmila Mockienė, Mykolas Romeris University, Lithuania, https://orcid.org/0000-0001-7153-7276
Andrius Utka, Vytautas Magnus University, Lithuania, https://orcid.org/0000-0001-5212-4310
Aivaras Rokas, Vytautas Magnus University, Lithuania

DOI:

https://doi.org/10.5755/j01.sal.1.39.29156

Keywords:

termbase compilation, parallel and comparable corpora, terminology annotation, terminology extraction, knowledge-rich context extraction, deep-learning systems, LLOD technologies

Abstract

The aim of the paper is to present a methodological framework for the development of an English-Lithuanian bilingual termbase in the cybersecurity domain, which can be applied as a model for other language pairs and other specialised domains. It is argued that the presented methodological approach can ensure the creation of high-quality bilingual termbases even with limited available resources. The paper addresses the methods and problems of dataset (corpora) compilation, terminology annotation, automatic bilingual term extraction (BiTE) and alignment, knowledge-rich context extraction, and linguistic linked open data (LLOD) technologies. It presents theoretical considerations as well as arguments for the effectiveness of the described methods. The theoretical analysis and a pilot study support the following arguments: 1) a combination of parallel and comparable corpora makes it possible to considerably expand the amount and variety of data sources that can be used for terminology extraction; this methodology is especially important for less-resourced languages, which often lack parallel data; 2) deep learning systems trained on manually annotated data (gold standard corpora) allow effective automation of the extraction of terminological data and metadata, which makes it possible to update termbases regularly with minimal manual input; 3) LLOD technologies make it possible to integrate the terminological data into the global linguistic data ecosystem and make it reusable, searchable and discoverable across the Web.

Author Biographies

Sigita Rackevičienė, Mykolas Romeris University, Lithuania

Prof. dr., Institute of Humanities, Faculty of Human and Social Studies, Mykolas Romeris University, Lithuania

Liudmila Mockienė, Mykolas Romeris University, Lithuania

Prof. dr., Institute of Humanities, Faculty of Human and Social Studies, Mykolas Romeris University, Lithuania

Andrius Utka, Vytautas Magnus University, Lithuania

Assoc. Prof. dr., Centre of Computational Linguistics, Vytautas Magnus University, Lithuania

Aivaras Rokas, Vytautas Magnus University, Lithuania

Programmer, Centre of Computational Linguistics, Vytautas Magnus University, Lithuania
