Wyszukiwanie ekwiwalentów w Polsko-Słowackim Korpusie Równoległym [Searching for Equivalents in the Polish-Slovak Parallel Corpus]

Petrincová, Marianna

Abstract

This paper compares two closely related languages, Polish and Slovak, focusing on prefixed verbs, where this proximity is especially visible and which may cause problems for translators and lexicographers. Prefixed verbs are treated as a lexicographic problem, and a possible solution is presented that involves searching for equivalents in real translations. A small parallel Polish-Slovak corpus was created for the purpose of this research. The paper describes the process of compiling the corpus, from the acquisition of parallel texts, through text processing and the choice of corpus tools, to the annotation and lemmatization of the texts. Next, the equivalents of prefixed verbs found in the corpus are analyzed to measure their lexicographic potential, i.e. their suitability for inclusion in a dictionary, based on their accuracy and frequency in different contexts and with different arguments. Four levels of lexicographic potential are distinguished: high, average, low and zero. The paper presents preliminary results of a corpus analysis of the lexicographic potential of the Slovak equivalents of Polish prefixed verbs, with a focus on suitable lexicographic material. Since the paper reports on a small part of more extensive research on verbs with different prefixes, only the analysis of equivalents of verbs with the prefixes u- and roz- (ubierać, ubrać, ukrywać, ubolewać, rozciągać się, rozlec się) is presented. The last part of the paper discusses problems encountered along the way and considers adjustments to the evaluation process.
The problems include, among others, the size of the corpus, the number of occurrences, the evaluation method itself (where the goal is to make it as objective as possible without compromising the intuitions of native or proficient speakers of the two languages), and, finally, the inconsistency of the information provided by the Word Sketch tool in Sketch Engine, which produces slightly different results for the Polish and Slovak reference corpora. Overall, the analysis demonstrates how parallel corpora can be used to improve the quality of bilingual dictionaries by providing translation equivalents drawn from real translations.

Description

Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka, eds. (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warszawa: Instytut Lingwistyki Stosowanej, pp. 144-155.