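A multilingual (Polish-German) MCCA corpus of spoken language for the purposes of culturological and suprasegmental analysis of linguistic (im)politeness
(Polish title: Multilingwalny (polsko-niemiecki) korpus języka mówionego MCCA dla celów analizy kulturologicznej i suprasegmentalnej (nie)grzeczności językowej)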

Bonacchi, Silvia; Mela, Mariusz

Abstract

In this article, we present our experience of working on a multilingual corpus of speech data (Polish and German), the problems we encountered while building it, and the pragmalinguistic and suprasegmental analysis we conducted on it. We also reflect on the notions of parallelism and comparability in this context. Creating corpora of spoken language is a great challenge for the researcher because of the elusive nature of speech. Spoken data can be accessed either as transcripts of audio or video recordings (following the methods of multimodal analysis) or as notes taken during speech interactions (following the ethnographic method). Researchers who want to collect data for a specific purpose, for example to investigate (im)politeness, have to create a setting, an interactional context and a situation in which the phenomenon in question can be elicited. Because the data must also be analysed phonetically, audio or video recordings are necessary, and these have to be made in a recording studio to ensure a quality suitable for such an analysis (e.g. one channel per speaker, no background noise). However, participants in recording sessions do not behave as naturally as they would in a natural setting, i.e. without microphones or cameras. Moreover, spoken language is characterised by phenomena that are specific to it and do not occur in written language, including anacolutha, corrections, repairs, hearer signals, speaker signals, particles and discourse markers. In written language such phenomena are treated as communicative ‘disturbances’, whereas in face-to-face interaction they are fundamental. Given these requirements, creating corpora of spoken language calls for a completely different approach from the one used for corpora of written language. This article presents a bilingual (Polish and German) corpus of spoken language.
The corpus was created as part of the project MCCA: Multimodal Communication: Culturological Analysis for the purposes of culturological and suprasegmental analysis, and it consists of three types of recordings: dyadic conversations, scripted monologues (in which the participants were asked to intonate sentences in order to achieve a given result), and extracts from TV talk shows. The recordings were then transcribed with Folker following the GAT2 (GesprächsAnalytisches Transkriptionssystem) conventions, annotated in ELAN and analysed phonetically in Praat.
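To illustrate what the ELAN annotation step produces, the following is a minimal sketch, not the MCCA project's actual workflow. It assumes the annotations are stored in ELAN's EAF XML format, with one time-aligned tier per speaker (matching the one-channel-per-speaker recording setup). It uses only the Python standard library, reads `TIME_SLOT`, `TIER`, `ALIGNABLE_ANNOTATION` and `ANNOTATION_VALUE` elements, and ignores symbolic (`REF_ANNOTATION`) tiers. The function names, the tier IDs `speaker_A`/`speaker_B` and the utterances in the sample document are hypothetical, invented for this example.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import defaultdict


def read_eaf_tiers(eaf_xml: str) -> dict[str, list[tuple[int, int, str]]]:
    """Return {tier_id: [(start_ms, end_ms, value), ...]} for time-aligned annotations."""
    root = ET.fromstring(eaf_xml)
    # Map each TIME_SLOT_ID to its time in milliseconds; unaligned slots are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers: dict[str, list[tuple[int, int, str]]] = defaultdict(list)
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations whose boundaries are not time-aligned
            value = ann.findtext("ANNOTATION_VALUE", default="")
            tiers[tier_id].append((start, end, value))
    return dict(tiers)


def speaking_time_ms(tiers: dict[str, list[tuple[int, int, str]]]) -> dict[str, int]:
    """Sum the annotated duration per tier (here: per hypothetical speaker)."""
    return {tier: sum(end - start for start, end, _ in anns) for tier, anns in tiers.items()}


# Hypothetical two-speaker EAF fragment (invented data, reduced to the elements read above).
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="2400"/>
  </TIME_ORDER>
  <TIER TIER_ID="speaker_A">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>mhm</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="speaker_B">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>ja genau</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    tiers = read_eaf_tiers(SAMPLE_EAF)
    print(tiers)                    # {'speaker_A': [(0, 1200, 'mhm')], 'speaker_B': [(1500, 2400, 'ja genau')]}
    print(speaking_time_ms(tiers))  # {'speaker_A': 1200, 'speaker_B': 900}
```

Keeping one tier per speaker makes per-speaker measures (speaking time, pauses between turns, overlaps) simple to compute, and the millisecond boundaries can be used to cut the corresponding stretches of audio for phonetic analysis in a tool such as Praat.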

Description

In: Gruszczyńska, Ewa; Leńko-Szymańska, Agnieszka (eds.) (2016). Polskojęzyczne korpusy równoległe. Polish-language Parallel Corpora. Warsaw: Instytut Lingwistyki Stosowanej, pp. 182–195.