Maria Kunilovskaya (Research Group in Computational Linguistics, University of Wolverhampton, UK)

Types of translationese and register variation in English-to-Russian professional translation

This research relies on several parallel and comparable corpora to explore the linguistic properties of texts translated from English into Russian by professional translators across a variety of registers. It continues the line of investigation into the nature of translated texts powerfully put forward by Mona Baker (1993). Translationese studies maintain that translations are statistically different from texts originally authored in the target language. Their distinctive features are usually revealed by comparing translational corpora to genre- and domain-comparable non-translations in the same language. The parallel component in this type of research is used to establish whether the languages in the translation pair differ with respect to a given feature, and which linguistic phenomena are likely to trigger translationese, given a comparative analysis of the languages as represented by the corpora.

On the corpus resources side of the project, we explore the available English-Russian parallel corpora and formats and make use of self-compiled ones to build a register-balanced corpus for translationese studies.

To capture translationese, we develop a set of human-interpretable lexico-grammatical features shared by the two languages and known to be translationese indicators. Most of them are inspired by previous research and intuitions in translation and translationese studies (Chesterman, 2010; Baroni & Bernardini, 2006; Rabinovich & Wintner, 2016; Volansky et al., 2015), contrastive analysis and register studies. The set includes the features that are empirically confirmed as relevant for the translationese detection task (for the purposes of this research, by univariate significance testing). Translationese indicators are extracted from Universal Dependencies (UD) annotation (Straka & Straková, 2017), following a set of patterns similar to those suggested by Douglas Biber (1988) and adapted, for example, to the analysis of German by Stella Neumann (2013) and of Russian by Katinskaya and Sharoff (2015). Our translationese indicators include type-to-token ratio, sentence length, and the frequencies of morphological forms (e.g. non-finite verb forms), syntactic relations (e.g. clausal complements), syntactic functions (e.g. modal predicates) and word classes (e.g. pronouns, discourse markers). It is known that the results of this type of research are contingent on the quality of the corpus annotation and feature extraction (Evert & Neumann, 2017). While it is unrealistically labour-intensive to verify the quality of our feature extraction manually, in selecting the UD model for each language we relied on the pre-trained models that returned the most accurate results for our features and have the highest mean accuracy for UPOS (93.6/97.9), UFeats (94.4/93.7), Lemma (96.1/96.6) and unlabelled attachment score (UAS) (80.8/87.9), as reported on the respective UD page.
At the time of writing, the versions are 2.2 for the English Web Treebank (English-EWT) and 2.3 for the Russian-SynTagRus treebank; the figures in brackets above are given for English and Russian respectively.
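
The kind of feature extraction described above can be illustrated with a minimal Python sketch. It is not the extraction code used in this research: the feature names, the choice to compute type-to-token ratio on lemmas, the exclusion of punctuation and the normalisation of each frequency are all assumptions made for illustration, and the CoNLL-U fragment is a hand-made toy example rather than parser output.

```python
from collections import Counter

# Hand-made CoNLL-U fragment (hypothetical data, for illustration only).
SAMPLE = """\
# text = There are people who work.
1\tThere\tthere\tPRON\t_\t_\t2\texpl\t_\t_
2\tare\tbe\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_
3\tpeople\tperson\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_
4\twho\twho\tPRON\t_\tPronType=Rel\t5\tnsubj\t_\t_
5\twork\twork\tVERB\t_\tVerbForm=Fin\t3\tacl:relcl\t_\t_
6\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_

# text = They like working.
1\tThey\tthey\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tlike\tlike\tVERB\t_\tVerbForm=Fin\t0\troot\t_\t_
3\tworking\twork\tVERB\t_\tVerbForm=Ger\t2\txcomp\t_\t_
4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""

# UD VerbForm values treated here as non-finite (an assumption of this sketch).
NONFINITE = {"VerbForm=Inf", "VerbForm=Part", "VerbForm=Ger", "VerbForm=Conv"}


def read_sentences(conllu):
    """Parse CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in conllu.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments
        cols = line.split("\t")
        # Keep ordinary tokens only; skip multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        current.append({"lemma": cols[2], "upos": cols[3],
                        "feats": cols[5], "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def extract_features(sentences):
    """Return a few illustrative indicators for one text."""
    words = [t for s in sentences for t in s if t["upos"] != "PUNCT"]
    if not words:
        raise ValueError("no word tokens found")
    n_words, n_sents = len(words), len(sentences)
    upos = Counter(t["upos"] for t in words)
    deprel = Counter(t["deprel"] for t in words)
    nonfinite = sum(1 for t in words if NONFINITE & set(t["feats"].split("|")))
    return {
        "ttr": len({t["lemma"].lower() for t in words}) / n_words,  # lemma-based
        "sent_length": n_words / n_sents,            # words per sentence
        "pron": upos["PRON"] / n_words,              # pronouns per word
        "nonfinite": nonfinite / n_words,            # non-finite verb forms per word
        "relcl": deprel["acl:relcl"] / n_sents,      # relative clauses per sentence
        "clausal_compl": (deprel["ccomp"] + deprel["xcomp"]) / n_sents,
    }
```

Calling `extract_features(read_sentences(SAMPLE))` on the toy fragment gives a type-to-token ratio of 0.875 and a mean sentence length of 4.0 words. In a real setting the input would be the output of a UD parser, and any error in the tags or relations feeds directly into the feature values.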

Typical translationese features for English-Russian translation include the overuse of relative clauses, copula verbs, modal predicates, analytical passives, generic nouns and all types of pronouns, as shown in the examples below. Arguably, none of the sentences below can be considered ungrammatical, but there is a Master Yoda-style foreign sound to them (all examples are real-life student translations from the Russian Learner Translator Corpus (Kutuzov & Kunilovskaya, 2014)).

(1) Necklaces, at first as pectorals that covered the whole chest, evolved from the prehistoric pendants.
Ожерелье – первое нагрудное украшение, которое занимало место на всей груди, которое стало основой для подвесок
[Necklace – first chest decoration, which covered the whole chest, which became the basis for pendants].

(2) …there are many self-employed people who manage to get money from others by means of falsely pretending to provide them with some benefit or service…
Более того, есть много людей, работающих на себя, которые получают деньги обманным путем
[Moreover, there are many people, working for themselves, who get the money in a deceitful way].

(3) ...differences in self-efficacy may simply mean that some teachers struggle to identify solutions to problems beyond their circle of control.
...разница в самооценке может означать лишь то, что некоторые учителя испытывают сложности в нахождении решений задач за пределами того, чем они могут управлять
[...difference in self-evaluation can mean only that some teachers run into difficulties in finding solutions to tasks beyond the scope of what is under their control].

Further on, in this talk we introduce a method to calculate the amount of translationese manifested in the data, compare it with alternative strategies, and consider the role of the non-translated reference corpus in this line of research. By placing translations into the same feature space as their sources and the genre-comparable non-translated reference texts in the target language, we observe two separate translationese effects: a shift of translations into the gap between the source and the target languages and a shift away from either language. These trends are linked to the features that contribute to each of the effects. Methodologically, we rely on a selection of supervised and unsupervised machine learning techniques, such as Principal Component Analysis and Support Vector Machine classification, to explore our data and validate the results. Finally, we describe the observed variation of the professional norm across the registers at hand and compare the results for human translations to those for machine translations.
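
One simple way to picture the two effects geometrically is sketched below in Python. This is an illustrative assumption, not the method introduced in the talk: it summarises each corpus by its centroid in a shared feature space and splits the offset of the translations from the reference corpus into a part along the reference-to-source axis (the shift into the gap) and a perpendicular remainder (the shift away from either language). The function names and the toy vectors are hypothetical, and real feature values would normally be standardised first.

```python
import math


def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def decompose_shift(sources, translations, references):
    """Split the translations' offset from the reference corpus into two parts.

    Returns (along, away):
      along: position of the translation centroid on the reference-to-source
             axis, as a fraction of the gap (0 = at the reference corpus,
             1 = at the sources);
      away:  Euclidean distance of the translation centroid from that axis.
    """
    src = centroid(sources)
    tra = centroid(translations)
    ref = centroid(references)
    axis = [s - r for s, r in zip(src, ref)]      # reference -> source
    offset = [t - r for t, r in zip(tra, ref)]    # reference -> translation
    axis_sq = sum(a * a for a in axis)
    if axis_sq == 0:
        raise ValueError("source and reference centroids coincide")
    along = sum(o * a for o, a in zip(offset, axis)) / axis_sq
    residual = [o - along * a for o, a in zip(offset, axis)]
    away = math.sqrt(sum(r * r for r in residual))
    return along, away


# Toy two-feature vectors (hypothetical values, not corpus data).
toy_sources = [[3.0, 0.0], [5.0, 0.0]]
toy_references = [[-1.0, 0.0], [1.0, 0.0]]
toy_translations = [[1.0, 2.0], [1.0, 4.0]]
```

With these toy values, `decompose_shift(toy_sources, toy_translations, toy_references)` places the translations a quarter of the way from the reference corpus towards the sources (`along` is 0.25) and 3.0 units off the axis. Inspecting the per-feature entries of the two components is one possible way to link each effect to the features that drive it.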


Baker, M. (1993). Corpus Linguistics and Translation Studies: Implications and Applications. In Text and Technology: In honour of John Sinclair (pp. 232–250). J. Benjamins.

Baroni, M., & Bernardini, S. (2006). A new approach to the study of translationese: Machine-learning the difference between original and translated text. Literary and Linguistic Computing, 21(3), 259–274.

Biber, D. (1988). Variation across speech and writing (2nd ed.). Cambridge University Press.

Chesterman, A. (2010). Why Study Translation Universals? In R. Hartama-Heinonen & P. Kukkonen (Eds.), Kiasm. Acta Translatologica Helsingiensia (Vol. 1, pp. 38–48). Helsingfors: Helsingfors universitet, Nordica.

Evert, S., & Neumann, S. (2017). The impact of translation direction on characteristics of translated texts: A multivariate analysis for English and German. Empirical Translation Studies: New Methodological and Theoretical Traditions, 300, 47.

Katinskaya, A., & Sharoff, S. (2015). Applying Multi-Dimensional Analysis to a Russian Webcorpus: Searching for Evidence of Genres. The 5th Workshop on Balto-Slavic Natural Language Processing, September, 65–74.

Kutuzov, A., & Kunilovskaya, M. (2014). Russian Learner Translator Corpus: Design, Research Potential and Applications. Text, Speech and Dialogue: 17th International Conference, TSD 2014, Brno, Czech Republic, September 8-12, 2014, Proceedings, 8655, 315–323.

Neumann, S. (2013). LSB2013 conference - Genre- and Register-related Text and Discourse Features in Multilingual Corpora 11-12 January 2013 - Institut libre Marie Haps, Brussels (Belgium) - January, 11–13.

Rabinovich, E., & Wintner, S. (2016). Unsupervised Identification of Translationese.

Straka, M., & Straková, J. (2017). Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe. Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 88–99.

Volansky, V., Ordan, N., & Wintner, S. (2015). On the features of translationese. Digital Scholarship in the Humanities, 30(1), 98–118.