Marina Akimova (Lomonosov MSU)

Anastasia Belousova (Lomonosov MSU; Universidad Nacional de Colombia)

Igor Pilshchikov (UCLA; TLU; Lomonosov MSU)

Vera Polilova (Lomonosov MSU)

CPCL: A Multilingual Parallel Corpus of Poetic Texts and New Perspectives for Comparative Literary Studies

The open access information system “CPCL: Comparative Poetics and Comparative Literature” comprises four interconnected subsystems: a Corpus of parallel texts (poetic translations, their originals, proto-sources of the originals, and intermediary translations, presented as clusters of interrelated texts with a parallel view); a digital Library (commented editions of poetic translations and their originals, as well as books and articles on comparative poetics); an “Encyclopedia” (biographical and bibliographical information about poets, translators, and researchers); and a “Thesaurus” (a structured glossary of terms found in secondary literature). The interface, metadata, and descriptions are in three languages: English, Russian, and Spanish.

In the “Library” subsystem, we created a collection of primary literature (editions of translated poetic works and their originals; over 8,600 titles so far) and a collection of secondary literature (scholarly books and articles; over 80 editions, ca. 300 titles; commentaries in commented editions are also treated as secondary literature). Over 2,000 poems were exported from the Library to the Corpus, and semanticized links were established between them. We have also identified the metrical and stanzaic attributes of these texts and described them in the corpus markup.

The Corpus contains poetic works translated into Russian from French, Italian, and Spanish, together with their originals and intermediary translations (in German, English, and other languages). We also add older texts (Ancient Greek, Latin, etc.) that served as sources for the Romance originals of the Russian translations. The Corpus thus represents four text types: T = Translation, O = Original, I = Intermediary, S = Source. T and O are always poetic, whereas I and S can be either poetic or prosaic. The Corpus features clusters of interrelated texts and a parallel view thereof.
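The four text types and the constraint on them can be sketched as a small data model. This is only an illustration: the class names, field names, and identifiers below are hypothetical and do not reflect the actual CPCL markup.

```python
from dataclasses import dataclass
from enum import Enum


class TextType(Enum):
    """The four text types named in the abstract."""
    TRANSLATION = "T"
    ORIGINAL = "O"
    INTERMEDIARY = "I"
    SOURCE = "S"


@dataclass(frozen=True)
class CorpusText:
    # All field names are assumptions made for this sketch.
    text_id: str
    text_type: TextType
    language: str
    is_verse: bool

    def __post_init__(self):
        # T and O are always poetic; I and S may be verse or prose.
        must_be_verse = self.text_type in (TextType.TRANSLATION, TextType.ORIGINAL)
        if must_be_verse and not self.is_verse:
            raise ValueError(
                f"text type {self.text_type.value} must be poetic, got prose"
            )


# A prose intermediary is allowed; a prose translation is rejected.
prose_intermediary = CorpusText("demo-I", TextType.INTERMEDIARY, "de", is_verse=False)
try:
    CorpusText("demo-T", TextType.TRANSLATION, "ru", is_verse=False)
    rejected = False
except ValueError:
    rejected = True
```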

Links are “semanticized” because they are formalized and normalized, i.e. explicitly described as, for example, “a link to a scholarly study of this poem” (and to the particular page on which the text is discussed), “a link to a commentary on this poem,” or “a link to an edition of this poem.” Links of this type are “Corpus-Library” links. The reverse links are “Library-Corpus” links. Their semantics can be described as “links to the primary texts discussed in this study” or “a link to the primary text this commentary provides information about.” Links of the third type are “Corpus-Corpus” links. They are explicitly described as “a link to the original of this poem,” “a link to a translation of this poem,” etc. Visualized links are generated automatically on the basis of the relevant metadata and are presented both as a list and as a graph.
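As a rough illustration of how typed links could yield clusters of interrelated texts, the sketch below stores each link with its kind and relation and collects a cluster by walking Corpus-Corpus links in both directions. The `Link` class, the relation labels, and the identifiers are assumptions made for this example, not the system's actual schema or algorithm.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: str      # "Corpus-Corpus", "Corpus-Library", or "Library-Corpus"
    relation: str  # hypothetical label, e.g. "original_of"


def cluster_of(start, links):
    """Return all corpus texts reachable from `start` via Corpus-Corpus links."""
    neighbours = defaultdict(set)
    for link in links:
        if link.kind != "Corpus-Corpus":
            continue  # links into the Library do not extend a text cluster
        neighbours[link.source].add(link.target)
        neighbours[link.target].add(link.source)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first traversal of the undirected link graph
        node = queue.popleft()
        for other in neighbours[node]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


# Hypothetical identifiers: a translation, its original, that original's
# source, a scholarly study in the Library, and an unrelated pair of texts.
links = [
    Link("T1", "O1", "Corpus-Corpus", "translation_of"),
    Link("O1", "S1", "Corpus-Corpus", "based_on_source"),
    Link("T1", "study-1", "Corpus-Library", "discussed_in"),
    Link("T2", "O2", "Corpus-Corpus", "translation_of"),
]
cluster = cluster_of("T1", links)
```

Here `cluster` holds `T1`, `O1`, and `S1`; the Library item and the unrelated pair stay outside it.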

All texts in the Library and the Corpus are supplied with general metadata (title, dates, author, translator). In addition, all texts in the Corpus are supplied with special poetic corpus metadata (meter, verse length, clausula, rhyme, stanza or fixed form). The metrical and stanzaic structure of each work is described in the Corpus subsystem using distinctive features of poetic meters, an approach similar to that implemented in the Poetic Subcorpus of the National Corpus of the Russian Language and in the Corpus of Czech Verse.
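The two metadata layers described above might be modeled along the following lines, with a simple feature-based query on top. The field names, value vocabulary, and sample records are placeholders invented for this sketch; the actual CPCL markup may be organized quite differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoeticMetadata:
    # Feature names follow the list in the abstract; values are illustrative.
    meter: str
    verse_length: Optional[int] = None
    clausula: Optional[str] = None
    rhyme: Optional[str] = None
    stanza: Optional[str] = None  # stanza or fixed form


@dataclass(frozen=True)
class CorpusRecord:
    # General metadata shared by Library and Corpus texts.
    title: str
    author: str
    date: str
    poetics: PoeticMetadata
    translator: Optional[str] = None


def find_by_features(records, **features):
    """Return records whose poetic metadata match every given feature."""
    return [
        record
        for record in records
        if all(getattr(record.poetics, name) == value
               for name, value in features.items())
    ]


# Placeholder records, not real corpus entries.
records = [
    CorpusRecord("Example A", "Author A", "n.d.",
                 PoeticMetadata(meter="iambic", verse_length=4, stanza="quatrain")),
    CorpusRecord("Example B", "Author B", "n.d.",
                 PoeticMetadata(meter="trochaic", verse_length=4)),
]
matches = find_by_features(records, meter="iambic", verse_length=4)
```

Describing each text by separate features rather than by a single label makes it possible to query on any combination of them, for example all four-foot lines regardless of meter.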

We not only accumulate existing information but also add the results of our own research. We have already identified over 150 previously unknown Romance sources of Russian translated poems. We have also identified the poetic attributes of over 2,000 titles in Russian, Italian, Spanish, and French and described them in the corpus markup. The project started in 2017 and was opened for public use in December 2019.

This paper is part of a research project based at Lomonosov Moscow State University and supported by Russian Science Foundation Grants 17-18-01701 (in 2017–19) and 19-78-10132 (from 2020).


