Federico Aurora (Norwegian Institute of Philology), Jens Braarvig (University of Oslo)
The Bibliotheca Polyglotta (BP, https://www.philology.no/bp), of which the Thesaurus Literaturae Buddhicae (TLB) is a part, is an online, open-access parallel corpus conceived as a tool to study the "Grand Multilinguals", i.e. the important texts that throughout history have had a great cross-cultural and cross-national impact on politics, religion, science and culture in general (e.g. the Bible, or Aristotle’s and Plato’s works). BP is connected with a project named "Multilingualism, Linguae Francae and the Global History of Religious and Scientific Concepts", whose main goal is to study and describe the flow of concepts between areas that have different languages but which, through the diffusion of concepts, come to share various systems of knowledge, in the belief that this can provide essential insights into history and into how our conceptual frames are historically created (Braarvig & Geller 2018).
The scope of the texts in BP is wide, but its emphasis is on texts that were widely diffused and translated into many languages across the globe before the Renaissance. Thus, in addition to parallel Buddhist texts in Sanskrit, Pali, Chinese, Tibetan and other Buddhist languages, it contains, among others, parallel versions of Ancient Greek texts in Latin, Syriac and Arabic, Latin texts with their historical translations into German, French and other European languages, and Hittite-Akkadian or Hurrian-Hittite parallel texts. However, later texts have also occasionally been integrated, such as the Universal Declaration of Human Rights, probably the most important "Grand Multilingual" of the present day, and a rich library of parallel texts of Henrik Ibsen, the only Norwegian multilingual texts to have had real global impact and a great multilingual reception.
The dialectics between a Lingua Franca as donor language of concepts and knowledge systems, and the multilingual situation created when such knowledge systems are translated into various receiving languages that seek to accommodate the concepts from the Lingua Franca, is a crucial process. The study of this process will, in our view, give essential insights into history and into how our conceptual frames are historically created (Braarvig 2018). With its growing material, the BP will thus serve as a tool for the global history of ideas. The main goal of BP is therefore different from (and complementary to) the main goals of corpora built for syntactic and pragmatic analysis, like the parallel biblical texts in a number of old Indo-European languages contained in the PROIEL Treebank (Haug & Jøhndal 2008; https://proiel.github.io/), now searchable through Syntacticus (http://syntacticus.org/).
When a search for a particular concept or word is performed, references to that concept from many writing cultures are retrieved. This is made possible by the fact that most of the texts in the parallel corpus also have a translation into English, the present main Lingua Franca. The word(s) found by a search are shown in the sentence in which they appear, in whichever language was searched, and the user can go from each of these results to the multilingual record where the sentence, or "smallest semantic unit above word", as we call it, is represented in its multilingual setup. In this way the BP is a kind of sentence lexicon, which builds on the principle that a word only gets its meaning in a context.
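The search workflow described above can be illustrated with a minimal Python sketch. It is not the BP implementation: the `Record` class, the `search` function, the record identifiers and the plain substring matching are all assumptions made for illustration, and the sample sentences are taken from the Universal Declaration of Human Rights (Articles 1 and 3).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One multilingual record: a sentence-level unit in several languages.

    Hypothetical structure, not the actual BP data model.
    """
    record_id: str
    versions: Dict[str, str] = field(default_factory=dict)  # language -> sentence


def search(records: List[Record], term: str,
           language: str = "English") -> List[Tuple[str, str]]:
    """Return (record_id, sentence) pairs whose sentence in `language`
    contains `term`, compared case-insensitively as a plain substring."""
    needle = term.casefold()
    hits = []
    for rec in records:
        sentence = rec.versions.get(language)
        if sentence is not None and needle in sentence.casefold():
            hits.append((rec.record_id, sentence))
    return hits


# Illustrative sample data; the identifiers "udhr-1" and "udhr-3" are invented.
records = [
    Record("udhr-1", {
        "English": "All human beings are born free and equal in dignity and rights.",
        "French": "Tous les êtres humains naissent libres et égaux en dignité et en droits.",
    }),
    Record("udhr-3", {
        "English": "Everyone has the right to life, liberty and security of person.",
        "French": "Tout individu a droit à la vie, à la liberté et à la sûreté de sa personne.",
    }),
]

by_id = {rec.record_id: rec for rec in records}

# Search the English versions, then open the multilingual record of each hit.
for record_id, sentence in search(records, "dignity"):
    print(record_id, "|", sentence)
    for lang, text in by_id[record_id].versions.items():
        print("   ", lang, ":", text)
```

Searching the English versions and then following the record identifier mirrors the two steps in the text: the hit is shown in its sentence, and the record gives access to the same unit in the other languages.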
The next development of the BP, for which some preparatory programming work has been done, will be to generate word-by-word indices from the multilingual sentence-by-sentence records.
Braarvig, J. 2018. 'Dependent Languages'. In: Braarvig, J. & M. J. Geller (eds.), Studies in Multilingualism, Lingua Franca and Lingua Sacra, 79-90. https://www.mprl-series.mpg.de/studies/10/index.html
Haug, D. T. T. & M. L. Jøhndal. 2008. 'Creating a Parallel Treebank of the Old Indo-European Bible Translations'. In: Sporleder, C. & K. Ribarov (eds.), Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008), 27-34.