Liubov Nesterenko (National Research University Higher School of Economics, Moscow)

Quantitative analysis of passives with agent phrase based on multilingual parallel data

In the last few years, studies presenting quantitative analyses of parallel data have appeared more often, which seems to be a big step forward in linguistic research. Unfortunately, we still cannot say that studies based on parallel data have become mainstream among typologists (Levshina 2016), (Nesterenko 2019).

Voice constructions depend on semantic and pragmatic factors, and parallel data seem to be very suitable for their comparative analysis, particularly of their functional properties. Since parallel texts represent situations with the same pragmatics in different languages, one can compare the form and function of a set of related constructions. The possibility of semantic annotation transfer is a major benefit of parallel corpora, which is especially important for research on voice and other alternations. Grammatical constructions can have very fine functional differences that, at first glance, may seem barely distinguishable. When such constructions are analyzed in samples taken out of context and without quantitative assessment, it can be simply impossible to observe the whole range of potential usages and compare them cross-linguistically. Parallel data with complex annotation help to solve this problem and yield more fine-grained results.

I will focus on the use of parallel data for the investigation of passives with agent phrases. Passive constructions have been a field of interest among linguists for a long time, and there are plenty of studies on this topic, e.g. (Keenan, Dryer 1981), (Shibatani 1988), (Givón 1994), (Tsunoda, Kageyama 2006), (Kulikov 2011), (Zúñiga & Kittilä 2019). The example below illustrates the difference between active and passive voice:

(1) a. The paparazzi saw Zelda at the party.
b. Zelda was seen by the paparazzi at the party.
(Zúñiga & Kittilä 2019)

Different types of passive constructions are usually distinguished, and the agent phrase is often described as optional. This seems misleading, because passives with an agent phrase (PAP) cannot be used interchangeably with agentless passives. Kiparsky questions the optionality of the agent phrase and points out that the distribution of agent phrases is governed by lexical and semantic mechanisms (Kiparsky 2013), see also (Siewierska, Bakker 2012).

In this paper, I will show that the use of PAP is semantically motivated and is not restricted to patient promotion/agent demotion and stativization of the verb, the functions claimed in (Zúñiga & Kittilä 2019). Using logistic regression models, I will also show that there are languages in which PAP is used mostly in semantically motivated situations and languages that prefer to use PAP as a discourse-oriented construction. The study is based on a corpus of the Harry Potter book series (books 1 to 7) in English, German, Swedish, French, Italian, Spanish, Russian, Czech, and Bulgarian.
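
The abstract does not specify the predictors or the software used for the regression models. As a minimal sketch of the general idea only, the following assumes a single hypothetical binary semantic predictor per clause and invented toy observations for one language; a positive weight would indicate that PAP is more likely when the semantic feature is present.

```python
import math

# Hypothetical toy observations for ONE language (not real corpus data):
# (semantic_feature, is_pap), where semantic_feature = 1 means the clause
# has some semantic property of interest and is_pap = 1 means the clause
# is rendered as a passive with agent phrase.
data = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(observations, lr=0.5, epochs=5000):
    """Fit P(is_pap = 1) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(observations)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in observations:
            err = sigmoid(w * x + b) - y  # gradient of the log-loss
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


w, b = fit_logistic(data)
p_with_feature = sigmoid(w + b)   # estimated P(PAP | feature present)
p_without_feature = sigmoid(b)    # estimated P(PAP | feature absent)
print(round(w, 2), round(p_with_feature, 2), round(p_without_feature, 2))
```

Fitting one such model per language and comparing the weights is one conceivable way to contrast semantically motivated and discourse-oriented uses; the actual models may differ in predictors and structure.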

A brief exploration of translation units shows that some English passives are translated as passives in all or almost all languages, while others receive more non-passive translations. Based on the similarity of their translations, we can cluster the units into groups that share common features. I suppose that units within each group share particular characteristics that can be used for further analysis of passive and related constructions. For a related study of lexical semantics based on clustering and measuring translation similarities, see (Wälchli, Cysouw 2012).
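
The clustering procedure is not specified in the abstract. The sketch below illustrates one possible setup under stated assumptions: each translation unit is coded as a binary vector over the eight target languages (1 = translated as a passive), similarity is measured by normalized Hamming distance, and units are grouped by simple single-linkage clustering with a threshold. The unit names, the vectors, and the threshold are all hypothetical.

```python
from itertools import combinations

# Target languages of the corpus (English is the source of the units).
LANGS = ["de", "sv", "fr", "it", "es", "ru", "cs", "bg"]

# Hypothetical toy data: 1 = the English passive is translated as a passive.
units = {
    "u1": [1, 1, 1, 1, 1, 1, 1, 1],
    "u2": [1, 1, 1, 1, 1, 1, 0, 1],
    "u3": [0, 1, 0, 0, 0, 0, 0, 0],
    "u4": [0, 0, 0, 0, 1, 0, 0, 0],
    "u5": [1, 1, 1, 1, 1, 0, 1, 1],
}


def hamming(a, b):
    """Share of languages in which two units are translated differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(items, max_dist):
    """Merge clusters while some pair of members lies within max_dist."""
    clusters = [{name} for name in items]
    merged = True
    while merged:
        merged = False
        for c1, c2 in combinations(clusters, 2):
            if any(hamming(items[a], items[b]) <= max_dist
                   for a in c1 for b in c2):
                clusters.remove(c1)
                clusters.remove(c2)
                clusters.append(c1 | c2)
                merged = True
                break  # restart with the updated cluster list
    return sorted(sorted(c) for c in clusters)


clusters = single_linkage(units, max_dist=0.25)
print(clusters)
```

With this toy input, the units that are passive almost everywhere end up in one group and the units that are rarely passive in another; on real data, richer codings (e.g. construction type per language) and other distance measures would be natural alternatives.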

The cases described above show how different computational methods can be applied to parallel corpus data and which aspects of comparison each of them allows us to focus on.


Givón, T. (1994). Voice and inversion (Vol. 28). John Benjamins Publishing Company.

Keenan, E. L., & Dryer, M. S. (1981). Passive in the world's languages. Linguistic Agency, University of Trier.

Kiparsky, P. (2013). Towards a null theory of the passive. Lingua, 125(1), 7-33.

Kulikov, L. (2011). Voice typology. In The Oxford handbook of linguistic typology. Oxford University Press, pp. 368-398.

Levshina, N. (2016). Why we need a token-based typology: A case study of analytic and lexical causatives in fifteen European languages. Folia Linguistica, 50(2), 507-542.

Nesterenko, L. (2019). Multilingual Parallel Corpora: Alternative Source of Language Data for Typological Studies, Applying Perspectives and Problems. Voprosy Jazykoznanija, 1 Mar., 111–125.

Shibatani, M. (1988). Passive and voice (Vol. 16). John Benjamins Publishing.

Siewierska, A., & Bakker, D. (2012). Three takes on grammatical relations. Argument structure and grammatical relations: A crosslinguistic typology, 126, 295.

Tsunoda, T., & Kageyama, T. (2006). Voice and grammatical relations: In honor of Masayoshi Shibatani (Vol. 65). John Benjamins Publishing.

Zúñiga, F., & Kittilä, S. (2019). Grammatical voice. Cambridge University Press.