Journées internationales d'Analyse statistique des Données Textuelles
7-10 June 2016, Nice (France)
Meaning structure of cognate words in English and Russian: comparing word sense frequency
Boris Iomdin 1, 2, *, Anastasiya Lopukhina 1, *, Konstantin Lopukhin 3, *, Grigory Nosyrev 4, *
1 : Russian Language Institute of the Russian Academy of Sciences (IRL RAS)
Volkhonka 18/2, Moscow 119019, Russia
2 : School of linguistics, National Research University Higher School of Economics (HSE)
Staraya Basmannaja 21/4, Moscow 105066, Russia
3 : Scrapinghub
4 : Yandex
* : Corresponding author

Polysemy is a key issue in theoretical semantics and lexicography as well as in computational linguistics. When words have several senses, it is important to describe them properly in the dictionary (a lexicographic task) and to be able to distinguish between them in a given context (a computational linguistics task, known as word sense disambiguation, WSD). Recently, attention has been drawn to the fact that different senses normally have different frequencies in corpora. Elsewhere we reported on our research into that issue and introduced several techniques for determining sense frequency based on dictionary entries matched with data from large corpora. Information about word sense frequency may enrich language learning resources and help lexicographers order the senses of a word by frequency, if needed. When learning a foreign language, a student may encounter a word that also exists in their native language (as a borrowing or an international word) and be tempted to assume that the foreign word and its equivalent have the same meaning structure. However, this is not always the case: the most frequent sense of a word in one language may be much less frequent for its cognate. We propose a method for detecting such cases. For that purpose, we selected a set of Russian words included in the Active Dictionary of Russian that have more than two dictionary senses and have cognates in English. We estimated frequencies for English and Russian senses using SemCor and the Russian National Corpus respectively, matched the senses in each pair of words, and compared their frequencies. In this way, we identified cases in which the most frequent senses and the meaning structures as a whole differ substantially across the two languages, and studied them in more detail. As a result, we obtained information that may prove useful for learners of Russian or English as well as for lexicographers and computational linguists dealing with machine translation.
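The comparison step described above (match the senses of a cognate pair, then compare their frequencies) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sense labels, the counts, and the choice of total variation distance as the dissimilarity measure are all hypothetical. The abstract does not specify how frequencies were compared.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Comparison:
    same_top_sense: bool  # do the most frequent senses correspond to each other?
    distance: float       # total variation distance: 0 (identical) to 1 (disjoint)


def normalize(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw sense counts into relative frequencies."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no sense-tagged occurrences")
    return {sense: n / total for sense, n in counts.items()}


def compare_cognates(
    ru_counts: Dict[str, int],
    en_counts: Dict[str, int],
    ru_to_en: Dict[str, str],
) -> Comparison:
    """Compare sense distributions of a Russian word and its English cognate.

    ru_to_en is a manually built alignment from Russian sense labels to
    English sense labels (an assumption; unmatched senses are simply absent).
    """
    ru = normalize(ru_counts)
    en = normalize(en_counts)

    # Re-express Russian frequencies on English labels so that both
    # distributions share one label set; unmatched Russian senses keep
    # a private label and therefore count fully towards the distance.
    ru_shared: Dict[str, float] = {}
    for sense, freq in ru.items():
        label = ru_to_en.get(sense, "ru-only:" + sense)
        ru_shared[label] = ru_shared.get(label, 0.0) + freq

    labels = set(ru_shared) | set(en)
    distance = 0.5 * sum(
        abs(ru_shared.get(label, 0.0) - en.get(label, 0.0)) for label in labels
    )

    top_ru = max(ru, key=ru.get)
    top_en = max(en, key=en.get)
    return Comparison(ru_to_en.get(top_ru) == top_en, distance)


# Invented toy counts for one hypothetical cognate pair (not real corpus data).
result = compare_cognates(
    ru_counts={"ru_sense_1": 80, "ru_sense_2": 20},
    en_counts={"en_sense_1": 10, "en_sense_2": 90},
    ru_to_en={"ru_sense_1": "en_sense_1", "ru_sense_2": "en_sense_2"},
)
print(result)
```

In this sketch, a pair would be flagged for closer study when `same_top_sense` is false or when `distance` exceeds some threshold chosen by the analyst; any such threshold is likewise an assumption rather than something stated in the abstract.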
