JADT 2016 - Sciencesconf.org

International Conference on Statistical Analysis of Textual Data

7-10 Jun 2016 Nice (France)

sciencesconf.org:jadt2016:83716

Unsupervised Learning of Morphology in the USSR

Franck Burlot 1, *, François Yvon 1

1 : Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur [Orsay] (LIMSI)

Université Paris XI - Paris Sud, CNRS : UPR3251

Université Paris Sud (Paris XI) Bât. 508 BP 133 91403 ORSAY CEDEX - France

* : Corresponding author

This article addresses unsupervised learning of morphology, an important task for the processing of morphologically rich languages. The task mainly consists of learning a grammar that segments words into morphemes without any prior knowledge of the language under analysis. Its origins are usually traced back to Zellig Harris, an assumption that overlooks the important contribution of his contemporary, the Soviet linguist Nikolaj Dmitrievič Andreev, who developed a statistico-combinatorial model for learning morphology in the 1960s. We give a critical description of Andreev's model, highlighting both its pioneering aspects and its weaknesses. Finally, we report results on several European languages. Our implementation of the model can be downloaded from https://github.com/franckbrl/stat_comb_model.

Subject	:	oral
Language of text	:	English
Topics	:	NLP
Keywords	:	Morphology ; Soviet Linguistics ; Information Theory ; Word Segmentation ; Unsupervised Learning ; Nikolaj Andreev ; Statistico-Combinatorial Model
