  • Object-oriented model of the morphological analyzer of Russian-language text

    The article focuses on the development of a stemmer for the Pymystem morphological analyzer. A theoretical justification is given for treating morphological analysis as a high-priority task of linguistic text analysis. State-of-the-art analyzers are described, and their strengths and shortcomings are highlighted. The authors propose a core algorithm for splitting nested structures into a structured class hierarchy. A method for retrieving the key features of selected parts of speech using regular expressions and Python is defined. The main steps of the hierarchy creation algorithm are examined and documented. The authors analyze the main results of the study and describe their findings, along with proposals for further development of the presented software.

    Keywords: stemmer, morphological analyzer, class tree, regular expression, text analysis, computational linguistics, lemma, token, word-formation, class hierarchy
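
The idea of extracting part-of-speech features with a regular expression and mapping them onto a class hierarchy can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the grammar-string layout (a part-of-speech tag, comma-separated constant features, then `=` and inflectional features) is assumed to resemble MyStem-style output, and the names `GR_PATTERN`, `Word`, `Noun`, `Verb`, `POS_CLASSES`, and `build_word` are hypothetical. Alternative readings in parentheses, which real analyzer output may contain, are not handled.

```python
import re
from dataclasses import dataclass, field

# Assumed grammar-string layout: POS[,constant features][=inflectional features]
GR_PATTERN = re.compile(
    r"^(?P<pos>[A-Z]+)(?:,(?P<const>[^=]*))?(?:=(?P<infl>.*))?$"
)


@dataclass
class Word:
    """Base of the hypothetical class hierarchy: a lemma plus its features."""
    lemma: str
    features: list = field(default_factory=list)


class Noun(Word):
    """Words whose part-of-speech tag is 'S' (assumed tag for nouns)."""


class Verb(Word):
    """Words whose part-of-speech tag is 'V' (assumed tag for verbs)."""


# Hypothetical mapping from part-of-speech tags to classes.
POS_CLASSES = {"S": Noun, "V": Verb}


def build_word(lemma: str, gr: str) -> Word:
    """Parse a grammar string and return an instance of the matching class."""
    match = GR_PATTERN.match(gr)
    if match is None:
        raise ValueError(f"unrecognized grammar string: {gr!r}")
    # Unknown tags fall back to the generic base class.
    cls = POS_CLASSES.get(match.group("pos"), Word)
    features = []
    for part in (match.group("const"), match.group("infl")):
        if part:
            features.extend(f for f in part.split(",") if f)
    return cls(lemma=lemma, features=features)


if __name__ == "__main__":
    print(build_word("кот", "S,муж,од=им,ед"))
    print(build_word("идти", "V=непрош,ед"))
```

Keeping the tag-to-class mapping in a dictionary means a new part of speech can be supported by adding one subclass and one entry, without touching the parsing code.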