STEMMING OF KAZAKH LANGUAGE

Date

2021

Publisher

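Абай атындағы ҚазҰПУ-нің ХАБАРШЫСЫ, «Физика-математика ғылымдары» сериясы, №1(73) [Bulletin of Abai Kazakh National Pedagogical University, "Physical and Mathematical Sciences" series, No. 1(73)]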

Abstract

Natural language processing is now widely used, for example in machine translation, search engines, and text topic identification. Such applications require text preprocessing, because the quality of preprocessing affects the accuracy of the system. Text can be preprocessed in several ways; one approach is to identify the root of each word. An advantage of this approach is that it saves memory, because a root shared by many word forms is stored only once. This paper describes stemming systems, which identify the roots of words. In the literature review, the authors survey stemming algorithms for Russian, Uzbek, and Turkish. The authors then propose a stemming system that identifies the roots of Kazakh words and describe how it works. To test the system, words from various parts of speech were entered; the proposed system can identify the roots of nouns, verbs, adjectives, and numerals. The system's responses are shown in Table 1. The figures in the paper show which kinds of suffixes and endings can be attached to the root of a Kazakh word, although not all combinations are shown. The conclusion gives advice on how to develop a stemming system.
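To make the idea of root identification concrete, the following is a minimal sketch of a naive suffix-stripping stemmer. It is not the authors' system: the suffix list (a few Kazakh plural, locative, and dative endings), the `MIN_STEM_LEN` guard, and the `stem` function are illustrative assumptions, and a stripper this simple will over-stem words whose root happens to end in one of the listed suffixes.

```python
# Illustrative suffix list (an assumption, far from complete): a few Kazakh
# plural and case endings. The paper's own suffix inventory is not reproduced here.
SUFFIXES = (
    "лар", "лер", "дар", "дер", "тар", "тер",  # plural
    "да", "де", "та", "те",                    # locative
    "ға", "ге", "қа", "ке",                    # dative
)

# Hypothetical guard: never strip a suffix if fewer than this many letters remain.
MIN_STEM_LEN = 2

# Try longer suffixes first so that a short suffix cannot shadow a longer one.
_ORDERED_SUFFIXES = sorted(SUFFIXES, key=len, reverse=True)


def stem(word: str) -> str:
    """Repeatedly strip the longest matching suffix from the end of the word."""
    word = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in _ORDERED_SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
                word = word[: -len(suffix)]
                stripped = True
                break
    return word


if __name__ == "__main__":
    forms = ["кітап", "кітаптар", "кітаптарда", "бала", "балалар"]
    roots = {stem(form) for form in forms}
    # Several surface forms collapse to a smaller set of roots,
    # which is the memory saving the abstract refers to.
    print(len(forms), "forms,", len(roots), "roots")
```

In this sketch, "кітаптарда" loses the locative "да" and then the plural "тар", leaving "кітап". A realistic system would need a much fuller suffix inventory and rules for which suffixes may follow which, as the figures in the paper describe.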

Keywords

stemming, morphology, parts of speech

Citation

Bogdanchikov A., Baimuratov O.A., Ayazbayev D.A. / STEMMING OF KAZAKH LANGUAGE / Абай атындағы ҚазҰПУ-нің ХАБАРШЫСЫ, «Физика-математика ғылымдары» сериясы, №1(73) / 2021