Correcting text in morphologically rich languages
Date
2022
Abstract
The volume of documents containing complex text structures is growing. Professional proofreaders charge a considerable fee to correct errors in text. Every day, a very large number of people around the world who work in science or education write a large volume of articles, each of which should be free of errors. To automate this process, it was necessary to develop an algorithm based on methods for analyzing and correcting input text. High-quality text synthesis requires machine learning technologies, which in turn demand deep knowledge and understanding of the field. Among the many available algorithms, the Hunspell algorithm appears to be one of the best suited to this problem. The essence of this algorithm is to reduce every word in the text to its base form. This work is therefore based on multilevel segmentation of errors in Kazakh-language text taken from the Internet or entered manually by the user. It is worth noting that, owing to technological progress, the main source of material for linguistic research is now social networks, which is a critical problem because the fidelity of such texts is doubtful. The main purpose of this work was the formation and development of a spell-checking algorithm for the Kazakh language based on the existing Hunspell algorithm for English. As a result, the Hunspell algorithm was studied, and the methods of extracting the base of a word, as well as [...] form, were analyzed. The essence of this algorithm [...] detecting errors and typos in a word. These rules are contained in the dictionary, on which the algorithm directly depends.
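The idea of reducing a word to its base form and checking it against a dictionary can be illustrated with a minimal sketch. This is not the algorithm developed in the work and does not use the Hunspell library. The stem list, the suffix list, and the names `STEMS`, `SUFFIXES`, `find_stem`, and `misspelled` are all assumptions chosen for illustration, and only Kazakh plural suffixes are covered.

```python
# Toy illustration only: the stem list and suffix list are hypothetical
# and are not taken from a real Hunspell dictionary or affix file.
STEMS = {"кітап", "бала", "мектеп"}  # book, child, school
SUFFIXES = ["лар", "лер", "дар", "дер", "тар", "тер"]  # plural endings


def find_stem(word):
    """Return the dictionary stem of `word`, or None if none is found."""
    word = word.lower()
    if word in STEMS:
        return word
    # Strip one known suffix and look the remainder up in the dictionary.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in STEMS:
            return word[: -len(suffix)]
    return None


def misspelled(text):
    """Return the words in `text` that cannot be reduced to a known stem."""
    return [w for w in text.split() if find_stem(w) is None]
```

For example, `find_stem("кітаптар")` yields the stem `кітап`, while `misspelled` flags `кітабтар` because no listed stem remains after the suffix is removed. The sketch ignores vowel harmony and consonant assimilation, so it would also accept an incorrect form such as `кітаплар`; a real affix file attaches conditions to each rule to exclude such forms.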
Keywords
documents, text proofreaders, errors, education, articles