Study of the transformation of Kazakh language speech into text data
Date
2024
Publisher
Faculty of Engineering and Natural Science
Abstract
Converting speech to text is a key component of modern language technologies and artificial intelligence. Despite significant advances in this field, support for languages with distinctive grammatical and phonetic characteristics, such as Kazakh, remains a challenge. The purpose of this study is to analyze existing methods of converting Kazakh speech into text and to evaluate their effectiveness. The methodology centers on an analysis of the VOSK model for Kazakh speech recognition. An experimental study was conducted on the KazakhTTS dataset using machine learning and natural language processing methods. The results, reported as word error rate (WER), showed that VOSK big and VOSK small perform almost identically (51% and 53%, respectively). The experiment also revealed limitations in recognizing word endings, along with other recognition errors. The discussion highlights the potential of the model and points to the need for further improvement and training on more diverse data. The study concludes by outlining its key findings and potential directions for further research in Kazakh speech recognition.
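For readers unfamiliar with the metric, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words needed to turn the recognized text into the reference transcript, and N is the number of words in the reference. The following is a minimal sketch of that standard definition, not the evaluation code used in the thesis; the function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the KazakhTTS dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name); words are split on whitespace
    with no text normalization, which a real evaluation would likely add.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # prefix processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: the recognizer drops a case ending on one word,
    # which counts as one substitution out of three reference words.
    reference = "мен мектепке барамын"
    hypothesis = "мен мектеп барамын"
    print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Because the comparison is made on whole words, a single wrong ending counts as a full word error, which is one reason agglutinative languages such as Kazakh tend to show high WER values.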
Citation
Kursabayeva A / Study of the transformation of Kazakh language speech into text data / 2024 / Computer Science - 7M06012