Book text recognition in Kazakh Language
Date
2024
Authors
Publisher
Faculty of Engineering and Natural Science
Abstract
The digitization of Kazakh textual content poses unique challenges, particularly due to the language's typographical diversity and the scarcity of digital resources. This thesis presents a novel approach to Optical Character Recognition (OCR) tailored to Kazakh book texts, leveraging a synthetic dataset to overcome the limitations of data scarcity and enhance model accuracy. Through meticulous dataset engineering, employing tools like SynthTiger, the study generates images that closely replicate the conditions of Kazakh printed material. The OCR models are rigorously trained and tested, demonstrating high precision in recognizing diverse text presentations. Additionally, this work includes the development of a web application utilizing the EasyOCR framework, which underscores the practical application of the research. Hosted on Hugging Face Spaces, the application offers users the capability to extract text from various image and document formats, illustrating the robustness and adaptability of the OCR models to real-world scenarios.
Citation
Bimurat M / Book text recognition in Kazakh Language / 2024 / Computer Science - 7M06102