Question Answering system on Regulatory Documents
| dc.contributor.author | Barlybay K. | |
| dc.date.accessioned | 2024-12-12T08:38:02Z | |
| dc.date.available | 2024-12-12T08:38:02Z | |
| dc.date.issued | 2023 | |
| dc.description.abstract | Legal text processing in the Kazakh language is currently underserved: the language of the domain is specialized, and few computational resources are dedicated to it. This thesis addresses the need for an efficient model that can process, understand, and generate meaningful insights from Kazakh legal texts. It proposes a solution by developing and evaluating bespoke language models pre-trained on a large corpus of Kazakh legal documents. The study begins with the assembly of a corpus of over 315 million words from Kazakh legal texts, alongside a benchmark dataset of 2500 multiple-choice questions for civil service examinations in Kazakhstan. Three language models based on the BERT architecture are then pre-trained, one of them entirely from scratch. To emulate a real-world application in the legal domain, the models are assessed on the multiple-choice question-answering task. The BERT base model pre-trained from scratch with both the Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) objectives achieves an accuracy of 56.11%. This result underlines the potential of custom pre-training strategies on domain-specific corpora for improving language model performance in specialized areas. In conclusion, this research advances the use of AI for legal text processing in the Kazakh language and presents a promising solution to the problem, paving the way for more efficient and informed decision-making in legal and civil service settings. | |
| dc.identifier.uri | https://repository.sdu.edu.kz/handle/123456789/1575 | |
| dc.language.iso | en | |
| dc.subject | using AI, text processing, Kazakh language, civil service settings | |
| dc.title | Question Answering system on Regulatory Documents | |
| dc.type | Other |
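
The abstract reports accuracy on a multiple-choice question-answering task. As a minimal sketch of how such a figure can be computed, the Python below picks the highest-scoring option for each question and compares it with the gold answer. The function names, the score layout (one score per answer option), and the toy numbers are illustrative assumptions; they are not taken from the thesis, and the scores would normally come from a fine-tuned model rather than being written by hand.

```python
from typing import Sequence


def predict_option(option_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring answer option."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def multiple_choice_accuracy(
    all_scores: Sequence[Sequence[float]],
    gold_answers: Sequence[int],
) -> float:
    """Fraction of questions whose top-scoring option matches the gold index."""
    if len(all_scores) != len(gold_answers):
        raise ValueError("each question needs exactly one gold answer")
    if not gold_answers:
        raise ValueError("cannot compute accuracy on an empty benchmark")
    correct = sum(
        predict_option(scores) == gold
        for scores, gold in zip(all_scores, gold_answers)
    )
    return correct / len(gold_answers)


# Hypothetical scores for three questions with four options each.
example_scores = [
    [0.1, 0.7, 0.1, 0.1],  # predicted option 1
    [0.2, 0.2, 0.5, 0.1],  # predicted option 2
    [0.0, 0.1, 0.2, 0.7],  # predicted option 3
]
example_gold = [1, 0, 3]  # the second question is answered incorrectly

accuracy = multiple_choice_accuracy(example_scores, example_gold)
print(f"{accuracy:.2%}")
```

With these toy inputs two of the three predictions match the gold answers, so the script prints 66.67%.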