Question Answering system on Regulatory Documents

dc.contributor.author: Barlybay K.
dc.date.accessioned: 2024-12-12T08:38:02Z
dc.date.available: 2024-12-12T08:38:02Z
dc.date.issued: 2023
dc.description.abstract: The domain of legal text processing in the Kazakh language is currently underserved, presenting a unique challenge due to its specialized language and the relative scarcity of computational resources dedicated to it. This thesis explicitly identifies the problem: the need for an efficient model to process, understand, and generate meaningful insights from Kazakh legal texts. To address this problem, the thesis develops and evaluates bespoke language models pre-trained on a large corpus of Kazakh legal documents. The study begins with the assembly of a corpus, which comprises over 315 million words from Kazakh legal texts, alongside a benchmark dataset of 2500 multiple-choice questions for civil service examinations in Kazakhstan. Three language models based on the BERT architecture are then pre-trained. Among these, one model is pre-trained entirely from scratch. To emulate a real-world application in the legal domain, the performance of these models is assessed using the multiple-choice question-answering task. The BERT base model pre-trained from scratch, leveraging both Masked Language Modeling (MLM) and Next Sentence Prediction (NSP) tasks, achieves an accuracy of 56.11%. This result underlines the potential of custom pre-training strategies on domain-specific corpora for enhancing the performance of language models in specialized areas. In conclusion, this research represents a significant advancement in using AI for legal text processing in the Kazakh language. It presents a promising solution to the problem, paving the way for more efficient and informed decision-making processes in legal and civil service settings.
dc.identifier.uri: https://repository.sdu.edu.kz/handle/123456789/1575
dc.language.iso: en
dc.subject: using AI, text processing, Kazakh language, civil service settings
dc.title: Question Answering system on Regulatory Documents
dc.type: Other
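
The abstract reports accuracy on a multiple-choice question-answering task. As a rough illustration only, the sketch below shows how such an accuracy figure is commonly computed, assuming a model assigns one score to each answer option and the highest-scoring option is taken as the prediction. The function names, the scores, and the gold answers are all hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def predict_option(option_scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring answer option."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def accuracy(all_scores: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Fraction of questions whose top-scoring option matches the gold answer."""
    if len(all_scores) != len(gold):
        raise ValueError("one gold answer is required per question")
    if not gold:
        raise ValueError("at least one question is required")
    correct = sum(predict_option(s) == g for s, g in zip(all_scores, gold))
    return correct / len(gold)


# Hypothetical model scores for three questions with four options each.
example_scores = [
    [0.1, 0.7, 0.1, 0.1],  # predicted option 1
    [0.4, 0.3, 0.2, 0.1],  # predicted option 0
    [0.2, 0.2, 0.5, 0.1],  # predicted option 2
]
example_gold = [1, 2, 2]  # hypothetical correct options

example_accuracy = accuracy(example_scores, example_gold)
print(f"accuracy = {example_accuracy:.2%}")
```

In this toy example two of the three predictions match the gold answers. How the thesis itself scores options and splits its benchmark is described in the full text, not in this record.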

Files

Original bundle
Name: Kaisar Barlybay.pdf
Size: 5.26 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 12.6 KB
Format: Item-specific license agreed to upon submission
