Text Classification for AI Generated Content with Machine Learning and Deep Learning Models
| dc.contributor.author | Batyr Sharimbayev | |
| dc.contributor.author | Shirali Kadyrov | |
| dc.date.accessioned | 2025-11-13T06:20:02Z | |
| dc.date.available | 2025-11-13T06:20:02Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | The rapid development of generative AI models such as GPT-4, LLaMA, and Gemini has led to an explosion of AI-generated text that can closely resemble human writing. This makes it difficult to distinguish AI-generated content from human-authored text in a range of settings, including academic integrity, misinformation detection, and content moderation. This paper compares machine learning and deep learning models for classifying AI-generated text. We evaluate Logistic Regression with TF-IDF features, a Bi-LSTM model, and a fine-tuned DistilBERT model on data from the COLING Workshop on MGT Detection Task 1, which includes text samples from five AI models and human authors. In our experiments, Bi-LSTM outperformed the other models, achieving the best accuracy (90.09%) and F1-score (90.02%). We also report binary classification performance for distinguishing AI-generated text from human-written content, with an accuracy of 95.9%. The results suggest that deep learning methods are effective at detecting AI-generated text, although limitations remain, including adversarial attacks and the evolving style of AI-generated writing. Future work will focus on improving model robustness through adversarial training and hybrid architectures. | |
| dc.identifier.citation | Batyr Sharimbayev, Shirali Kadyrov / Text Classification for AI Generated Content with Machine Learning and Deep Learning Models / IEEE 5th International Conference on Smart Information Systems and Technologies (SIST) / 2025 | |
| dc.identifier.uri | https://repository.sdu.edu.kz/handle/123456789/2186 | |
| dc.language.iso | en | |
| dc.publisher | 5th International Conference on Smart Information Systems and Technologies (SIST) | |
| dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International | en |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
| dc.subject | AI-generated text | |
| dc.subject | text classification | |
| dc.subject | deep learning | |
| dc.subject | Bi-LSTM | |
| dc.subject | DistilBERT | |
| dc.subject | machine learning | |
| dc.title | Text Classification for AI Generated Content with Machine Learning and Deep Learning Models | |
| dc.type | Article |
Files
Original bundle
- Name: TextClassificationforAIGeneratedContentwithMachineLearningandDeepLearningModels_.pdf
- Size: 474.77 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 12.6 KB
- Format: Item-specific license agreed to upon submission