Text Classification for AI Generated Content with Machine Learning and Deep Learning Models
Date
2025
Authors
Batyr Sharimbayev, Shirali Kadyrov
Publisher
5th International Conference on Smart Information Systems and Technologies (SIST)
Abstract
The rapid development of generative AI models such as GPT-4, LLaMA, and Gemini has produced an explosion of AI-generated text that closely resembles human writing. This makes it difficult to distinguish AI-generated content from human-authored text in areas such as academic integrity, misinformation detection, and content moderation. This paper compares machine learning and deep learning models for classifying AI-generated text. We evaluate Logistic Regression with TF-IDF features, a Bi-LSTM model, and a fine-tuned DistilBERT model on data from the COLING Workshop on MGT Detection Task 1, which contains text samples from five AI models and human authors. In our experiments, the Bi-LSTM outperforms the other models, achieving the best accuracy (90.09%) and F1-score (90.02%). We also report binary classification performance for distinguishing AI-generated text from human-written content, with an accuracy of 95.9%. These results suggest that deep learning methods are effective at detecting AI-generated text, although limitations remain, including adversarial attacks and the changing styles of AI-generated writing. Future work will focus on improving model robustness through adversarial training and hybrid architectures.
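The abstract reports accuracy and F1-score for a task with several source classes, and a separate accuracy for the AI-versus-human distinction. As a minimal sketch of how such metrics can be computed, the Python below uses toy labels; the label names (`human`, `model_a`, `model_b`), the helper functions, and the choice of macro averaging are assumptions for illustration, not details taken from the paper. It also shows one possible way a binary score could be obtained, by collapsing all AI source labels into a single class; the abstract does not say whether the paper's binary result was produced this way.

```python
# Toy evaluation sketch. All label names and data are hypothetical.

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def to_binary(labels, human_label="human"):
    """Collapse every non-human source label into a single 'ai' class."""
    return ["human" if label == human_label else "ai" for label in labels]

# Hypothetical gold labels and predictions: one AI sample is attributed
# to the wrong AI model, but is still recognised as AI-generated.
y_true = ["human", "human", "model_a", "model_a", "model_b", "model_b"]
y_pred = ["human", "human", "model_a", "model_b", "model_b", "model_b"]

multi_acc = accuracy(y_true, y_pred)
multi_f1 = macro_f1(y_true, y_pred)
binary_acc = accuracy(to_binary(y_true), to_binary(y_pred))

print(f"multi-class accuracy: {multi_acc:.4f}")
print(f"multi-class macro F1: {multi_f1:.4f}")
print(f"binary accuracy:      {binary_acc:.4f}")
```

In this toy data the confusion between two AI sources lowers the multi-class scores but does not affect the collapsed binary score, which is one reason a binary accuracy can exceed the multi-class accuracy on the same predictions.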
Keywords
AI-generated text, text classification, deep learning, Bi-LSTM, DistilBERT, machine learning
Citation
Batyr Sharimbayev, Shirali Kadyrov / Text Classification for AI Generated Content with Machine Learning and Deep Learning Models / IEEE 5th International Conference on Smart Information Systems and Technologies (SIST) / 2025