Text Classification for AI Generated Content with Machine Learning and Deep Learning Models

Authors: Batyr Sharimbayev, Shirali Kadyrov

Dates: 2025-11-13, 2025-11-13, 2025

Citation: Batyr Sharimbayev, Shirali Kadyrov / Text Classification for AI Generated Content with Machine Learning and Deep Learning Models / IEEE 5th International Conference on Smart Information Systems and Technologies (SIST) / 2025

URI: https://repository.sdu.edu.kz/handle/123456789/2186

Abstract: The rapid development of generative AI models such as GPT-4, LLaMA, and Gemini is causing an explosion of AI-generated text that can closely resemble human writing. This makes it difficult to distinguish AI-generated content from human-authored text in several domains: academic integrity, misinformation detection, and content moderation. This paper compares machine learning and deep learning models for classifying AI-generated text. We compare the performance of Logistic Regression with TF-IDF features, a Bi-LSTM model, and a fine-tuned DistilBERT model on data from the COLING Workshop on MGT Detection Task 1, which contains text samples from five AI models and human authors. Our experiments show that the Bi-LSTM outperforms the other models, yielding the best accuracy (90.09%) and F1-score (90.02%). We also report binary classification performance, distinguishing AI-generated text from human-written content, with an accuracy of 95.9%. These results suggest that deep learning methods are effective at detecting AI-generated text, though limitations remain, including adversarial attacks and the changing styles of AI-generated writing. Future work will focus on improving model robustness through adversarial training and hybrid architectures.

Language: en

License: Attribution-NonCommercial-ShareAlike 4.0 International

Keywords: AI-generated text, text classification, deep learning, Bi-LSTM, DistilBERT, machine learning

Type: Article
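The abstract reports accuracy and F1-score for a multi-class setting (several AI models plus human authors) and a separate accuracy for the binary AI-versus-human setting. The following is a minimal sketch of how such metrics can be computed, not the authors' evaluation code. The label names, the toy predictions, and the choice of macro-averaged F1 are all assumptions made for illustration; the record does not state which F1 averaging was used.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

def to_binary(labels, human_label="human"):
    """Collapse every non-human source into a single 'ai' class."""
    return ["human" if label == human_label else "ai" for label in labels]

# Hypothetical toy labels: 'model_a' and 'model_b' stand in for AI generators.
y_true = ["human", "human", "model_a", "model_a", "model_b", "model_b"]
y_pred = ["human", "model_a", "model_a", "model_b", "model_b", "model_b"]

multi_acc = accuracy(y_true, y_pred)
multi_f1 = macro_f1(y_true, y_pred)
binary_acc = accuracy(to_binary(y_true), to_binary(y_pred))

print(f"multi-class accuracy: {multi_acc:.3f}")
print(f"multi-class macro F1: {multi_f1:.3f}")
print(f"binary accuracy:      {binary_acc:.3f}")
```

In this toy data, confusing one AI generator with another counts as an error in the multi-class setting but not in the binary one, which is one reason binary accuracy can exceed multi-class accuracy on the same predictions.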