Analysis of Student Graduation Prediction Using Machine Learning Techniques on an Imbalanced Dataset: An Approach to Address Class Imbalance
DOI:
https://doi.org/10.15294/sji.v11i3.5528
Keywords:
Classification, Machine learning, SMOTE, Timely graduation, University
Abstract
Purpose: Machine learning is a key area of artificial intelligence, applicable in various fields, including the prediction of timely graduation. Supervised learning is one approach within machine learning, but its results depend on the class distribution of the data: with imbalanced classes, where the minority class is significantly smaller than the majority class, classification performance suffers. Timely graduation of students is crucial for a university's sustainability and accreditation. This research aims to identify a suitable method for predicting timely graduation while managing class imbalance using SMOTE (Synthetic Minority Oversampling Technique).
Methods: This study uses a five-year dataset of 1328 records with 26 attributes, including status labels. After preprocessing, five classification algorithms are applied: Decision Tree (DT), Naive Bayes (NB), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Random Forest (RF). Each algorithm is run both with and without SMOTE to handle the class imbalance. In the dataset, 60.84% of the cases are timely graduations. To mitigate the imbalance, over/under-sampling methods are employed to balance the data. Classification performance is evaluated using the confusion matrix.
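The core idea behind SMOTE can be illustrated with a minimal sketch: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data below are hypothetical and are not taken from the study; this is a simplified, standard-library-only illustration, not the implementation the authors used.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic points by interpolating
    between minority samples and their k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical toy data (NOT from the study): two numeric features per student.
majority = [(3.5, 140.0), (3.6, 144.0), (3.4, 138.0),
            (3.7, 146.0), (3.3, 142.0), (3.8, 144.0)]
minority = [(2.4, 110.0), (2.6, 118.0), (2.5, 105.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.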
Results: Without SMOTE, the accuracies were 89.12% for DT, 79.65% for NB, 89.47% for LR, 87.72% for KNN, and 90.88% for RF. With SMOTE, the accuracies were 88.89% for DT, 81.48% for NB, 91.05% for LR, 92.59% for KNN, and 89.81% for RF. NB, LR, and KNN improved with SMOTE, while DT and RF declined slightly; KNN with SMOTE achieved the highest accuracy overall (92.59%).
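The effect of SMOTE on each algorithm can be read off by subtracting the reported accuracies. The short sketch below does that arithmetic; the accuracy values are those stated in the abstract, while the variable names are illustrative only.

```python
# Accuracies (%) as reported in the abstract.
without_smote = {"DT": 89.12, "NB": 79.65, "LR": 89.47, "KNN": 87.72, "RF": 90.88}
with_smote = {"DT": 88.89, "NB": 81.48, "LR": 91.05, "KNN": 92.59, "RF": 89.81}

# Change in accuracy, in percentage points, when SMOTE is applied.
delta = {
    name: round(with_smote[name] - without_smote[name], 2)
    for name in without_smote
}
improved = [name for name, d in delta.items() if d > 0]
best_with_smote = max(with_smote, key=with_smote.get)

for name, d in delta.items():
    print(f"{name}: {d:+.2f} percentage points")
print("Improved with SMOTE:", improved)
print("Best with SMOTE:", best_with_smote)
```

The differences are small for DT (-0.23) and RF (-1.07) and largest for KNN (+4.87), which matches the statement that NB, LR, and KNN benefit from SMOTE.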
Novelty: This study compares five algorithms with and without SMOTE. The comparison shows that applying SMOTE improves classification for several, but not all, of the algorithms compared.