A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending
DOI:
https://doi.org/10.15294/sji.v11i4.14018
Keywords:
P2P lending, Data balancing model, LightGBM, XGBoost
Abstract
Purpose: Imbalanced datasets often degrade the performance of classification models, including those used for credit risk prediction in P2P lending. Several data balancing models have been applied in the existing literature to address this problem. However, existing research evaluates only the performance of the classification model. In addition to measuring classifier performance, this study therefore assesses the contribution of the data balancing models themselves, namely Random Oversampling (ROS), Random Undersampling (RUS), and the Synthetic Minority Oversampling Technique (SMOTE).
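To make the three balancing strategies concrete, the following is a minimal sketch in plain Python. It is not the implementation used in the study: the function names (`random_oversample`, `random_undersample`, `smote_like`), the neighbour count `k`, and the toy data are all assumptions chosen for illustration, and the SMOTE variant shown is a simplified interpolation between a minority row and one of its nearest minority neighbours.

```python
import random
from collections import Counter


def split_by_class(X, y):
    """Group feature rows by their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(X, y, seed=0):
    """ROS: duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out.extend(rng.sample(rows, target))
        y_out.extend([label] * target)
    return X_out, y_out


def smote_like(X, y, k=3, seed=0):
    """Simplified SMOTE: add synthetic rows on the segment between a minority
    row and one of its k nearest same-class neighbours.
    Assumes numeric features and at least two rows per class."""
    rng = random.Random(seed)
    groups = split_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = list(X), list(y)
    for label, rows in groups.items():
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            # Rank the other rows of this class by squared Euclidean distance.
            neighbours = sorted(
                (r for r in rows if r is not base),
                key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # interpolation weight in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X_demo = [[float(i), float(i % 3)] for i in range(10)]
y_demo = [0] * 8 + [1] * 2

for balance in (random_oversample, random_undersample, smote_like):
    _, y_balanced = balance(X_demo, y_demo)
    print(balance.__name__, dict(Counter(y_balanced)))
```

ROS and SMOTE grow the minority class to the size of the majority class, whereas RUS shrinks the majority class, so RUS leaves the classifier with the smallest training sample of the three.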
Methods: This research uses the Lending Club dataset, which has an imbalance ratio (IR) of 4.098, two classifiers (LightGBM and XGBoost), and 10-fold cross-validation to assess the performance of the data balancing models: Random Oversampling (ROS), Random Undersampling (RUS), and the Synthetic Minority Oversampling Technique (SMOTE). The models are then evaluated using accuracy, recall, precision, and F1-score.
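The quantities named in the methods can be sketched as follows. This is an illustrative sketch, not the study's code: it assumes the imbalance ratio is defined as the number of majority-class samples divided by the number of minority-class samples, that the minority class is labelled 1 and treated as the positive class, and the helper names `imbalance_ratio` and `classification_metrics` are hypothetical.

```python
def imbalance_ratio(y, minority_label=1):
    """IR = majority count / minority count (assumed definition, binary labels)."""
    minority = sum(1 for label in y if label == minority_label)
    majority = len(y) - minority
    return majority / minority


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 positive (minority) and 7 negative (majority) samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print("IR:", imbalance_ratio(y_true))
print(classification_metrics(y_true, y_pred))
```

On imbalanced data, accuracy can stay high even when minority-class recall is poor, which is one reason to report precision, recall, and F1-score alongside it.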
Result: SMOTE shows the best performance as a data balancing model in P2P lending: the LightGBM+SMOTE model reaches an accuracy of 92.56% and the XGBoost+SMOTE model 92.32%, both higher than the other models.
Novelty: This research concludes that SMOTE is the best-performing data balancing model for improving credit risk prediction in P2P lending. In addition, we find that, in this case, the larger the training sample, the better the classification model performs at predicting credit risk in P2P lending.