A Performance Comparison of Data Balancing Model to Improve Credit Risk Prediction in P2P Lending

Dwika Ananda Agustina Pertiwi; Kamilah Ahmad; Jumanto Unjung; Much Aziz Muslim

doi:10.15294/sji.v11i4.14018

Authors

Dwika Ananda Agustina Pertiwi, Universiti Tun Hussein Onn Malaysia
Kamilah Ahmad, Universiti Tun Hussein Onn Malaysia
Jumanto Unjung, Universitas Negeri Semarang
Much Aziz Muslim, Universitas Negeri Semarang

DOI:

https://doi.org/10.15294/sji.v11i4.14018

Keywords:

P2P lending, Data balancing model, LightGBM, XGBoost

Abstract

Purpose: Imbalanced datasets often degrade the performance of classification models, including models for credit risk prediction in P2P lending. Several data balancing models have been applied in the existing literature to address this problem. However, existing research evaluates results only in terms of classification model performance. In addition to measuring classifier performance, this study therefore assesses the contribution of the data balancing models themselves: Random Oversampling (ROS), Random Undersampling (RUS), and the Synthetic Minority Oversampling Technique (SMOTE).
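To make the three balancing strategies concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`random_oversample`, `random_undersample`, `smote_like`), the toy data, and the choice of `k` are all assumptions made for illustration. The `smote_like` function only captures the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) and is not a substitute for a library implementation.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """ROS: duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=3, seed=0):
    """SMOTE-style sketch: add synthetic minority rows by linear interpolation
    between a minority row and one of its k nearest minority neighbours.
    Assumes at least two minority rows with numeric features."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    rows = [x for x, lab in zip(X, y) if lab == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(rows)
        others = [r for r in rows if r is not base]
        # Rank the other minority rows by squared Euclidean distance to base.
        neighbours = sorted(
            others, key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r))
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)
X_smote, y_smote = smote_like(X, y, minority=1)
```

On the toy data, ROS and the SMOTE-style sketch both grow the minority class to the majority size, while RUS shrinks the majority class to the minority size. In a cross-validated evaluation, resampling is usually applied to the training folds only, so that the held-out fold keeps its original class distribution.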

Methods: This research uses the Lending Club dataset, which has an imbalance ratio (IR) of 4.098, and two classifiers, LightGBM and XGBoost, with 10-fold cross-validation to assess the performance of the three data balancing models (ROS, RUS, and SMOTE). The models are evaluated using accuracy, recall, precision, and F1-score.
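The quantities named above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: the imbalance ratio is assumed here to be the majority class count divided by the minority class count (a common definition that the abstract does not spell out), and the function names and toy labels are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: size of the largest class over size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 8 negatives (0) and 2 positives (1).
y_true = [0] * 8 + [1] * 2
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

ir = imbalance_ratio(y_true)
scores = classification_metrics(y_true, y_pred)

# A predictor that always outputs the majority class, for comparison.
majority_only = classification_metrics(y_true, [0] * 10)
```

On these toy labels, the always-majority predictor matches the accuracy of `y_pred` while its recall is zero, which is why accuracy is reported alongside recall, precision, and F1-score when classes are imbalanced.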

Result: SMOTE performs best as a data balancing model in P2P lending: the LightGBM+SMOTE model reaches an accuracy of 92.56% and the XGBoost+SMOTE model 92.32%, both higher than the other models evaluated.

Novelty: This research concludes that SMOTE outperforms the other data balancing models considered for improving credit risk prediction in P2P lending. In addition, we find that, in this case, the larger the training sample, the better the classification model performs in predicting credit risk.
