A Novel Credit Scoring Prediction Model based on Feature Selection Approach and Parallel Random Forest

Affiliations

  • Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam
  • Department of Information Technology, VNU-University of Engineering and Technology, Hanoi, Viet Nam
  • Department, Faculty of Telecommunications, Posts and Telecommunications Institute of Technology, Hanoi, Viet Nam

Abstract


Background/Objectives: This article presents a feature selection method that improves both the accuracy and the computation speed of credit scoring models. Methods/Analysis: We propose a credit scoring model that combines a parallel Random Forest classifier with a feature selection method to evaluate the credit risk of applicants. Integrating Random Forest into the feature selection process allows the importance of each feature to be evaluated accurately, so that irrelevant and redundant features can be removed. Findings: We developed an algorithm that selects the best features using the best average score, the best median score, and the lowest standard deviation as feature scoring rules. Consequently, the number of features can be reduced to the smallest possible number, which allows a remarkable reduction in runtime. The proposed model can thus perform feature selection and model parameter optimization at the same time to improve its efficiency. The performance of the proposed model was assessed experimentally on two public datasets, the Australian and German credit datasets. The results showed that the proposed model achieves improved accuracy compared with other commonly used feature selection methods. In particular, our method attains an average accuracy of 76.2% with a significantly reduced running time of 72 minutes on the German credit dataset, and the highest average accuracy of 89.4% with a running time of only 50 minutes on the Australian credit dataset. Applications/Improvements: This method can be usefully applied in credit scoring models to improve accuracy with a significantly reduced runtime.
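The feature scoring rules described in the abstract (best average, best median, lowest standard deviation) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state how the three statistics are combined, so the lexicographic ordering below is an assumption, as are the function names `rank_features` and `select_top` and the sample scores. The importance scores are assumed to come from repeated Random Forest runs, for example one score per cross-validation fold.

```python
from statistics import mean, median, pstdev


def rank_features(importances):
    """Rank features from repeated importance measurements.

    importances: dict mapping a feature name to a list of importance
    scores, one per run (assumed to come from Random Forest runs).
    """
    stats = {
        name: (mean(scores), median(scores), pstdev(scores))
        for name, scores in importances.items()
    }
    # Assumed combination rule: higher mean first, then higher median,
    # then lower standard deviation as the final tie-breaker.
    return sorted(
        stats,
        key=lambda name: (-stats[name][0], -stats[name][1], stats[name][2]),
    )


def select_top(importances, k):
    """Keep the k best-ranked features."""
    return rank_features(importances)[:k]


# Hypothetical importance scores over three runs (illustrative only).
scores = {
    "duration": [0.30, 0.32, 0.31],
    "amount": [0.30, 0.20, 0.40],
    "age": [0.10, 0.12, 0.11],
    "noise": [0.01, 0.05, 0.00],
}

selected = select_top(scores, 2)
```

In a full pipeline, the selected subset would be passed back to the classifier and the subset size tuned against validation accuracy; that step is omitted here because the abstract gives no details about it.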

Keywords

Credit Scoring, Feature Selection, Machine Learning, and Parallel Random Forest.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.