Comparing the Efficacy of Decision Tree and its Variants using Medical Data

Affiliations

  • Department of Information Technology, Thiagarajar College of Engineering, Madurai – 625015, Tamil Nadu, India
  • Department of Computer Science and Engineering, G.K.M. College of Engineering and Technology, Chennai – 600063, Tamil Nadu, India
  • Department of Medicine, Theni Government Medical College and Hospital, Theni – 625531, Tamil Nadu, India

Abstract


Objective: This research aims to identify the best variant among decision tree algorithms such as Weighted Decision Trees (WDT), C4.5 and C5.0. Methods: The decision tree has a number of variants, such as ID3, the weight-based decision tree, C4.5 and C5.0. This work analyzes the predictive performance of the weight-based decision tree with information gain as the splitting criterion. The algorithm proceeds iteratively, assigning weights to the training instances to determine the best among the data attributes. The attribute with the best weight values is then identified by the improvement it yields in accuracy. Results: The experimental results show that, among the decision tree variants, C4.5 provides the highest accuracy, about 71.42%, with an R² value of about 0.265; for real-world data the accuracy is about 48.69%. The effectiveness of the decision tree algorithm can be improved further by combining it with feature selection techniques. Conclusion: The results show that the decision tree algorithm is well suited to medical data problems. Its efficiency can be improved further by applying decision trees with feature selection paradigms to real-world problems such as diabetes and cancer classification. A larger set of real-world data still has to be investigated.
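
The Methods statement above describes choosing a split attribute by information gain over weighted training instances. The following is a minimal sketch of that idea, not the authors' implementation: it assumes each instance's weight simply scales its contribution to the class counts, which the abstract does not specify. The function names and the toy attribute names are hypothetical.

```python
import math
from collections import defaultdict


def weighted_entropy(labels, weights):
    """Entropy of the class distribution, counting each instance by its weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = defaultdict(float)
    for label, weight in zip(labels, weights):
        mass[label] += weight
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def weighted_information_gain(values, labels, weights):
    """Information gain of one categorical attribute under instance weights.

    Assumption: weights act as fractional instance counts.
    """
    total = sum(weights)
    groups = defaultdict(lambda: ([], []))  # attribute value -> (labels, weights)
    for value, label, weight in zip(values, labels, weights):
        groups[value][0].append(label)
        groups[value][1].append(weight)
    # Weighted average entropy of the partitions induced by the attribute.
    remainder = sum(
        (sum(ws) / total) * weighted_entropy(ys, ws) for ys, ws in groups.values()
    )
    return weighted_entropy(labels, weights) - remainder


def best_attribute(rows, labels, weights):
    """Return the attribute with the highest weighted gain, plus all gains.

    rows is a list of dicts mapping attribute name to categorical value.
    """
    gains = {
        name: weighted_information_gain([row[name] for row in rows], labels, weights)
        for name in rows[0]
    }
    return max(gains, key=gains.get), gains


# Toy records with hypothetical attribute names (not the study's data).
rows = [
    {"chest_pain": "yes", "smoker": "yes"},
    {"chest_pain": "yes", "smoker": "no"},
    {"chest_pain": "no", "smoker": "yes"},
    {"chest_pain": "no", "smoker": "no"},
]
labels = [1, 1, 0, 0]
weights = [1.0, 1.0, 1.0, 1.0]
chosen, gains = best_attribute(rows, labels, weights)
```

With uniform weights this reduces to the ordinary information gain used by ID3; an iterative scheme such as the one the abstract mentions would update `weights` between rounds and re-evaluate the gains.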

Keywords

C4.5 Decision Tree Algorithm, Data Classification, Heart Disease, Predictive Analysis, Weighted Decision Tree

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.