Performance Evaluation of Classification Algorithms on Different Data Sets

Affiliations

  • Ansal University Gurgaon, Gurgaon - 122003, Haryana, India

Abstract


Objectives: Selecting the most appropriate classifier for a particular data set is generally difficult. This study therefore applies several existing classifiers to several data sets to assess their performance. Methods/Statistical Analysis: The choice of classification technique, such as Naive Bayes (NB), Decision Tree (DT), Lazy Classifiers (LC), or Support Vector Machine, usually depends on the type and nature of the attributes in the data set. A wrong choice of classification technique can lead to wrong results and poor performance, which is the motivation behind this study. A data set usually consists of nominal attributes, numeric attributes, or mixed attributes (both numeric and nominal). In this paper, different types of data sets are applied to three of the most popular classification techniques, NB, DT, and LC, to evaluate their performance. Findings: The results reveal that the NB classifier performs well on both mixed-attribute data and numeric data, whereas the decision tree classifier performs better on nominal-attribute data. The lazy classifier's performance is only average for all kinds of data. Application/Improvements: The results of this study will help in understanding the performance of different classification techniques on different data sets. Further, the results can be used to select the best classification technique among NB, decision tree, and lazy classifiers for a given data set.
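
As an illustration of the kind of comparison described above, the following minimal Python sketch scores a lazy classifier by cross-validated accuracy on a toy mixed-attribute data set and compares it with a majority-class baseline. Everything in it is an assumption made for illustration: the abstract does not state the evaluation protocol, so the interleaved k-fold split, the 1-nearest-neighbour rule, the distance function, the function names, and the toy data are hypothetical and are not the authors' method or data.

```python
from collections import Counter

def mixed_distance(a, b):
    """Distance over mixed attributes (assumed scheme): absolute difference
    for numeric values, 0/1 mismatch for nominal values.
    Numeric values are assumed to be pre-scaled to [0, 1]."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        else:
            total += 0.0 if x == y else 1.0
    return total

def knn_predict(train_X, train_y, query, k=1):
    """Lazy classifier: no training step, only a majority vote
    among the k training rows nearest to the query."""
    ranked = sorted(range(len(train_X)),
                    key=lambda i: mixed_distance(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def majority_predict(train_X, train_y, query):
    """Baseline: always predict the most frequent training class."""
    return Counter(train_y).most_common(1)[0][0]

def cross_val_accuracy(X, y, predict, folds=3):
    """Mean accuracy over interleaved folds (row i belongs to fold i % folds)."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

# Hypothetical toy data: one numeric and one nominal attribute per row.
X = [
    (0.10, "red"), (0.20, "red"), (0.15, "red"),
    (0.90, "blue"), (0.80, "blue"), (0.85, "blue"),
]
y = ["a", "a", "a", "b", "b", "b"]

knn_accuracy = cross_val_accuracy(X, y, knn_predict)
baseline_accuracy = cross_val_accuracy(X, y, majority_predict)
print(f"1-NN accuracy:     {knn_accuracy:.2f}")
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
```

Passing the prediction function as an argument keeps the evaluation loop identical for every classifier, so an NB or decision tree implementation with the same signature could be scored on the same folds.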

Keywords

Accuracy, Classification, Data set, Decision tree, Lazy Classifiers, NB.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.