A Study on Impact of Dimensionality Reduction on Naïve Bayes Classifier

Affiliations

  • Department of Computer Science, Bharathiar University, Coimbatore – 641046, Tamilnadu, India
  • School of Computing Science & Engineering, VIT University, Vellore – 632014, Tamilnadu, India

Abstract


Objectives: The time complexity of a machine learning algorithm is directly proportional to the dimensionality of the dataset. This paper evaluates the impact of dataset dimensionality on one machine learning algorithm, the Naïve Bayes classifier, by running it on all feature subsets and analyzing whether its performance varies.
Methods/Statistical Analysis: The Naïve Bayes classifier is studied in terms of its correctly classified and incorrectly classified instances. The Pima Indian Type II diabetes dataset is used for the experimental study. For each run, a confusion matrix is constructed from the classifier's performance under 10-fold cross-validation. The study exhibits the impact of dimensionality on the performance of the Naïve Bayes classifier.
Findings: The Naïve Bayes classifier uses the values of the feature set to classify each patient record as either diabetic or non-diabetic. It is a probabilistic approach to assigning patient records to one of two classes. The dimensionality of the feature set is found to affect the performance of the Naïve Bayes classifier in terms of classification accuracy and the numbers of true positives, true negatives, false positives and false negatives. Incorrect classification is dangerous, whereas correct classification helps healthcare systems plan an effective course of treatment that can save the patient's life. An incorrect classification leads to a wrong diagnosis when the treatment plan is formulated, and can lead to loss of life. Hence, incorrect classification, measured by the false negative rate, must be viewed very seriously. The study shows that the higher dimensionality of the dataset has an impact on the performance of the Naïve Bayes classifier.
Application/Improvements: The findings can be used in medical informatics for quality diagnosis and effective treatment planning. A focus on the false positive rate in the classification accuracy of the Naïve Bayes classifier will notably help healthcare systems diagnose patients accurately and save lives.
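
The evaluation procedure described in the abstract (run the classifier on every feature subset, pool a confusion matrix over 10-fold cross-validation, compare the counts) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which Naïve Bayes variant or tool was used, so Gaussian likelihoods are an assumption, the function names are hypothetical, and the data is a small synthetic stand-in rather than the Pima Indian diabetes dataset.

```python
import itertools
import math
import random


def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature (mean, variance) for each class."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[c] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior (assumed Gaussian likelihoods)."""
    def log_posterior(c):
        prior, stats = model[c]
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(row, stats)
        )
        return math.log(prior) + log_lik
    return max(model, key=log_posterior)


def cross_val_confusion(rows, labels, k=10, seed=0):
    """Pool (tp, tn, fp, fn) over k folds; label 1 is the positive class."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    tp = tn = fp = fn = 0
    for fold in range(k):
        test = idx[fold::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit_gaussian_nb([rows[i] for i in train], [labels[i] for i in train])
        for i in test:
            pred, truth = predict(model, rows[i]), labels[i]
            if pred == 1 and truth == 1:
                tp += 1
            elif pred == 0 and truth == 0:
                tn += 1
            elif pred == 1 and truth == 0:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


def evaluate_subsets(rows, labels, n_features):
    """Run cross-validated Naive Bayes on every non-empty feature subset."""
    results = {}
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            projected = [[r[j] for j in subset] for r in rows]
            tp, tn, fp, fn = cross_val_confusion(projected, labels)
            results[subset] = {
                "accuracy": (tp + tn) / len(rows),
                "tp": tp, "tn": tn, "fp": fp, "fn": fn,
                "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
            }
    return results


def make_toy_data(n=200, n_features=4, seed=1):
    """Synthetic stand-in data: features 0 and 1 carry signal, the rest are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        rows.append([rng.gauss(1.5 * y if j < 2 else 0.0, 1.0) for j in range(n_features)])
        labels.append(y)
    return rows, labels


rows, labels = make_toy_data()
results = evaluate_subsets(rows, labels, n_features=4)
best = max(results, key=lambda s: results[s]["accuracy"])
print("best subset:", best, results[best])
```

Exhaustive enumeration visits 2^d - 1 subsets for d features, so it is only practical for low-dimensional datasets. The `fnr` entry is reported alongside accuracy because the abstract singles out false negatives as the error to be viewed most seriously.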

Keywords

Classification Accuracy, Dimensionality Reduction, Machine Learning, Naïve-Bayes Classifier.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.