Empirical Study of Feature Selection Methods for High Dimensional Data

Affiliations

  • Bharathiar University, Coimbatore - 641046, Tamil Nadu, India
  • PG and Research Department of Computer Science, D. G. Vaishnav College, Chennai - 600106, Tamil Nadu, India

Abstract


Background/Objectives: Feature Selection is the process of selecting the relevant features for use in model construction by removing redundant, irrelevant and noisy data. A typical application of Text Mining is the classification of messages and e-mails into spam and ham. Methods/Statistical Analysis: This article gives a comprehensive overview of the various Feature Selection methods for Text Mining. Filter methods such as Pearson Correlation, Chi-square, Symmetrical Uncertainty and Mutual Information are applied to select the optimal set of features. Findings: Filter Feature Selection methods are used to classify text data. Various classification algorithms are applied to the optimal set of features obtained, and their accuracy is verified on the chosen data set. Novelty/Improvements: This research work carries out a comparative study of filter methods for Feature Selection and of classification algorithms for performance evaluation.
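
The four filter scores named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the six-message corpus, the binary term-presence representation, and all function names (`term_presence`, `rank_features`, `SCORERS`, and so on) are assumptions made purely for illustration, and the study's actual data set and preprocessing are not described here.

```python
import math
from collections import Counter

# Hypothetical toy corpus (1 = spam, 0 = ham); not the data set used in the study.
MESSAGES = [
    ("win free prize now", 1),
    ("free cash prize claim now", 1),
    ("claim your free ticket", 1),
    ("are we meeting for lunch", 0),
    ("see you at lunch now", 0),
    ("call me after the meeting", 0),
]

def term_presence(messages):
    """Map each vocabulary word to a 0/1 vector of its presence per message."""
    vocab = sorted({w for text, _ in messages for w in text.split()})
    labels = [label for _, label in messages]
    features = {w: [int(w in text.split()) for text, _ in messages] for w in vocab}
    return features, labels

def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def symmetrical_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_information(x, y) / denom if denom > 0 else 0.0

def chi_square(x, y):
    """Pearson chi-square statistic of the feature/class contingency table."""
    n = len(x)
    observed = Counter(zip(x, y))
    stat = 0.0
    for xv, xc in Counter(x).items():
        for yv, yc in Counter(y).items():
            expected = xc * yc / n
            stat += (observed.get((xv, yv), 0) - expected) ** 2 / expected
    return stat

def pearson(x, y):
    """Pearson correlation coefficient between a feature and the class label."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

# Each scorer returns "higher is more relevant"; Pearson uses the absolute value.
SCORERS = {
    "pearson": lambda x, y: abs(pearson(x, y)),
    "chi_square": chi_square,
    "symmetrical_uncertainty": symmetrical_uncertainty,
    "mutual_information": mutual_information,
}

def rank_features(features, labels, score_fn, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(features, key=lambda w: (-score_fn(features[w], labels), w))
    return ranked[:k]

if __name__ == "__main__":
    features, labels = term_presence(MESSAGES)
    for name, fn in SCORERS.items():
        print(name, rank_features(features, labels, fn, k=3))
```

A filter method of this kind scores each feature against the class label independently of any classifier, so the selected top-k terms can then be passed to whichever classification algorithm is being evaluated.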

Keywords

Chi-Square, Feature Selection, Filter Method, Mutual Information, Pearson Correlation.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.