
Optimized Feature Selection Algorithm for High Dimensional Data

Affiliations

  • Department of Computer Science, Mother Teresa Women’s University, Kodaikanal - 624101, Tamil Nadu, India
  • Department of Computer Science, M.V.M. Government Arts College (W), Dindigul - 624001, Tamil Nadu, India

Abstract


Objectives: This paper proposes a new feature selection method, based on fuzzy entropy combined with the firefly algorithm, that selects quality features while removing redundant and irrelevant attributes from high dimensional data. Methods/Statistical Analysis: Feature selection is a data preprocessing step that reduces dimensionality, eliminates irrelevant data and improves accuracy. Fuzzy entropy is used to estimate the pattern distribution in the pattern space, while the Firefly Algorithm, a computational model inspired by the flashing behaviour of fireflies, drives the search. This work proposes a feature selection algorithm that integrates fuzzy entropy with the firefly algorithm. The performance of the proposed algorithm is analyzed on four different high dimensional data sets: WILT, ORL, LC and LTG. Findings: Experiments on these four data sets show that the proposed algorithm outperforms the traditional feature selection method, achieving maximum relevance with a minimum level of redundancy. Performance metrics such as sensitivity, specificity and accuracy show significant improvement over the existing FCBF algorithm. Applications/Improvements: The proposed optimized algorithm efficiently improves performance by eliminating redundant, noisy and insignificant features and can be applied to any high dimensional data set.
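The method described in the abstract (scoring features by fuzzy entropy, then searching feature subsets with a firefly-style optimizer) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the min-max membership function, the correlation-based redundancy penalty, the fitness definition, and the binary firefly move are all assumptions made for the sketch.

```python
import numpy as np

def fuzzy_membership(X):
    """Min-max scale each feature column into [0, 1] as a simple
    membership function (the paper's exact choice is not given here)."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    return (X - X.min(axis=0)) / span

def fuzzy_entropy(mu):
    """Per-feature fuzzy (Shannon-style) entropy, averaged over samples."""
    eps = 1e-12
    h = -(mu * np.log(mu + eps) + (1.0 - mu) * np.log(1.0 - mu + eps))
    return h.mean(axis=0)

def subset_score(mask, entropy, redundancy):
    """Fitness of a binary feature mask: reward low-entropy (informative)
    features, penalize mean pairwise redundancy among selected ones."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return -np.inf
    relevance = -entropy[idx].mean()
    overlap = redundancy[np.ix_(idx, idx)].mean() if idx.size > 1 else 0.0
    return relevance - overlap

def firefly_select(X, n_fireflies=10, n_iter=30, seed=0):
    """Binary firefly search over feature subsets (illustrative)."""
    rng = np.random.default_rng(seed)
    ent = fuzzy_entropy(fuzzy_membership(X))
    red = np.abs(np.corrcoef(X, rowvar=False))   # |correlation| as redundancy
    np.fill_diagonal(red, 0.0)
    d = X.shape[1]
    pop = rng.random((n_fireflies, d)) < 0.5     # random binary subsets
    fit = np.array([subset_score(m, ent, red) for m in pop])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fit[j] > fit[i]:
                    # "move" firefly i toward the brighter firefly j by
                    # copying a random half of j's bits, then flip one bit
                    copy = rng.random(d) < 0.5
                    pop[i, copy] = pop[j, copy]
                    pop[i, rng.integers(d)] ^= True
                    fit[i] = subset_score(pop[i], ent, red)
    return np.flatnonzero(pop[fit.argmax()])     # indices of selected features
```

In this sketch the firefly "attractiveness" update is reduced to a bit-copy toward brighter (fitter) fireflies plus a single-bit mutation; continuous firefly variants instead scale movement by distance-dependent attractiveness.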

Keywords

FCBF, Feature Selection Algorithm, Firefly Algorithm, Fuzzy Entropy, High Dimensional Data.


References


  • Han J, Kamber M, Pei J. Data mining: Concepts and techniques. 2nd ed. USA: Morgan Kaufmann Publishers; 2004.
  • Kittler J. Feature selection and extraction. Handbook of Pattern Recognition and Image Processing. Y. Fu, editor. New York: Academic Press; 1978.
  • Bradley PS, Mangasarian OL, Street WN. Feature selection via mathematical programming. INFORMS Journal on Computing. 1998; 10(2):209–17.
  • Oh IS, Lee JS, Moon BR. Hybrid Genetic Algorithms for feature selection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2004; 26(11):1424–37.
  • Kosko B. Fuzzy entropy and conditioning. Information Sciences. 1986; 40(2):165–74.
  • Pal SK, Chakraborty B. Fuzzy set theoretic measure for automatic feature evaluation. IEEE Transactions on Systems, Man and Cybernetics. 1986; 16(5):754–60.
  • Shannon CE. A mathematical theory of communication. Mobile Computing and Communications Review. ACM SIGMOBILE. 2001; 5(1):3–55.
  • Hartley RV. Transmission of information. Bell System Technical Journal. 1928; 7:535–63.
  • Wiener N. Cybernetics. New York: Wiley; 1961.
  • Renyi A. On measures of entropy and information. Proc Fourth Berkeley Symposium on Mathematical Statistics and Probability; Berkeley, CA. 1961. p. 541–61.
  • Blahut RE. Principles and practice of information theory. Reading, MA: Addison-Wesley; 1987.
  • Cover TM, Thomas JA. Elements of information theory. New York: Wiley; 1992.
  • Ching JY, Wong AKC, Chan KCC. Class-dependent discretization for inductive learning from continuous and mixed-mode data. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1995; 17(7):641–51.
  • Blum AL, Langley P. Selection of relevant features and examples in Machine Learning. Artificial Intelligence. 1997; 97(1-2):245–71.
  • Baldi P, Brunak SA. Bioinformatics: The Machine Learning approach. Cambridge, Mass: MIT Press; 2001. p. 476.
  • Sajn L, Kukar M. Image processing and Machine Learning for fully automated probabilistic evaluation of medical images. Computer Methods and Programs in Biomedicine. 2010; 104(3):75–86.
  • Dash M, Liu H. Feature selection for classification. Intelligent Data Analysis. 1997; 1(3):131–56.
  • Kohavi R, John G. Wrappers for feature subset selection. Artificial Intelligence. 1997; 97(1-2):273–324.
  • Ben-Bassat M. Classification of pattern recognition and reduction of dimensionality. Handbook of Statistics; 1982. p. 773–91.
  • Siedlecki W, Sklansky J. On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence. 1988; 2(2):197–220.
  • John GH, Kohavi R, Pfleger K. Irrelevant features and the subset selection problem. Proceedings of the 11th International Conference on Machine Learning; San Francisco, CA: Morgan Kaufmann Publishers; 1994. p. 121–9.
  • Kira K, Rendell LA. The feature selection problem: Traditional methods and a new algorithm. Proc 10th Nat’l Conf Artificial Intelligence; 1992. p. 129–34.
  • Koller D, Sahami M. Toward optimal feature selection. Proc Int’l Conf Machine Learning; 1996. p. 284–92.
  • Kononenko I. Estimating attributes: Analysis and extensions of RELIEF. Proc European Conference on Machine Learning; Berlin Heidelberg: Springer; 1994. p. 171–82.
  • Hall MA. Correlation-based feature selection for discrete and numeric class Machine Learning. Proc 17th Int’l Conf Machine Learning; 2000. p. 359–66.
  • Hall MA. Correlation-based feature subset selection for Machine Learning. New Zealand: University of Waikato; 1999.
  • Yu L, Liu H. Feature selection for high-dimensional data: A fast correlation-based filter solution. Proceedings of the 20th Int’l Conference on Machine Learning. 2003; 20(2):1–8.
  • Fleuret F. Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research. 2004; 5:1531–55.
  • Song Q, Ni J, Wang G. A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Transactions on Knowledge and Data Engineering. 2013; 25(1):1–14.
  • Yu L, Liu H. Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research. 2004; 5:1205–24.
  • Das S, Ajith A, Konar A. Meta heuristic clustering. Berlin Heidelberg: Springer-Verlag; 2009.
  • Raymer ML, Punch WF, Goodman ED, Kuhn LA, Jain AK. Dimensionality reduction using Genetic Algorithms. IEEE Transactions on Evolutionary Computation. 2000; 4(2):164–71.
  • Yang JH, Honavar V. Feature subset selection using a Genetic Algorithm. Feature Extraction, Construction and Selection. New York: Springer Science; 1998. p. 117–36.
  • Yao YY, Zhao Y. Discernibility matrix simplification for constructing attribute reducts. Information Sciences. 2009; 179(5):867–82.
  • Thangavel K, Pethalakshmi A. Dimensionality reduction based on rough set theory. Applied Soft Computing. 2009; 9(1):1–12.
  • Yu KM, Wu MF, Wong WT. Protocol-based classification for intrusion detection. Applied Computer and Applied Computational Science (ACACOS ‘08). Hangzhou, China. 2008 Mar; 3(3):135–41.
  • Akbar S, Nageswara Rao K, Chandulal JA. Intrusion detection system methodologies based on data analysis. International Journal of Computer Applications. 2010 Aug; 5(2):10–20.
  • Azhagusundari B, Thanamani DAS. Feature selection based on fuzzy entropy. IJETTCS. 2013; 2(2):30–4.
  • Khushaba RN, Al-Jumaily A, Al-Ani A. Novel feature extraction method based on fuzzy entropy and wavelet packet transform for myoelectric control. 2007 International Symposium on Communications and Information Technologies (ISCIT 2007); Sydney, NSW. 2007. p. 352–7.
  • Sundary BA, Thanamani AS. An efficient feature selection technique using supervised fuzzy information theory. International Journal of Computer Applications. 2014; 85(19):40–5.
  • Shie JD, Chen SM. Feature subset selection based on fuzzy entropy measures for handling classification problems. Applied Intelligence. 2007; 28(1):69–82.
  • Parvin H, Bidgoli B, Haffarin H. An innovative feature selection using fuzzy entropy. Advances in Neural Networks. 2011; 6677:576–85.
  • Sethuramalingam S, Naganathan ER. Hybrid feature selection for network intrusion detection. International Journal of Computer Science and Engineering. 2011; 3(5):1773–80.
  • Gayathri S, Mary Metilda M, Sanjai Babu S. A shared nearest neighbour density based clustering approach on a proclus method to cluster high dimensional data. Indian Journal of Science and Technology. 2015 Sep; 8(22). DOI: 10.17485/ijst/2015/v8i22/79131.
  • Baeck T, Fogel DB, Michalewicz Z. Handbook of evolutionary computation. UK: IOP Publishing Ltd; 1997.
  • Bonabeau E, Dorigo M, Theraulaz G. Swarm Intelligence: From natural to artificial systems. New York: Oxford University Press; 1999.
  • Brown CT, Liebovitch LS, Glendon R. Levy flights in Dobe Ju/’hoansi foraging patterns. Human Ecology. 2007; 35(1):129–38.
  • Kennedy J, Eberhart R. Particle swarm optimization. Proc of the IEEE Int Conf on Neural Networks; Piscataway, NJ. 1995. p. 1942–8.
  • Yang XS. Nature-inspired metaheuristic algorithms. UK: Luniver Press; 2008.
  • Yang XS. Firefly Algorithms for multimodal optimization. Proc 5th Symposium on Stochastic Algorithms, Foundations and Applications. O. Watanabe and T. Zeugmann, editors. 2009. p. 169–78.
  • Yang XS, Deb S. Cuckoo search via Levy flights. Proceedings of World Congress on Nature and Biologically Inspired Computing (NaBIC); Coimbatore. 2009. p. 210–4.
  • Chouchoulas A, Shen Q. Rough set-aided keyword reduction for text categorization. Applied Artificial Intelligence. 2001; 15(9):843–73.
  • Thilagar PP, Harikrishnan R. Application of intelligent Firefly Algorithm to solve OPF with STATCOM. Indian Journal of Science and Technology. 2015 Sep; 8(22). DOI: 10.17485/ijst/2015/v8i22/79100.
  • Liu H, Abraham A, Li Y. Nature inspired population based heuristics for rough set reduction. Rough Set Theory. SCI, Springer-Verlag. 2009; 174:261–78.
  • Jensen R, Shen Q. Finding rough set reducts with ant colony optimization. Proceedings UK Workshop on Computational Intelligence; 2003. p. 15–22.
  • Yue B, Yao W, Abraham A, Liu H. A new rough set reduct algorithm based on particle swarm optimization. 2nd International Conference on the Interplay between Natural and Artificial Computation, IWINAC; 2007. p. 397–406.
  • Suguna N, Thanuskodi K. A novel rough set reduct algorithm for medical domain based on bee colony optimization. Journal of Computing. 2010; 2(6):49–54.
  • Trivedi M, Bezdeck JC. Low-level segmentation of aerial images with fuzzy clustering. IEEE Transactions on Systems, Man and Cybernetics. 1986; SMC-16(4):589–98.
  • Banati H, Bajaj M. Fire Fly based feature selection approach. IJCSI International Journal of Computer Science Issues. 2011; 8(4):473–80.
  • Yang XS. Firefly Algorithm, Levy flights and global optimization. Research and Development in Intelligent Systems. London: Springer; 2010. p. 209–18.
  • Yang XS, He X. Firefly Algorithm: Recent advances and applications. International Journal of Swarm Intelligence. 2013; 1(1):36–50.



This work is licensed under a Creative Commons Attribution 3.0 License.