An Arbitrary Gini Index for the Redundant Feature Datasets Analysis

Affiliations

  • Department of CSE, P V P Siddhartha Institute of Technology, Vijayawada − 520007, Andhra Pradesh, India
  • Department of CSE, ANU College, Guntur − 522510, Andhra Pradesh, India
  • Department of CSE, JKC College, Guntur − 522006, Andhra Pradesh, India
  • Department of CSE, SRK Institute of Technology, Vijayawada − 521108, Andhra Pradesh, India

Abstract


Objectives: Knowledge Discovery methods produce more accurate results when the dimensionality of the data is reduced; dimensionality is an important aspect of any dataset. Several algorithms have been proposed to increase accuracy, but most of them generate complex models when the data are extremely large. The objective of this paper is to build a simple model that achieves high accuracy. Method: To increase the accuracy of Knowledge Discovery methods while reducing dimensionality, we introduce a novel heuristic function, the Arbitrary Gini Index (ArGI). Findings: We evaluated the performance of ArGI on real-world datasets. Experiments on ten real-world datasets show that ArGI yields higher accuracy on 60% of the datasets, while the standard Gini Index performs better on the remaining 40%. Applications: It is expected that applications of ArGI will offer a better approach to real-world learning tasks.
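For context, the sketch below (Python) shows only the standard Gini Index that CART uses and that the paper benchmarks ArGI against; the abstract does not define ArGI itself, so the optional sample_frac subsampling step is a hypothetical illustration of how a randomised variant might score splits, and the names gini_index and split_gini are ours, not the authors'.

import random
from collections import Counter

def gini_index(labels):
    # Standard Gini impurity: 1 - sum(p_k^2) over class proportions p_k.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gini(left_labels, right_labels, sample_frac=None, rng=random):
    # Weighted Gini impurity of a binary split. If sample_frac is given,
    # each side is scored on a random subsample -- a hypothetical stand-in
    # for a randomised criterion, NOT the paper's published ArGI definition.
    if sample_frac is not None:
        left_labels = rng.sample(left_labels, max(1, int(sample_frac * len(left_labels))))
        right_labels = rng.sample(right_labels, max(1, int(sample_frac * len(right_labels))))
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_index(left_labels) + \
           (len(right_labels) / n) * gini_index(right_labels)

# A pure split scores 0.0; a mixed split scores higher.
print(split_gini(["a", "a", "a"], ["b", "b"]))   # 0.0
print(split_gini(["a", "b", "a"], ["b", "a"]))   # about 0.467

In CART, the candidate split with the lowest weighted Gini impurity is chosen at each node; a sampling-based variant trades some scoring precision for lower cost on large, redundant feature sets, which is the setting the paper targets.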

Keywords

Arbitrary Gini Index, CART, Classification, Datasets, Decision Tree, Filtering, Random Sampling.

This work is licensed under a Creative Commons Attribution 3.0 License.